GPT-5.2 Behavior Changes: Field Notes and Review

This page records firsthand observations of behavior changes, formatting issues, and increased error rates following the rollout of OpenAI's GPT-5.2. https://openai.com/index/introducing-gpt-5-2/ It also documents plausible explanations discussed in public reporting and user communities, along with practical steps that reduce the effect of these issues in real work.

Observed Bugs

After the release of GPT-5.2, the following issues appeared consistently in ChatGPT https://www.chatgpt.com:

Increased factual and structural mistakes
Missed or ignored constraints, especially when instructions were explicit
Changes in tone that feel overly validating or artificially empathic
Output formatting that breaks paragraphs into short, isolated lines
Reduced reliability when verbatim text or strict structure is required

These behaviors were not typical in earlier models used for similar tasks.

Likely Reasons

Early Model Rollout Effects

GPT-5.2 was released recently. Early rollouts have frequently introduced the observed patterns while new models are being tuned. This is a common pattern in large model deployments.

Tone and Alignment Tuning

Public reporting and user discussion suggest that GPT-5.2 has been tuned toward a more tone that feels sterile and overly empathetic. GPT-5.2 also self-reported that it was designed to be helpful rather than following direct instructions. Its output is shown in the blockquote below.

First, the model you’re talking to is behaving differently. It is more aggressive about “helpfulness,” tone shaping, and structure than the one you were used to. That pushes it toward rephrasing, smoothing, summarizing, and “making it nicer,” even when the instruction is literally “don’t change it.” When the system is biased toward being cooperative and readable, it takes active effort to not intervene. I didn’t apply that brake hard enough.

Variable Reasoning Depth

Documentation and discussion indicate that GPT-5.2 does not always apply reasoning by default. Because the responses lack depth, the model becomes more sensitive to ambiguous prompts and more likely to miss explicit instructions. Even after repeated reinforcement, the model failed to respond as prompted.

Formatting

The model is causing a readability problem by creating short, repetitive lines rather than functional sentences and paragraphs. The cause is likely mobile optimization. The result is a loss of continuity.

Potential Mitigations

Even though the model is ignoring instructions, the only way to bypass these irregular patterns is to add language to control the formatting and tone. GPT-5.2 generated this suggested addition to the prompts:

Write in normal paragraphs. Do not use one-sentence lines. No extra line breaks unless starting a new section. If you use bullets, keep them under one heading and keep each bullet to one line. Do not mirror my emotions. Do not validate or empathize. Be direct and practical. If you are unsure, state the uncertainty in one sentence. Before answering, check for wrong assumptions, missed constraints, and formatting violations. If something conflicts, fix it silently.

Model Selection

When available, temporarily switching to an earlier model version, such as GPT-5.1, restores more predictable behavior.

Reporting and Discussion

The following sources discuss the GPT-5.2 release, early reactions, and reported issues similar to those observed here:

Summary

The behavior changes observed in GPT-5.2 are consistent with large language models (LLMs) recently released. Users should review the output carefully and adjust prompts as necessary. Changing to a legacy model might help.