After years of speculation, the release of GPT-5 was mostly disappointing. It's a good model, better than GPT-4, yet compared with o3 it feels different rather than better.
Objectively, GPT-5 is better at math and physics and hallucinates less. What bothered me most is its writing style. GPT-5 draws on the GPT-4o/4.5 personality and style experiments, and even after developer-prompt tweaks, its responses still feel unenjoyable. Too much vibe, not enough substance.
Each week I fed ChatGPT the same question twice to compare the models, but I couldn't articulate the difference I felt.
Why do I enjoy reading o3's answers more?
After weeks of wondering, I think I have a reasonable explanation: o3 writes for an educated audience and doesn't try to be your friend.
To wrap this hypothesis in more than just vibes, I gathered five recent ChatGPT prompts, generated responses with o3 and GPT-5, and ran Flesch-Kincaid readability tests on each (words/sentence, syllables/word).
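Those two inputs are all the Flesch-Kincaid grade formula needs. Here is a minimal sketch of the measurement; the vowel-group syllable counter is a naive approximation (dedicated tools use better heuristics), but it is enough to reproduce the kind of numbers discussed below:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels as syllables.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_metrics(text: str) -> tuple[float, float, float]:
    """Return (words per sentence, syllables per word, FK grade level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    wps = len(words) / len(sentences)
    spw = sum(count_syllables(w) for w in words) / len(words)
    # Standard Flesch-Kincaid grade-level formula.
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return wps, spw, grade
```

Running `fk_metrics` on an o3 response and a GPT-5 response to the same prompt is how each pair was compared.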
What I found is that the content is nearly identical: nothing in the responses differs dramatically, and even the syntax is fairly close, with similar tables, em dashes, and overall structure.
The delta quickly appears when you measure readability.
GPT-5 always steers toward conversation: regardless of prompt or assigned personality, it addresses the person it's chatting with directly.
o3 writes in an academic tone, with clause-laden sentences and elevated vocabulary that lift the required reading level into college-graduate territory. GPT-5 packs the same ideas into short, parallel statements with plain verbs, pushing the grade level down while preserving substance.
o3 writes academic papers; GPT-5 writes a conversation.
The most interesting characteristic of GPT-5 is that despite instructions not to use punchy, contrastive "not x, but y" statements, it still cannot help itself. This behavior is common in GPT-4-era models and was likely carried over from GPT-4o personality post-training.
Across every topic, GPT-5's responses demand a lower grade level, yield higher Reading-Ease scores, and lean more heavily on pronouns. When analyzing existing content (a paper, a podcast), the models converge toward the style of the source material.
The primary lever is sentence length: o3 adds between 2 and 9 extra words per sentence, raising perceived reading difficulty even when vocabulary density shifts by only a tenth of a syllable per word.
There are other subtle quirks, including o3's preference for British English.
In OpenAI's quest to serve most of a population of hundreds of millions, the language and writing bend toward easier reading.
By chasing mass-market readability, GPT-5 trades the signal of sophistication for reading speed, likely a result of optimizing on human-preference feedback.
o3 was the first model that truly felt smarter than most people I know, and it's primarily because of how it communicates, not its underlying intelligence.
Writing style shapes perceived intelligence. A model that communicates in graduate-level prose signals intellect, even when raw reasoning is similar.
Hopefully, mass-market preference does not drive us to Idiocracy.