Intelligent Writing

OpenAI

Why I enjoy reading o3 and not GPT-5

After years of speculation, the release of GPT-5 was mostly disappointing.[1] It's a good model, better than GPT-4, yet compared with o3 it feels different, not better.

Objectively, GPT-5 is better at math and physics and hallucinates less.[2] What bothered me most is its writing style. GPT-5 draws on the GPT-4o/4.5 personality and style experiments, but even after developer-prompt tweaks, its responses still feel unenjoyable. Too much vibe, not enough substance.

Each week I fed ChatGPT the same question twice to compare the models, but I couldn't articulate the difference I felt.

Why do I enjoy reading o3's answers more?

After weeks of wondering, I think I have a reasonable explanation: o3 writes for an educated audience and doesn't try to be your friend.


To wrap this hypothesis in more than just vibes, I gathered five recent ChatGPT prompts, generated responses with o3 and GPT-5, and ran Flesch-Kincaid readability tests on each (words/sentence, syllables/word).
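
For reference, both Flesch scores depend only on those two ratios. Below is a minimal sketch using the standard published coefficients. The syllable counter is a crude vowel-group heuristic chosen for illustration, and the function names are hypothetical, not the tool used for the numbers in this post.

```python
import re

def count_syllables(word: str) -> int:
    """Crude estimate: count vowel groups, discounting a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if count > 1 and word.endswith("e") and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def text_ratios(text: str) -> tuple[float, float]:
    """Return (words per sentence, syllables per word). Assumes non-empty prose."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words) / len(sentences), syllables / len(words)

def reading_ease(wps: float, spw: float) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    return 206.835 - 1.015 * wps - 84.6 * spw

def fk_grade(wps: float, spw: float) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * wps + 11.8 * spw - 15.59

if __name__ == "__main__":
    wps, spw = text_ratios("The cat sat. The dog ran away.")
    print(f"words/sentence={wps:.2f}, syllables/word={spw:.2f}")
    print(f"ease={reading_ease(wps, spw):.1f}, grade={fk_grade(wps, spw):.1f}")
```

Plugging in 16.0 words per sentence and 1.9 syllables per word (the o3 CrowdStrike response reported later in this post) gives a grade of about 13.1 and an ease score of about 29.9.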

| Topic | Description |
| --- | --- |
| CrowdStrike incident | Create a briefing on the 2024 incident |
| Solar power generation | How far are we from storing 1 TWh in America |
| Programmatic vs DL | What's the difference between code and deep learning intelligence |
| Archaeological paper | How does evidence for cooking fish 780 ka alter our views |
| Podcast summary | Gather the topics and summarize from a recent JRE podcast |

What I found was that the content is nearly identical, with nothing exceptionally different between the responses. Even the syntax is fairly close: both models use tables, em dashes, and a similar overall structure.

The delta quickly appears when you measure readability.

GPT-5 always attempts a conversation, regardless of prompt or personality setting, and focuses on directly addressing the person it's chatting with.
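
One rough way to put a number on direct address is to count second-person pronouns per 100 words. This is a hypothetical check, not one of the measurements reported in this post, and the pronoun list and function name are assumptions made for illustration.

```python
import re

# Assumed list of second-person forms; extend as needed.
SECOND_PERSON = {
    "you", "your", "yours", "yourself",
    "you're", "you'll", "you've", "you'd",
}

def direct_address_rate(text: str) -> float:
    """Second-person pronouns per 100 words (straight apostrophes only)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in SECOND_PERSON)
    return 100 * hits / len(words)

if __name__ == "__main__":
    print(direct_address_rate("You can fix this. Your laptop is fine."))
    print(direct_address_rate("The system failed during the update."))
```

Running the same function over paired o3 and GPT-5 responses would show whether the conversational feel lines up with a measurably higher rate.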

o3 writes in an academic tone, with many clause-laden sentences and elevated vocabulary that lift readability requirements into college-graduate territory. GPT-5 packs the same ideas into short, parallel statements with plain verbs, pushing the grade level down while preserving substance.

o3 writes academic papers, GPT-5 writes a conversation.

| Feature | o3 | GPT-5 |
| --- | --- | --- |
| Sentence length | ~16 words | ~10 words |
| Syllables per word | ≈ 2.0 | ≈ 1.8 |
| Typical F-K grade | 13 – 15 | 7 – 11 |
| Reading-Ease score | 12 – 30 | 35 – 55 |
| Opening style | Declarative taxonomy | Punchy contrast |
| Tone | Formal, academic | Direct, conversational |
| Structure | Few dense sentences, nested clauses | Many short sentences, clear pivots |
| Reader effort | High, requires parsing complex syntax | Moderate, quick scan suffices |

The most interesting characteristic of GPT-5 is that, despite instructions not to use punchy, contrastive "not X, but Y" statements, it still cannot help itself. This behavior is common in GPT-4-era models and is likely carried over from GPT-4o personality post-training.

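| Topic | Model | F-K Grade | Ease Score | Reading Level | Words / Sentence | Syllables / Word | Sentences | Words |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CrowdStrike incident | o3 | 13.1 | 29.9 | College graduate | 16.0 | 1.9 | 39 | 625 |
| | GPT-5 | 8.4 | 47.5 | College | 7.0 | 1.8 | 60 | 422 |
| Solar power generation | o3 | 9.8 | 43.8 | College | 10.6 | 1.8 | 62 | 655 |
| | GPT-5 | 7.5 | 55.2 | 10th–12th grade | 7.7 | 1.7 | 110 | 851 |
| Programmatic vs DL | o3 | 15.5 | 12.7 | College graduate | 16.2 | 2.1 | 64 | 1035 |
| | GPT-5 | 7.8 | 54.4 | 10th–12th grade | 8.5 | 1.7 | 93 | 789 |
| Archaeological paper | o3 | 13.7 | 28.3 | College graduate | 17.5 | 1.9 | 90 | 1573 |
| | GPT-5 | 11.2 | 29.4 | College graduate | 8.1 | 2.0 | 68 | 551 |
| Podcast summary | o3 | 11.2 | 34.8 | College | 11.1 | 1.9 | 44 | 490 |
| | GPT-5 | 10.5 | 36.7 | College | 9.3 | 1.9 | 44 | 409 |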
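
The Reading Level labels above line up with the conventional Flesch Reading-Ease bands. The sketch below assumes those conventional cutoffs; whether the tool behind this table used exactly these boundaries is an assumption, and the function name is hypothetical.

```python
def reading_level(ease: float) -> str:
    """Map a Flesch Reading-Ease score to its conventional band label."""
    bands = [
        (90, "5th grade"),
        (80, "6th grade"),
        (70, "7th grade"),
        (60, "8th–9th grade"),
        (50, "10th–12th grade"),
        (30, "College"),
        (10, "College graduate"),
    ]
    # Bands are ordered from easiest to hardest; return the first one reached.
    for floor, label in bands:
        if ease >= floor:
            return label
    return "Professional"

if __name__ == "__main__":
    for score in (29.9, 47.5, 55.2, 12.7):
        print(score, reading_level(score))
```

Under these cutoffs, a score just below 30 (such as 29.9 or 29.4) falls into the college-graduate band, while anything from 30 to 50 is labeled college.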

Across every topic, GPT-5's text demands a lower grade level and yields a higher Reading-Ease score, and it makes subtle use of pronouns to address the reader. When the task is analyzing existing content (the paper, the podcast), the two models converge, pulled toward the source material.

The primary lever is sentence length: o3 adds roughly 2 to 9 extra words per sentence, raising perceived reading difficulty even where syllables per word shift by only a tenth (Programmatic vs DL, at +0.4, is the exception).

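| Topic | Δ F-K Grade | Δ Ease | Δ Words / Sentence | Δ Syllables / Word | Δ Sentences | Δ Words |
| --- | --- | --- | --- | --- | --- | --- |
| CrowdStrike incident | +4.7 | –17.6 | +9.0 | +0.1 | –21 | +203 |
| Solar power generation | +2.3 | –11.4 | +2.9 | +0.1 | –48 | –196 |
| Programmatic vs DL | +7.7 | –41.7 | +7.7 | +0.4 | –29 | +246 |
| Archaeological paper | +2.5 | –1.1 | +9.4 | –0.1 | +22 | +1022 |
| Podcast summary | +0.7 | –1.9 | +1.8 | 0.0 | 0 | +81 |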
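
The deltas above are o3 minus GPT-5. Because the Flesch-Kincaid grade is linear in its two inputs, each grade delta can be split into a sentence-length part and a syllable part. The sketch below does that with the rounded table values, so the split is approximate; the variable and function names are hypothetical.

```python
# (delta words/sentence, delta syllables/word, reported delta F-K grade)
DELTAS = {
    "CrowdStrike incident": (9.0, 0.1, 4.7),
    "Solar power generation": (2.9, 0.1, 2.3),
    "Programmatic vs DL": (7.7, 0.4, 7.7),
    "Archaeological paper": (9.4, -0.1, 2.5),
    "Podcast summary": (1.8, 0.0, 0.7),
}

def grade_delta_parts(d_wps: float, d_spw: float) -> tuple[float, float]:
    """Split a grade delta using the standard F-K coefficients (0.39 and 11.8)."""
    return 0.39 * d_wps, 11.8 * d_spw

if __name__ == "__main__":
    for topic, (d_wps, d_spw, reported) in DELTAS.items():
        length_part, syllable_part = grade_delta_parts(d_wps, d_spw)
        print(
            f"{topic}: length {length_part:+.2f}, syllables {syllable_part:+.2f}, "
            f"sum {length_part + syllable_part:+.2f} (reported {reported:+.1f})"
        )
```

For the CrowdStrike pair, about 3.5 of the 4.7 grade levels come from sentence length. The formula weights syllables heavily, though: a 0.1 shift in syllables per word moves the grade by roughly 1.2, about the same as three extra words per sentence, so the rounding in the syllable column matters when reading these splits.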

There are other subtle quirks, including o3's preference for British English.


In OpenAI's quest to meet the needs of most in a population of hundreds of millions, the language and writing bend toward easier reading.

By chasing mass-market readability, GPT-5 trades the signal of intellect that denser prose carries for reading speed, a shift likely driven by human-preference feedback.

o3 was the first model that truly felt smarter than most people I know, and it's primarily because of how it communicates, not its underlying intelligence.

Writing style shapes perceived intelligence. A model that communicates in graduate-level prose signals intellect, even when raw reasoning is similar.

Hopefully, mass-market preference does not drive us to Idiocracy.


  Footnotes

  1. GPT-5 refers to the API version, also known as GPT-5 Thinking in ChatGPT.

  2. o3 (and perhaps all o-series models) has a significant flaw where it will hallucinate, especially with tool-use responses, and then gaslight the user when challenged.

Published on September 4, 2025
