Claude 4 | Stephen M. Walker II

Anthropic is back — finally — with Sonnet and Opus 4. Self-reported benchmarks push the models to the top of leaderboards.

Anthropic says they've fixed Claude 3.7's overeagerness, but in my tests, Sonnet 4 is still too eager to take actions I didn't request.

Opus 4 is still behind Grok 3 and OpenAI o3 in its Swift development capabilities.