Anthropic is finally back, with Sonnet 4 and Opus 4. Self-reported benchmarks push both models to the top of the leaderboards.
Anthropic says they've fixed Claude 3.7's overeagerness, but in my tests, Sonnet 4 is still too eager to take actions I didn't request.
Opus 4 is still behind Grok 3 and OpenAI o3 in its Swift development capabilities.