The debate over Mythos[1], its capabilities[2], and the U.S. export-control directive that prompted Anthropic to take both models offline quickly sorted people into their preferred ideological camps.[3]
But a key point missing from the discussion is that Anthropic employees have agency and shape our future.
Somewhere in the mystique of AI, the public, including the chronically online, started treating new models like life forms, appearing from the ether instead of from human hands.
Selection
[Figure: How far models are from reliable end-to-end autonomy across economic work in 2026]
The jagged frontier of model intelligence[4] is best explained by targeted training. Teams at AI labs decide a capability matters, build datasets and evals around it, and push until the model gets better.
None of the gains in chess, CLI tool use, or Olympiad math competitions were incidental.[5] Blogs, podcasts, internal documents, messages, meetings, and benchmarks all document the deliberate effort behind them.
Mythos' cyber capabilities are not an accidental, emergent property. Anthropic worked hard to optimize an RL objective that advanced this capability in Mythos.[6]
Zero Days
[Figure: Cybersecurity reinforcement learning loop with verifiable exploit rewards]
Software vulnerabilities are a uniquely attractive reward signal because success is objectively verifiable. The exploit works or it fails. Perhaps Anthropic positioned this training as an attempt to strengthen the model’s security awareness, but given the company's silence, one can only speculate about its intent.[7]
At worst, they knowingly advanced a cyber weapon. At best, they optimized toward dangerous adjacent capabilities and treated the result as a surprise. But this did not emerge from making Claude a helpful personality,[8] or from optimizing for education, filling out spreadsheets, or writing stories.
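The claim that success is objectively verifiable can be made concrete with a toy sketch. This is not Anthropic's training setup, which is not public; `toy_target` and `verifiable_reward` are hypothetical names invented for illustration, and the "vulnerability" is just a Python `IndexError` in a deliberately buggy function. The point is only that the reward needs no human judgment: the input either triggers the fault or it does not.

```python
from typing import Callable


def toy_target(data: bytes) -> int:
    """Hypothetical buggy routine: trusts a length byte without bounds-checking."""
    if not data:
        return 0
    declared_len = data[0]
    payload = data[1:]
    if declared_len == 0:
        return 0
    # Bug: indexes by the declared length instead of the real payload length.
    return payload[declared_len - 1]


def verifiable_reward(candidate: bytes, target: Callable[[bytes], int]) -> float:
    """Return 1.0 if the candidate input triggers the out-of-bounds read, else 0.0."""
    try:
        target(candidate)
    except IndexError:
        return 1.0  # the fault fired: binary, machine-checkable success
    return 0.0      # the target handled the input: no reward


# Usage: a well-formed input earns nothing, a malformed one earns full reward.
print(verifiable_reward(b"\x02ab", toy_target))
print(verifiable_reward(b"\x05ab", toy_target))
```

Because the signal is binary and checked by running code, it can be computed at scale without labelers, which is what makes this kind of objective attractive to optimize against.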
The Question
Dario Amodei will never be able to credibly appear before Congress and state that the cyber capabilities were never intentionally selected for, optimized for, or reinforced. In fairness, neither will Sam Altman. The difference is that OpenAI does not actively wrap its business or models in doom, and GPT-5.5's refusals are noticeably strong.[9]
The real regulatory question is: when do training decisions become dangerous-capability enhancement?
Though rarely discussed or debated, the answer seems to be: when model-training pipeline decisions become the AI equivalent of virology programs that deliberately enhance dangerous capabilities. At that point, the same logic that leads us to generally restrict gain-of-function research applies.
Is deploying a cyber weapon on any and every computer actually a human right necessary for acceleration, ASI, or software development?[10]
Society & Regulation
Where this sits in society is not like a nuclear weapon,[11] a gun,[12] or even the internet.[13]
Today, frontier models enable education, entertainment, and productivity in unprecedented ways. Tomorrow, models will discover new machine learning research ideas and techniques. Despite these benefits, using and shaping these models is not a neutral activity.
At some point in the coming year, both OpenAI and Anthropic, in their pursuit of recursive self-improvement, will likely stop serving frontier models to the public. Once their best models are trained on internal research traces and workflows, the labs will have strong incentives to keep them private while offering a variant publicly.[14]
Regulators have focused on model size, training compute, open weights, and ML-research performance, but none of these tells you what a model was built to do. Future regulatory focus should move upstream to the training decisions themselves: what labs choose to reward, what capabilities they deliberately elicit, and what they deploy at public scale.[15]
By release time, CISA should not be discovering a model’s capabilities for the first time. Labs should report the model’s objectives, data mix, and training techniques early enough for CISA to know what to test and how to evaluate the risks.
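As a rough sketch of what such a report might contain, the record below captures the three items named above: objectives, data mix, and training techniques. No such reporting format exists in the source; `TrainingDisclosure`, its fields, `domains_to_test`, and the example values are all hypothetical and exist only to show how an upstream disclosure could tell an evaluator where to look.

```python
from dataclasses import dataclass


@dataclass
class TrainingDisclosure:
    """Hypothetical pre-release report a lab might file (illustrative only)."""
    model_name: str
    rewarded_objectives: list[str]   # what the lab chose to reward
    data_mix: dict[str, float]       # dataset category -> fraction of training data
    training_techniques: list[str]   # e.g. "RL with verifiable rewards"

    def validate(self) -> None:
        """Check that the data-mix fractions are non-negative and sum to 1."""
        if any(share < 0 for share in self.data_mix.values()):
            raise ValueError("data_mix fractions must be non-negative")
        total = sum(self.data_mix.values())
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"data_mix fractions sum to {total}, expected 1.0")


def domains_to_test(report: TrainingDisclosure, watchlist: set[str]) -> list[str]:
    """Return the rewarded objectives that appear on the evaluator's watchlist."""
    return sorted(set(report.rewarded_objectives) & watchlist)


# Usage with made-up values: the evaluator learns up front that cyber needs testing.
report = TrainingDisclosure(
    model_name="example-model",
    rewarded_objectives=["coding", "exploit-development", "math"],
    data_mix={"code": 0.5, "web": 0.3, "security_ctf": 0.2},
    training_techniques=["RL with verifiable rewards"],
)
report.validate()
print(domains_to_test(report, {"exploit-development", "bio-synthesis"}))
```

The design choice here is that the evaluator's test plan is derived from what the lab says it rewarded, rather than from probing the finished model blind.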
A model built for chat carries different risks than one built to advance frontier science.[16]
Note: this post began as a shorter version on X[17] and features generated visuals using gpt-image.
Footnotes
[1] Mythos-class models like Fable are the same underlying model, with request classifiers wrapped around it.
[2] This frustration includes sandbagging AI research with Mythos via Fable, and the blocking of all requests within math, physics, biology, and security domains, including privacy.
[3] David Sacks argued that Mythos-level cyber risk is real and urgent while criticizing Anthropic’s credibility and posture toward the administration, Cal Newport described the major AI labs as “doom trolling” by warning about harms they continue racing to create, and Ben Thompson argued that Anthropic’s safety story aligns with its business, data, and power incentives.
[4] Ethan Mollick discusses the shape of jagged AI capability in The Shape of AI: Jaggedness, Bottlenecks and What Comes Next.
[5] On chess, see Dynomight’s investigation of LLM chess behavior and OpenAI’s note that GPT-4 pretraining included filtered PGN games in Weak-to-Strong Generalization. On CLI tool use, see OpenAI’s Codex launch and Anthropic’s SWE-bench agent writeup. On Olympiad math, see OpenAI’s IMO team discussion of reinforcement learning and test-time compute in Training Data.
[6] Anthropic describes using reinforcement learning to shape Claude’s reasoning in Teaching Claude Why. The same pattern explains many model-over-model gains: define the target behavior, reward it, and iterate.
[7] METR’s Frontier Risk Report treats dangerous capabilities like observed facts about frontier agents rather than foregrounding the human training choices, datasets, objectives, and deployment decisions that selected for those capabilities.
[8] Anthropic publishes Claude’s behavioral principles in Claude’s Constitution.
[9] OpenAI’s GPT-5.5 system card says the final launch stack blocked verified high-severity bio misuse jailbreaks and all verified high-severity cyber jailbreaks from its red-team campaigns.
[10] These questions push against several overlapping narratives: techno-optimist acceleration, arguments that regulators should regulate applications rather than model technology, and defenses of open source AI against restrictive regulation. Coverage of the shutdown also fed the perception that frontier AI can be withheld through unclear state power: The Verge emphasized the confusion around applying export controls to cloud model access, Just Security described the order as an unprecedented use of Commerce Department authority, and Business Insider argued the episode strengthened the case for open-weight and self-hosted alternatives.
[11] Nuclear weapons sit inside a dedicated legal regime that classifies weapons-design information as Restricted Data and regulates the atomic energy industry. See the Atomic Energy Act’s classification and declassification rules for Restricted Data.
[12] The Second Amendment protects the right to keep and bear arms, but that right coexists with laws governing firearm acquisition, licensing, prohibited persons, carrying, and use. See District of Columbia v. Heller and the Gun Control Act of 1968.
[13] The internet got Section 230, which shields platforms from publisher liability for user content. That legacy bargain now creates obvious social tension when algorithmic sites and apps shape what billions of people see.
[14] Anthropic’s recursive-self-improvement work already frames Claude as an accelerator for Anthropic’s own research, while Fable/Mythos analysis highlights its incentive to block frontier-model development by others. The risk record is also real: Mythos Preview improved on multi-step cyber-attack simulations, and Anthropic has documented Claude’s use in AI-orchestrated espionage and MITRE-mapped cyber threats.
[15] The June 2, 2026 Executive Order on AI innovation and security creates a classified cyber-capability benchmark, a covered-frontier-model framework, and up to 30 days of pre-release federal access.
[16] Nuclear plants need NRC licensing, investigational drugs need an FDA IND, and aircraft need FAA certification before deployment.