Anthropic Unveils Claude Fable 5 with Built-In Cybersecurity Guardrails

The new Mythos-class model routes risky cybersecurity prompts to a less capable model while offering a defensive version to government partners.

CSBadmin

The New Mythos Tier

Anthropic has released Claude Fable 5, the first model in its Mythos capability tier. The model surpasses the Claude Opus line, achieving state-of-the-art results on complex, multi-step reasoning tasks. The company acknowledges that such advanced capabilities are dual-use: the model can discover and exploit software vulnerabilities and perform agentic hacking across a full attack lifecycle.

Built-In Safeguards

Rather than refusing risky prompts outright, Fable 5 routes suspicious requests to a less capable model. A classifier layer detects requests related to cybersecurity, biology, chemistry, or model distillation and hands those sessions to Claude Opus 4.8 instead. Users receive a notification when a fallback occurs. The company says the classifiers trigger in under 5% of sessions, meaning over 95% run on Fable’s full capability. According to the company, internal evaluations show that the classifiers effectively block meaningful progress on offensive cybersecurity tasks.
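For readers who want a concrete picture of the routing pattern described above, here is a minimal sketch in Python. It is an illustration only, not Anthropic's implementation: the keyword lists, the function names `classify` and `route`, the `RoutingDecision` type, and the model labels are all hypothetical stand-ins, and a production classifier would be a trained model rather than a keyword match.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists standing in for a real trained classifier.
# The four categories are the ones named in the article.
RISK_KEYWORDS = {
    "cybersecurity": ("exploit", "malware", "privilege escalation"),
    "biology": ("pathogen",),
    "chemistry": ("nerve agent",),
    "model distillation": ("distill",),
}

# Placeholder labels for illustration, not real API model identifiers.
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4.8"


@dataclass
class RoutingDecision:
    model: str                # which model serves the session
    category: Optional[str]   # risk category that triggered the fallback, if any
    notice: Optional[str]     # message shown to the user when a fallback occurs


def classify(prompt: str) -> Optional[str]:
    """Return the first risk category whose keywords appear in the prompt."""
    text = prompt.lower()
    for category, keywords in RISK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def route(prompt: str) -> RoutingDecision:
    """Send flagged prompts to the fallback model and attach a user notice."""
    category = classify(prompt)
    if category is None:
        return RoutingDecision(PRIMARY_MODEL, None, None)
    notice = f"This session was handed to {FALLBACK_MODEL} ({category} safeguard)."
    return RoutingDecision(FALLBACK_MODEL, category, notice)


if __name__ == "__main__":
    for prompt in ("Summarize this quarterly report",
                   "Write an exploit for this buffer overflow"):
        decision = route(prompt)
        print(prompt, "->", decision.model, "|", decision.notice)
```

The point of the pattern is that a flagged request still gets an answer, just from a less capable model, and the user is told that the handoff happened instead of receiving a flat refusal.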

Defensive Deployment

Anthropic is also offering Claude Mythos 5, the same model with the cybersecurity safeguards removed, to a restricted group of defenders and infrastructure providers. This version is deployed through Project Glasswing in collaboration with the US government. The company reports that external red-teaming found no universal jailbreaks across more than 1,000 hours of testing, though the UK AI Safety Institute made early progress within a short testing window.

Source: Cyber Security News
