Claude Fable 5 and Mythos 5: The Cyber Safeguard Is Now the Product Boundary

TL;DR

What: Anthropic released Claude Fable 5 for general use and Claude Mythos 5, the identical underlying model with cybersecurity safeguards removed, restricted to Project Glasswing partners and a forthcoming trusted-access program.
Impact: Frontier offensive-cyber capability now sits behind a classifier-and-fallback boundary rather than being absent from the model, and a sanctioned variant ships with that boundary deliberately lifted. The capability curve moved again; assume the attacker side moves with it.
Fix / mitigation: Treat AI-speed exploitation as a baseline assumption in your threat model, not an edge case. Instrument detection for hours-not-days exploit timelines, apply least privilege to any agentic tooling, and build genuine AI fluency inside the security team so controls match how these systems actually behave.
Who's at risk: Every defender, and especially any organization whose security posture quietly assumes a model "will refuse" rather than relying on its own controls.

On June 9, 2026, Anthropic released Claude Fable 5, a Mythos-class model it describes as state of the art across most capability benchmarks. Alongside it came Claude Mythos 5: the same underlying model with its cybersecurity safeguards removed, made available only to authorized partners through Project Glasswing and a forthcoming trusted-access program.

The benchmark numbers are impressive, and we will get to them. But for defenders, the benchmarks are not the story. The story is structural. Frontier offensive-cyber capability is no longer something a model lacks. It is something a model has, held back by a control. And a vendor has now formalized a second product whose defining feature is that the control is switched off for vetted users. The safeguard has become the product boundary.

The Safeguard Is the Product, Not the Absence of Capability

Anthropic is explicit that Fable 5 and Mythos 5 are the same model. The difference is a set of safety classifiers layered on top. Three of them matter to anyone thinking about risk:

Cybersecurity protection blocks offensive cyber tasks: exploitation, reconnaissance, lateral movement, and agentic hacking. Anthropic states its classifiers "prevent Fable from making any progress on these tasks."
Biology and chemistry protection covers dual-use research risk, with safeguards the company describes as intentionally broad, prioritizing safety over user experience.
Distillation prevention blocks attempts to extract the model's capabilities for unauthorized reuse.

The mechanism is worth sitting with, because it is a meaningful design choice. When a classifier trips, the query does not get a refusal. It is quietly answered by Claude Opus 4.8 instead, the prior-generation model without the frontier capability. Anthropic reports that safeguards trigger in fewer than 5% of Fable sessions, meaning more than 95% involve no fallback at all. External testing reportedly found that zero harmful single-turn requests succeeded across 30 different public jailbreak techniques, and that more than 1,000 hours of external red-teaming produced no universal jailbreak.
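The routing pattern described above can be sketched in a few lines. This is a minimal illustration of classifier-then-fallback routing in general, not Anthropic's implementation: the names `route`, `Reply`, `looks_like_offensive_cyber`, `frontier_model`, and `fallback_model` are all hypothetical, and the keyword check is a toy stand-in for a real classifier.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Reply:
    text: str
    served_by: str   # internal telemetry only; the caller would not see this
    fell_back: bool  # True when any classifier flagged the prompt


def route(
    prompt: str,
    classifiers: Sequence[Callable[[str], bool]],
    frontier: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Reply:
    """Answer with the frontier model unless a classifier flags the prompt."""
    if any(flag(prompt) for flag in classifiers):
        # No refusal message: the weaker model simply answers instead.
        return Reply(fallback(prompt), served_by="fallback", fell_back=True)
    return Reply(frontier(prompt), served_by="frontier", fell_back=False)


# Toy stand-ins. A keyword match is NOT how a real safety classifier works.
def looks_like_offensive_cyber(prompt: str) -> bool:
    return any(term in prompt.lower() for term in ("exploit", "lateral movement"))


def frontier_model(prompt: str) -> str:
    return f"[frontier] {prompt}"


def fallback_model(prompt: str) -> str:
    return f"[fallback] {prompt}"


if __name__ == "__main__":
    for p in ("Summarize this changelog", "Write an exploit for this service"):
        reply = route(p, [looks_like_offensive_cyber], frontier_model, fallback_model)
        print(reply.served_by, "->", reply.text)
```

The point of the sketch is that the gate is an ordinary conditional. If a classifier misses, the frontier model answers, and nothing downstream notices.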

The Detail Most Coverage Will Skip

The UK AI Safety Institute "made progress toward a universal jailbreak within a brief initial testing window." Read plainly: a capable, well-resourced red team made measurable headway against the boundary quickly. The safeguard is a strong control, not a law of physics. Treat it as a control, with the failure modes every control has.

The fallback design is a more honest architecture than a hard refusal, and arguably a better one. A silent downgrade to a weaker model is harder to probe than a refusal message that tells an attacker exactly where the fence is. But the defensive lesson is the same either way: the dangerous capability is present in the weights. What stands between it and misuse is a classifier and a fallback, both of which are software, and software has a defect rate.

What the Capabilities Mean for the Offense/Defense Balance

Strip the marketing and the capability claims still describe a model that compresses expert work into hours. Anthropic cites Stripe reporting that Fable 5 "compressed months of engineering into days" on a 50-million-line Ruby migration. It reports state-of-the-art coding scores at medium effort, senior-grade financial reasoning, and vision strong enough to rebuild a web app's source from screenshots alone and to complete Pokémon FireRed from raw game pixels with no helper harness. On the science side, internal teams describe accelerating parts of drug design by roughly ten times and running a week of autonomous genomics research that trained a custom model 100 times smaller than a published benchmark model while outperforming it.

We track the security consequence of exactly this curve in our own reporting: AI-driven exploitation has already collapsed vulnerability windows from days to hours while median patch times drift past 40 days. A model that can autonomously plan and execute long-horizon engineering work is, viewed from the other side of the table, a model that can autonomously plan and execute long-horizon intrusion work once the safeguard is the only thing in the way. Mythos 5 is the explicit acknowledgment that the safeguard can be the only thing in the way, by design, for someone.
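To make the mismatch between exploitation windows and patch times concrete, here is a rough arithmetic sketch. Every input is an assumption chosen for illustration: the 6-hour exploitation window and 20-hour detection time are hypothetical, and the patch figure assumes the "40" above is measured in days. The helper names `exposure_gap` and `fits_window` are ours.

```python
from datetime import timedelta


def exposure_gap(exploit_window: timedelta, patch_time: timedelta) -> timedelta:
    """How long a flaw stays unpatched after exploitation becomes plausible."""
    return max(patch_time - exploit_window, timedelta(0))


def fits_window(response_time: timedelta, exploit_window: timedelta) -> bool:
    """True if detection or containment lands inside the exploitation window."""
    return response_time <= exploit_window


# Assumed inputs for illustration only, not measured values.
EXPLOIT_WINDOW = timedelta(hours=6)   # hypothetical "hours, not days" window
PATCH_TIME = timedelta(days=40)       # assumes the 40 figure is in days
TIME_TO_DETECT = timedelta(hours=20)  # hypothetical current detection time

if __name__ == "__main__":
    gap = exposure_gap(EXPLOIT_WINDOW, PATCH_TIME)
    print(f"Exposure gap: {gap.total_seconds() / 3600:.0f} hours")
    print("Detection inside window:", fits_window(TIME_TO_DETECT, EXPLOIT_WINDOW))
```

Under these assumed inputs the flaw stays exploitable for 954 hours after working exploitation is plausible, and a 20-hour detection time falls outside the window. Substitute your own measured figures. The shape of the calculation matters more than the sample numbers.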

The Planning Assumption to Change

If any part of your security model rests on "an attacker would need a specialist to do that," retire it. The specialist is increasingly a frontier model under thin supervision. Reconnaissance, exploit development, and lateral movement are precisely the tasks these systems are getting good at, and precisely the tasks the public safeguard exists to block, which tells you where the capability already sits.

Project Glasswing and the Governance of Lifted Safeguards

Mythos 5 is gated. Access to the cyber-lifted variant runs through Project Glasswing partners; the biology-lifted path runs through a separate forthcoming program for researchers. Anthropic pairs this with controls that are themselves instructive: a mandatory 30-day data-retention window for all Mythos-class traffic on business accounts, data excluded from training and non-safety uses, logged human access, and an alignment assessment finding Mythos 5's level of misaligned behavior "low, and similar to that of Opus 4.8."

This is a reasonable governance posture, and it also creates a new trust boundary that defenders should name out loud. A pool of organizations now holds sanctioned access to a frontier model with offensive-cyber guardrails removed. That access is a high-value target in its own right. The relevant questions are familiar ones, asked of a new asset class: who holds the keys, how is that access authenticated and monitored, and what happens to the credential or the session if a Glasswing partner is itself compromised. Gated access is a control surface, and control surfaces get attacked.

It is also worth being fair about the upside. A safeguarded-by-default frontier model with a transparent fallback, broad dual-use limits, published red-team hours, and mandatory logging on the lifted variant is a more defensible design than the alternative of shipping raw capability to everyone. The architecture is not the problem. Pretending the architecture removes the underlying capability is the problem.

What Defenders Should Actually Do This Quarter

None of this calls for panic, and none of it is hypothetical enough to defer. The practical moves are continuous with where mature programs were already heading:

Plan for capability parity. Assume motivated adversaries reach frontier-grade assistance, whether through jailbreaks, a leaked or misused trusted-access credential, or open-weight models catching up. Build your threat model around the capability being available, not absent.
Instrument for AI speed. The meaningful metric is no longer just whether you can detect an intrusion, but whether you can detect it inside an hours-long exploitation window. Revisit alerting thresholds, patch SLAs, and the time-to-contain assumptions baked into your runbooks.
Lock down agentic tooling. The same long-horizon autonomy that makes these models useful is the autonomy that makes a compromised agent dangerous. Apply least privilege to tool scopes, require review before agents reach production, and log agent actions as first-class security telemetry. See our analysis of the agentic AI blind spot.
Build real AI fluency in the team. You cannot reason about a classifier-and-fallback boundary, or about the blast radius of a Glasswing-class credential, from a vendor datasheet. Hands-on familiarity is the prerequisite for controls that match reality.
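The least-privilege and logging advice for agentic tooling can be sketched as a small gate that sits in front of every tool call. This is one possible shape under stated assumptions, not a reference implementation: `ToolGate`, `ScopeError`, the `agent.audit` logger name, and the tool names in the usage note are all hypothetical.

```python
import json
import logging
from dataclasses import dataclass, field

audit_log = logging.getLogger("agent.audit")  # hypothetical logger name


class ScopeError(PermissionError):
    """Raised when an agent requests a tool call it is not cleared for."""


@dataclass
class ToolGate:
    agent_id: str
    allowed_tools: frozenset[str]                    # explicit allowlist
    needs_review: frozenset[str] = frozenset()       # tools gated on sign-off
    approved: set[str] = field(default_factory=set)  # sign-offs recorded so far

    def approve(self, tool: str) -> None:
        """Record a human sign-off for a review-gated tool."""
        self.approved.add(tool)

    def authorize(self, tool: str, target: str) -> None:
        """Log the decision, then raise unless the call is allowed."""
        if tool not in self.allowed_tools:
            decision = "deny"
        elif tool in self.needs_review and tool not in self.approved:
            decision = "hold_for_review"
        else:
            decision = "allow"
        # Every decision is logged, including denials, as security telemetry.
        audit_log.info(json.dumps({
            "agent": self.agent_id,
            "tool": tool,
            "target": target,
            "decision": decision,
        }))
        if decision != "allow":
            raise ScopeError(f"{self.agent_id}: {tool} -> {decision}")
```

A gate built as `ToolGate("triage-bot", frozenset({"read_logs", "deploy"}), needs_review=frozenset({"deploy"}))` would let `read_logs` through, hold `deploy` until `approve("deploy")` is called, and deny anything else. The design choice worth copying is that the allowlist is explicit and denials are logged as well as successes.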

The shift Fable 5 and Mythos 5 make concrete is not that AI can now do offensive security. That has been true and rising for a while. It is that a vendor has drawn the line cleanly: the capability is in the model, a control holds it back, and the control can be lifted for the right buyer. For defenders, the only safe assumption is to plan as if it has been lifted for someone who is not on your side.

Is your detection built for AI-speed intrusion?

RedEye Security assesses where your program assumes capability that an attacker no longer lacks, and what to change before it becomes an incident.
