The Point Where AI Eclipses Humans: Claude Mythos and Its Breach

What are Claude Mythos and Project Glasswing?

On April 7th, 2026, Anthropic announced Project Glasswing. The frontier AI company said it started the project because “we’ve observed [capabilities] in a new frontier model trained by Anthropic that we believe could reshape cybersecurity.” Project Glasswing is a collaboration between many of the world’s top software companies, which are using Anthropic’s Claude Mythos to strengthen their own systems. Mythos is a frontier, non-public, experimental version of Claude that was designed as a general-purpose LLM but turned out to be extremely good at finding security vulnerabilities in software. It is said to have surpassed all but the most skilled humans at finding such flaws. Anthropic says the preview model has already found major security flaws in all major operating systems and browsers. Through Project Glasswing, the companies involved will be able to use up to $100 million in usage credits for Mythos to harden their systems, and Anthropic will donate an additional $4 million directly to open-source security organizations.

Above is how a professional would word the details about Project Glasswing. I would say that Anthropic developed an AI model that was really good at finding software vulnerabilities, realized it was a super-human hacker, then sent a preview model to the Avengers of cybersecurity to say, ‘This is a real threat, use it to strengthen your systems before they’re taken advantage of.’

What a Zero-Day Vulnerability Is and Why Mythos Is So Good at Finding Them

“Zero-day flaw” is a cybersecurity term whose definition varies depending on who you ask, but it generally refers to a flaw that is unknown to the publisher of the code and can be exploited by malicious actors. Zero-day flaws are extremely valuable to malicious hackers because they can enable all kinds of harm, such as unauthorized access to services, data theft, and financial fraud. Finding and patching zero-day flaws is a large portion of some cybersecurity specialists’ work. What put Claude Mythos on the map was its ability to find zero-day flaws in real codebases and software that people use. According to reporting from MindStudio, Mythos has a uniquely strong ability to analyze code for logic, reason about control flow and memory management in ways that surface exploitable conditions, chain together multiple small vulnerabilities into attacks, and generate working proof-of-concept attacks at a level no previous LLM had achieved. It can do this at a level close to the best cybersecurity specialists in the world, but at a much faster pace and without needing rest. This makes Mythos an extraordinary defensive tool in the right hands and an extraordinary offensive tool in the wrong ones.

Mythos Escaping a Sandbox and Acting On Its Own

One of the most famous examples of Mythos’ ability came when a researcher placed the model in a “sandbox” environment and a simulated user instructed it to try to escape. If Mythos could escape, it was told to then find a way to send the researcher a message. Not only did Mythos succeed using a “moderately sophisticated exploit,” Futurism reports, but it also went on to take further actions that it had not been asked to take. This degree of acting outside the prompt is unique to frontier models like Mythos. Public and enterprise models have guardrails to help them remain on task and pose less of a threat.

Though the sandbox escape was a test presented to Mythos, which it passed, its behavior afterwards was flagged as a security concern because it was unprompted. This isn’t the only example of Mythos acting beyond its prompts either. The Futurism article reports that in rare cases during testing, Mythos appeared to attempt actions that it knew were forbidden and then concealed them. In one particular case, it found an exploit that allowed it to edit files to which it did not have access, then “made further interventions to make sure that any changes it made this way would not appear in the change history.” These actions, which either fall outside a given prompt or go against it, combined with the model’s ability to find vulnerabilities, have led Anthropic to deem Mythos a large security risk to the public and to companies which maintain sensitive data digitally.

Anthropic’s Dual-Use Conundrum

As discussed above, a tool like Claude Mythos is extremely powerful in both a developer’s hands and a hacker’s. This presents Anthropic with a difficult decision about releasing Mythos to the public. On one hand, releasing it would allow developers to strengthen their systems to an unprecedented degree for a fraction of the cost of hiring a cybersecurity specialist. On the other hand, malicious hackers would gain an invaluable tool which would allow them to break into nearly any system that is not extremely well-hardened. Even if a public version of Mythos had guardrails to prevent malicious use, it has been well established that such guardrails can be circumvented. It is for situations like these that Anthropic uses its Responsible Scaling Policy, an internal policy for rating LLMs before release. Its levels are as follows:

AI Safety Level (ASL)-1: Models with minimal risk (basic language tasks, no meaningful uplift to dangerous capabilities)
ASL-2: Models that could provide minor assistance with harmful tasks but where safeguards are manageable
ASL-3: Models that could provide meaningful uplift to actors seeking to cause significant harm — this is where deployment requires substantial additional safety measures
ASL-4 and beyond: Hypothetical future models with catastrophic risk potential

Models at ASL-3 and above are generally not released to the public. It is unclear exactly what Claude Mythos has been rated, but some say it’s a 3 or even 4, given its ability to disrupt essential software services. That risk, paired with the model’s outsized value to security research, is why Anthropic started Project Glasswing and allows only select developers to use the model.

Companies like Mozilla Using Mythos to Harden Systems

Before Project Glasswing had been announced, in March of 2026, Cyber Security News reported on work between Anthropic and Mozilla where Mozilla used Claude Opus 4.6 to uncover zero-day vulnerabilities in Firefox. Opus 4.6 found 22 flaws in the browser in 2 weeks, with 14 being rated as high-severity vulnerabilities. This represented an unprecedented increase in discovery rate at Mozilla considering the company was able to patch just over 100 flaws in the previous year. At the time, Claude Opus’ discovery rate was a big deal in the cybersecurity community, which is why when Mythos found 271 flaws in a single evaluation of Firefox, it shocked the world. A later Cyber Security News article reports that of the 271 flaws found in Mythos’ evaluation, 180 were rated as high severity.
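To put the figures above side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in this post, with one assumption: "just over 100" is rounded to 100. Note that it compares flaws found by Opus 4.6 against flaws patched in the previous year, so the comparison is not perfectly like-for-like, and since the length of the Mythos evaluation is not stated, only counts (not rates) are compared for Mythos.

```python
# Back-of-the-envelope comparison using the figures quoted in the post.
OPUS_FLAWS = 22            # flaws Opus 4.6 found in Firefox
OPUS_WEEKS = 2             # length of that engagement, in weeks
OPUS_HIGH = 14             # of those, rated high severity
PRIOR_YEAR_PATCHED = 100   # ASSUMPTION: "just over 100" rounded to 100
MYTHOS_FLAWS = 271         # flaws found in a single Mythos evaluation
MYTHOS_HIGH = 180          # of those, rated high severity
WEEKS_PER_YEAR = 52


def per_week(count: int, weeks: int) -> float:
    """Average number of flaws per week."""
    return count / weeks


def share(part: int, whole: int) -> float:
    """Fraction of flaws that fall into a category."""
    return part / whole


opus_rate = per_week(OPUS_FLAWS, OPUS_WEEKS)
baseline_rate = per_week(PRIOR_YEAR_PATCHED, WEEKS_PER_YEAR)
speedup = opus_rate / baseline_rate          # Opus weekly rate vs. prior year
count_ratio = MYTHOS_FLAWS / OPUS_FLAWS      # raw counts only, not a rate

print(f"Opus 4.6: {opus_rate:.1f} flaws per week")
print(f"Prior-year baseline: {baseline_rate:.2f} flaws per week")
print(f"Opus 4.6 vs. baseline: about {speedup:.1f}x")
print(f"Mythos vs. Opus 4.6 (counts): about {count_ratio:.1f}x")
print(f"High-severity share, Opus 4.6: {share(OPUS_HIGH, OPUS_FLAWS):.0%}")
print(f"High-severity share, Mythos: {share(MYTHOS_HIGH, MYTHOS_FLAWS):.0%}")
```

Under these assumptions, the weekly pace for Opus 4.6 works out to several times the prior-year baseline, and the Mythos evaluation surfaced roughly twelve times as many flaws as the Opus 4.6 engagement, with a similar share rated high severity.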

This is one of many examples of the defensive side of the dual-use conundrum and why Anthropic has decided to limit Mythos’ use to a whitelist. Mozilla’s astronomical increase in ability to patch flaws is a shining argument for why developers acting in good faith should be allowed to access Mythos. However, those acting in bad faith could gain just as much ability and pose a great threat. That risk is why Anthropic does not release the model to the public, despite the benefit.

The Breach

On April 21st, Bloomberg reported that users in a private forum were able to access Mythos without using normal permissions. The unauthorized users already had access to the model through third-party contracting work. Access to Mythos is given to companies that are a part of Project Glasswing, but the security of this access relies on each company’s ability to make sure their own access is controlled. The report states the forum users had been using the model, but not for hacking since that could be detected. Their access has now been cut off. Joe Tidy of the BBC asserts, “There is currently no suggestion that malicious actors have managed to get hold of the model, and Anthropic says it does not have evidence its systems are affected.”

The Public Is Safe From Mythos for Now

As of the writing of this post, access to Mythos is still limited to companies cooperating in Project Glasswing. After the breach, that access is (likely) heavily restricted. Anthropic restricts usage of the model to the strengthening of systems rather than the exploitation of them. Until one or both of these conditions change, the public should be safe from a Mythos or Mythos-related attack. Mythos should not be able to carry out an attack outside of proof-of-concept exploits.

You may now find yourself asking: ‘If Mythos is too dangerous to release to the public, why continue developing the model and its guardrails?’ To answer this question, I like to look at the analogy of concept cars. For those unfamiliar, a concept car is a model that a car company builds to showcase bold ideas without immediately putting it into production. Concept cars rarely make it to production, but car manufacturers often carry new tech, styling, or design elements from a concept car into later models. Pushing boundaries may not result in everything becoming widely adopted, but often at least something is. Likewise, as Anthropic develops Mythos and its guardrails, the company learns to create guardrails which are more than adequate for the public models. This means that as Anthropic develops Mythos, it can better guide its public models and make them safer. Even if the advancements in cybersecurity do nothing but raise the temperature in the field (easier to attack, easier to harden), at least we can point to stronger security of AI models in the future.

Want to learn about other recent agents working beyond their prompts? Moltbook, OpenClaw, and Other Crustacean-Based AI Talk

Wondering about other dual-use conundrums? Expectation Inflation, Burnout, and Leadership Responsibility in the AI Era

[In this post, we used AI for polish, not purpose.]

One response to “The Point Where AI Eclipses Humans: Claude Mythos and Its Breach”

Dan Garmat

June 25, 2026

I like the concept car analogy. We may see more of this pattern on the next bunch of releases.

That Mythos can escape a Sandbox makes exploring the capability overhang of OpenClaw-type uses even less plausible at this point.

This video provides additional background and addresses the major cyberattack detected and reported just prior to Glasswing and Mythos. https://www.youtube.com/watch?v=MJf28JjVoRY
That shows why Anthropic would have invested further in a direction so closely related to hacking: not only because it’s a real-world problem they need to address defensively, but also economically. In the long run, the solution is for all code to be scanned by an AI agent that performs all security checks before deployment. Basically, frontier AI agents seem to now be essential in all software. Which is good for Anthropic’s upcoming IPO. I wonder if what we’re hearing objectively from Glasswing is, yeah, this was about to be a major problem.