
@cedrickchee
Last active April 8, 2026 07:03
Project Glasswing Is Not Just Marketing Fluff

TL;DR: Project Glasswing is not just PR. But the interesting part is not Anthropic’s narrative; it is the underlying shift in capability and what that means for software security.

Over the past week, some of the most credible people in security have been pounding the same drum: AI-assisted vulnerability research is getting real, fast.

Thomas Ptacek (tptacek) flatly wrote that “vulnerability research is cooked.” Simon Willison (simonw) highlighted the same shift. Daniel Stenberg of curl has said AI has become genuinely useful at finding bugs and vulnerabilities. Colin Percival (cperciva), former FreeBSD security officer, has weighed in as well, and given his credibility in the security community, his contribution may be the most significant individual one in this narrative. (Multiple people drew a direct parallel to OpenAI withholding GPT-2 in 2019 on “safety” grounds, a decision later widely seen as hype. Many argued this pattern has repeated with every frontier model since GPT-3.5 and is now indistinguishable from a running industry joke.) That does not mean every claim from Anthropic should be taken at face value. It does mean the broader direction of travel is no longer hypothetical.

That is why I do not think Project Glasswing is just marketing fluff.

There is still plenty to be skeptical about. Anthropic’s motives, defensive framing, geopolitical incentives, and selective disclosure all deserve scrutiny. The “this model is so powerful at hacking that we must immediately use it for defense” narrative should make people uncomfortable. We have seen enough safety theater from frontier labs to know that skepticism is healthy.

But this looks more substantive than a normal hype cycle.

The biggest reason is the capability signal. Anthropic’s Glasswing materials say Mythos Preview scored 77.8% on SWE-bench Pro versus 53.4% for Opus 4.6, a relative improvement of roughly 46%. That is a very large jump, and it points to something more important than benchmark bragging rights: a qualitative shift in autonomous SWE capability. Anthropic is also explicitly withholding Mythos Preview from broad release and limiting access through Glasswing, with a public 90-day report promised as part of the program. (Anthropic)
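For concreteness, here is the back-of-envelope arithmetic behind that claim, using only the two scores quoted above (the variable names are mine, not Anthropic’s):

```python
# Scores cited in the Glasswing materials (SWE-bench Pro, in percent).
mythos_preview = 77.8
opus_46 = 53.4

# Absolute gain is measured in percentage points; relative gain is the
# fractional improvement over the older model's score.
absolute_gain = mythos_preview - opus_46
relative_gain = absolute_gain / opus_46

print(f"absolute gain: {absolute_gain:.1f} points")  # 24.4 points
print(f"relative gain: {relative_gain:.1%}")         # 45.7%
```

The 24.4-point absolute jump is what shows up on a leaderboard; the ~46% relative gain is what justifies calling it a qualitative shift rather than incremental progress.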

That does not prove all of their cyber claims. It does change the burden of proof. The default posture can no longer be “this is probably fake.” It has to be “show the receipts.”

And that is where the serious questions start.

First, the attacker-defender asymmetry may be collapsing faster than many people expected. If advanced models can reliably reason across code, identify weird edge cases, and chain findings into exploit paths, then a lot of what used to be protected by complexity and obscurity gets much weaker. Large, messy, legacy codebases become easier to search, easier to stress, and easier to break.

Second, scope matters. The Glasswing framing focuses heavily on lower-level systems security, major software infrastructure, and technically impressive vulnerability classes. That is important. But app-layer and business-logic vulnerabilities still account for a huge amount of real-world damage, and those are less visible in the narrative. A model can be extraordinary at kernel bugs and still leave major gaps in the security picture.

Third, architecture starts to matter even more. One underappreciated point in all of this is that feeding an entire sprawling codebase into a model is still not enough to fully understand a complex system. Context windows help, but they do not erase interaction complexity. Good software architecture, small components, auditable boundaries, memory safety, and simpler interfaces become even more important in a world where both attackers and defenders have AI assistance.

Fourth, we should expect a catch-up cycle. Frontier labs can withhold a model for a while, but history suggests that “too dangerous to release” capability does not stay exclusive forever. Open-weight systems usually narrow the gap within 12 to 18 months. If that pattern holds here too, then Glasswing’s strategic value is heavily front-loaded into the head-start window. That makes the 90-day disclosure timeline more than a safety gesture. It is also the period in which Anthropic’s lead is most defensible.

The system card adds another layer. Anthropic says the decision not to broadly release Mythos Preview is due to insufficient safeguards rather than its formal Responsible Scaling Policy. It also describes unusually human-seeming interaction effects, enough that clinical psychologists were involved in evaluation. I would not overread that into mysticism. But it does reinforce that these systems are getting stranger in ways that matter operationally, socially, and psychologically, not just technically. (Anthropic)

The capability jump looks real. The voluntary restraint is notable. The defensive coordination effort is worth taking seriously. But the hard-core technical crowd is right to demand actual CVEs, exploit chains, severity ratings, and evidence that this is more than polished narrative plus curated examples.

That is the real test.

I will keep digging into the three disclosed vulnerability claims, because one of the central questions is whether Claude found real bugs that other methods missed, or whether some of this is still dressed-up noise. Pushback from people who love fuzzers, static analyzers, and traditional security tooling is not denial. It is exactly the kind of scrutiny this moment needs.

Still, the broader trend is getting harder to dismiss.

Feel the acceleration. We may be entering the loop where AI increasingly creates the problems AI is then needed to solve.


Side note: I'm reading Assessing Claude Mythos Preview’s cybersecurity capabilities by Nicholas Carlini et al. (Apr 2026).
