Here's a hot take: a lot of the vulnerability findings credited to AI models are inflated.
When you dig into the papers and the hearsay, a huge portion of the findings are source-assisted. And when you look at the bugs themselves, a lot of them are not exploitable. Or they're in low-tier targets. Or they're in projects that, as I said on stream, "if I sneezed at that project it would have fallen over."
I'm not picking on Anthropic specifically. This is a pattern across the agentic pen testing space broadly.
Before you make investment decisions based on these announcements, dig into the studies. Look at what type of vulnerabilities they're finding. Look at the targets. Look at whether the bugs have been validated as genuinely exploitable.
A critical mindset here is table stakes.
That last check, whether a bug is genuinely exploitable, is where I see models struggle. IDOR (insecure direct object reference) is a good example. I had an agent flag a "critical IDOR" on a client application recently. I looked at it. The data was public. The endpoint was supposed to be accessible. Without baked-in context about what's private vs. designed to be open, you get a lot of false positives, and they're annoying.
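One way to bake in that context is to hand the triage step an explicit list of endpoints that are public by design. Here's a rough sketch of the idea, not how any particular tool does it. The `Finding` shape, the `PUBLIC_BY_DESIGN` allowlist, and the example paths are all hypothetical. In practice that list would come from the client or the API spec.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one agent-reported finding."""
    kind: str      # e.g. "IDOR"
    method: str    # HTTP method the agent used
    path: str      # request path the agent flagged
    severity: str  # severity as claimed by the agent


# Assumption: endpoints the application owner has confirmed are meant to be
# readable by anyone. Patterns use shell-style globs; note that "*" here also
# matches "/" characters, so keep the patterns narrow.
PUBLIC_BY_DESIGN = [
    ("GET", "/api/products/*"),
    ("GET", "/api/status"),
]


def is_public_by_design(finding: Finding) -> bool:
    """True if the flagged request matches an intentionally public endpoint."""
    return any(
        finding.method.upper() == method and fnmatchcase(finding.path, pattern)
        for method, pattern in PUBLIC_BY_DESIGN
    )


def triage(findings):
    """Split findings into (needs_review, likely_false_positive).

    Only IDOR findings on public-by-design endpoints are set aside;
    everything else still goes to a human.
    """
    needs_review, likely_false_positive = [], []
    for finding in findings:
        if finding.kind == "IDOR" and is_public_by_design(finding):
            likely_false_positive.append(finding)
        else:
            needs_review.append(finding)
    return needs_review, likely_false_positive


if __name__ == "__main__":
    reported = [
        Finding("IDOR", "GET", "/api/products/42", "critical"),
        Finding("IDOR", "GET", "/api/users/42/invoices", "critical"),
    ]
    review, noise = triage(reported)
    print("needs review:", [f.path for f in review])
    print("likely false positives:", [f.path for f in noise])
```

This doesn't prove anything is exploitable. It only stops the agent from shouting "critical" about data that was never private in the first place, which leaves more time to validate the findings that remain.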