@bparanj
Created May 11, 2026 01:43

Here's a hot take: a lot of the vulnerability disclosure around AI models is inflated.

When you dig into the papers and the hearsay, a huge portion of the findings are source-assisted, meaning the model was working with access to the source code. And when you look at the bugs themselves, a lot of them are not exploitable. Or they're in low-tier targets. Or they're in projects that, as I said on stream, "if I sneezed at that project it would have fallen over."

I'm not picking on Anthropic specifically. This is a pattern across the agentic pen testing space broadly.

Before you make investment decisions based on these announcements, dig into the studies. Look at what type of vulnerabilities they're finding. Look at the targets. Look at whether the bugs have been validated as genuinely exploitable.

A critical mindset here is table stakes.

Validating exploitability is where I see models struggle. IDOR is a good example. I had an agent flag a "critical IDOR" on a client application recently. I looked at it. That data was public. The endpoint was supposed to be accessible. Without baked-in context about what's private vs. designed to be open, you get a lot of false positives, and they're annoying.
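To make the "baked-in context" point concrete, here is a minimal sketch of a triage step that checks an agent's IDOR findings against a list of endpoints that are public by design. Everything in it is hypothetical: the `Finding` type, the `triage` function, and the endpoint paths are made up for illustration, and in practice that list has to come from the people who own the application, not from the agent.

```python
from dataclasses import dataclass

# Hypothetical context supplied by the application's owners:
# endpoints that are intentionally readable by anyone.
PUBLIC_BY_DESIGN = {"/api/products/{id}", "/api/blog/{id}"}


@dataclass(frozen=True)
class Finding:
    endpoint: str   # route template the agent flagged
    kind: str       # vulnerability class reported by the agent
    severity: str   # severity claimed by the agent, not yet verified


def triage(finding: Finding, public_endpoints: set[str]) -> str:
    """Label an agent-reported finding before a human spends time on it."""
    if finding.kind == "IDOR" and finding.endpoint in public_endpoints:
        # Access to data that is meant to be open is not a broken access control.
        return "likely false positive"
    # Everything else still needs a person to confirm it is exploitable.
    return "needs manual validation"


findings = [
    Finding("/api/products/{id}", "IDOR", "critical"),
    Finding("/api/invoices/{id}", "IDOR", "critical"),
]
labels = [triage(f, PUBLIC_BY_DESIGN) for f in findings]
```

Note that the second finding is not confirmed by this check. It only survives the filter, which is where manual validation starts.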
