A few major ideas to understand:
Modern AI agentic loops, even Claude Code, are quite bad at dealing with large codebases. They use a caveman approach: treat the code as text, run some greps, read some context around the matches, and rely on the agent's "feeling" that it has enough of the big picture. And they very confidently say that they do.
Unless you solve this context issue well, all the AI outputs will be very shallow, with a lot of false positives.
For this I have developed a completely different context engine, which treats code as code, knows about the AST, and so on: https://github.com/probelabs/probe. One query with probe can be equal to 10 loops of Claude Code, while taking less context and going way deeper into your code.
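
To make the "code as text" vs "code as code" difference concrete, here is a toy sketch in Python. It is not how probe works internally and it does not use probe's API; it only relies on the standard library `ast` module, and `SOURCE`, `text_search`, and `ast_search` are made-up names for illustration.

```python
import ast
import re

# Hypothetical sample file, used only for this illustration.
SOURCE = '''\
def charge(user, amount):
    """Charge the user."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"user": user, "charged": amount}


def refund(user, amount):
    # undo a previous charge
    return charge(user, -amount)
'''


def text_search(source, term, context=0):
    """Grep-style: return each matching line plus `context` lines around it."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if re.search(re.escape(term), line):
            lo, hi = max(0, i - context), min(len(lines), i + context + 1)
            hits.append("\n".join(lines[lo:hi]))
    return hits


def ast_search(source, term):
    """AST-style: return the full source of every function that is named
    `term` or that calls `term` directly by name."""
    tree = ast.parse(source)
    blocks = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        calls_term = any(
            isinstance(n, ast.Call)
            and isinstance(n.func, ast.Name)
            and n.func.id == term
            for n in ast.walk(node)
        )
        if node.name == term or calls_term:
            blocks.append(ast.get_source_segment(source, node))
    return blocks


if __name__ == "__main__":
    print("text hits:", len(text_search(SOURCE, "charge")))
    for block in ast_search(SOURCE, "charge"):
        print("-----")
        print(block)
```

On this sample the text search matches the definition, the call, a comment, and the `"charged"` dictionary key, and each hit is a cut-off fragment. The AST search returns two complete functions: the definition of `charge` and its one caller. That is the kind of difference in context quality I mean, just at toy scale.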