A few major ideas to understand:

Code context issue

Modern AI agentic loops, even Claude Code, are quite bad at dealing with large codebases. They use a caveman approach: treat the codebase as text, run some greps, read a bit of context around the matches, and rely on the agent's "feeling" that it got enough of the big picture. And they very confidently say they did.

Unless you solve this context issue well, all your AI outputs will be shallow, with a lot of false positives.

For this I have developed a completely different context engine, which treats code as code, knows about the AST, and so on: https://github.com/probelabs/probe. One query with probe can be equal to 10 loops of Claude Code, while taking less context and digging way deeper into your code.

How to reason accurately

Another tip: protect your reasoning context at all costs. For example, mixing code search and reasoning on top of it in the same loop is a bad idea. First find exactly which code touches your request, extract only that, and then reason on top of it without any noise. Overall, a custom agent with access to only a very limited set of tools, focused ONLY on reasoning and planning, has huge value. An agent having a tool to delegate and run focused subagents also makes a huge difference.
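To make the split concrete, here is a minimal TypeScript sketch of the two-phase flow. `searchCode` and `askModel` are hypothetical stand-ins for your code-search engine and LLM client, not real APIs:

```typescript
type CodeChunk = { file: string; snippet: string };

async function searchCode(query: string): Promise<CodeChunk[]> {
  // Stand-in: call your AST-aware search engine (e.g. probe) here.
  return [];
}

async function askModel(system: string, user: string): Promise<string> {
  // Stand-in: call your LLM provider here.
  return "";
}

async function answer(question: string): Promise<string> {
  // Phase 1: an extraction-only loop. Its transcript (greps, dead ends,
  // partial file reads) is thrown away and never reaches the reasoner.
  const chunks = await searchCode(question);

  // Phase 2: a fresh context that sees ONLY the extracted code.
  const context = chunks
    .map((c) => `// ${c.file}\n${c.snippet}`)
    .join("\n\n");
  return askModel(
    "You are a reasoning agent. Answer strictly from the code below.",
    `${context}\n\nQuestion: ${question}`
  );
}
```

The point of the two functions is the boundary between them: the search phase can be as messy as it likes, because the reasoning phase never sees that mess.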

Probe, for example, comes with its own super-specialized reasoning agent, optimized for code understanding. Other tips include an automatic follow-up to the agent, before it is ready to answer the user, to validate its claims. Use JSON schemas with focused outputs: if you are doing code review, have a JSON schema super-optimized for that goal.
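As an illustration, a code-review-specific output schema might look like the sketch below. The field names are my own invention, not a fixed format; the idea is that a narrow schema keeps the agent from drifting into freeform prose and gives the validation follow-up something concrete to check:

```typescript
const codeReviewSchema = {
  type: "object",
  required: ["issues"],
  properties: {
    issues: {
      type: "array",
      items: {
        type: "object",
        required: ["file", "line", "severity", "claim", "evidence"],
        properties: {
          file: { type: "string" },
          line: { type: "integer" },
          severity: { enum: ["critical", "major", "minor"] },
          claim: { type: "string" },
          // Force the model to quote the code that backs each claim;
          // the automatic follow-up verifies claims against this field.
          evidence: { type: "string" },
        },
      },
    },
  },
} as const;
```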

Orchestration

Now, the actual multi-project configuration. You need an orchestrator layer which knows about the architecture of the system, all its projects, and the skills required to do the job.

Rather have 100 agents each do 1 thing well than 1 agent doing 100 things.
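As a rough sketch, the orchestrator's knowledge can start as a plain registry of projects, the skills each one needs, and a description the router can match against. Every name and field below is made up for illustration:

```typescript
type Skill = "code" | "datadog" | "jira" | "docs";

interface Project {
  name: string;
  description: string; // fed to the router so it can pick projects
  skills: Skill[];
  repo: string;
}

const registry: Project[] = [
  {
    name: "billing-api",
    description: "Invoicing and payment webhooks.",
    skills: ["code", "datadog"],
    repo: "[email protected]:acme/billing-api.git",
  },
  {
    name: "web-frontend",
    description: "Customer-facing dashboard.",
    skills: ["code", "jira", "docs"],
    repo: "[email protected]:acme/web-frontend.git",
  },
];
```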

Define a pipeline where each step has a really clear responsibility. The first step should be intent detection: based on the user query, it detects what kind of tools and capabilities will be required to answer. We can call these tags.

Let's say our intent is chat, and we need access to Datadog, Jira, and the code.
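A minimal sketch of that intent-detection step, assuming a hypothetical `askModelJson` wrapper that enforces a JSON-schema-constrained output:

```typescript
type Tag = "code" | "datadog" | "jira" | "docs";

async function askModelJson<T>(prompt: string): Promise<T> {
  // Stand-in: call your LLM with a JSON-schema-constrained output.
  return {} as T;
}

async function detectIntent(
  query: string
): Promise<{ intent: string; tags: Tag[] }> {
  return askModelJson(
    `Classify the user query. Return JSON with "intent" (e.g. chat, review,
incident) and "tags": which of [code, datadog, jira, docs] are needed.
Query: ${query}`
  );
}

// "Why did invoices fail last night?"
//   -> { intent: "chat", tags: ["code", "datadog", "jira"] }
```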

Your next agent dynamically generates its prompt based on the given skills and the tags from the previous step, and dynamically adds the needed MCP tools or similar. "Talk to the code" is one of those tools, and it is itself another agent.
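Here is a hedged sketch of that dynamic assembly step; the tool and prompt tables are illustrative, not a real MCP catalog:

```typescript
type Tag = "code" | "datadog" | "jira" | "docs";

interface Tool { name: string; description: string }

const mcpTools: Record<Tag, Tool[]> = {
  code: [{ name: "talk_to_code", description: "Ask the code-router agent" }],
  datadog: [{ name: "datadog_query", description: "Query metrics and logs" }],
  jira: [{ name: "jira_search", description: "Search issues" }],
  docs: [{ name: "docs_search", description: "Search internal docs" }],
};

const skillPrompts: Record<Tag, string> = {
  code: "You may consult the codebase via talk_to_code.",
  datadog: "You may check production metrics and logs.",
  jira: "You may look up related tickets.",
  docs: "You may cite internal documentation.",
};

function buildAgent(tags: Tag[]) {
  return {
    // The prompt is assembled from fragments, not one giant catch-all.
    systemPrompt: tags.map((t) => skillPrompts[t]).join("\n"),
    // The agent only ever sees the tools this query actually needs.
    tools: tags.flatMap((t) => mcpTools[t]),
  };
}
```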

"Talk to code" has its own router, which has access to the high-level architecture of your system and probably your public/internal docs, a bunch of markdown. Its main goal is to detect what kind of code resources are needed to answer the question. For each relevant project it runs a subagent with a specialized query about the user's issue, tries to glue it all together, and follows up until everything is clear.
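A minimal sketch of that router, with `pickProjects` and `runCodeSubagent` as hypothetical helpers:

```typescript
interface ProjectAnswer { project: string; findings: string }

async function pickProjects(question: string): Promise<string[]> {
  // Stand-in: ask a small model, given the architecture overview and
  // docs, which projects could be involved in this question.
  return [];
}

async function runCodeSubagent(
  project: string,
  question: string
): Promise<ProjectAnswer> {
  // Stand-in: a focused agent that only sees this one project's code.
  return { project, findings: "" };
}

async function talkToCode(question: string): Promise<ProjectAnswer[]> {
  const projects = await pickProjects(question);
  // One clean-context subagent per project, fanned out in parallel;
  // the router then glues the findings together and follows up.
  return Promise.all(projects.map((p) => runCodeSubagent(p, question)));
}
```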

So essentially we end up with multiple layers of steps and workflows, and agents calling other agents, but each step is super specialized and does its job very well. And as mentioned, protect your reasoning context at all costs!

Gluing it all together

You can see an example of such a workflow and assistant here: https://github.com/probelabs/visor-quickstart. And this is the actual product, super-tuned for all these use cases and more: https://probelabs.com/

My DMs are also always open; you can write to me at https://www.linkedin.com/in/leonidbugaev/ or [email protected].
