@VictorTaelin
Last active January 24, 2025 17:49
New $10k Bounty - classify blocks that require edit in a 50K-token codebase refactor job

THE BOUNTY

Develop an AI tool that, given the current snapshot of HVM3's codebase and a refactor request, correctly identifies which chunks must be updated to fulfill that request. The AI tool must use at most $1 to complete this task, while achieving a recall of at least 90% and a precision of at least 90%.
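To check whether a given strategy fits the $1 budget, a back-of-the-envelope estimate helps. The sketch below is plain token arithmetic. The chunk count and the per-million-token prices in the example are placeholders, not figures from this post or from any provider's price list.

```python
def estimate_cost_usd(n_calls: int,
                      input_tokens_per_call: int,
                      output_tokens_per_call: int,
                      usd_per_m_input: float,
                      usd_per_m_output: float) -> float:
    """Total cost of n_calls LLM calls, given per-million-token prices."""
    input_cost = n_calls * input_tokens_per_call * usd_per_m_input / 1_000_000
    output_cost = n_calls * output_tokens_per_call * usd_per_m_output / 1_000_000
    return input_cost + output_cost


# Placeholder numbers: 500 chunks, one call per chunk, a short YES/NO answer.
condensed = estimate_cost_usd(500, 5_000, 5, 0.15, 0.60)    # summarized context
full_ctx = estimate_cost_usd(500, 50_000, 5, 0.15, 0.60)    # whole 50K codebase per call
print(f"{condensed:.4f} {full_ctx:.4f}")
```

Under these assumed numbers, resending the whole 50K-token codebase with every chunk blows through $1, while a condensed context stays well under it.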

Input

  1. HVM3's chunked codebase snapshot (29-12-2024; static, won't change).

  2. An arbitrary refactor request, such as:

    replace the 'λx body' syntax by '\x body'

Output

A list of chunks that require update. Example:

[215,256,295,353,358,364,369,374,414,433,480]

Validation

I will evaluate it on a set of private refactor requests.

If it passes the 95% recall, precision, F1 and $1 thresholds, it will be considered valid.
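For reference, here is a minimal sketch of how recall, precision and F1 could be computed for one request, treating the predicted and expected chunk lists as sets. How scores are aggregated across the private requests isn't specified above, so this per-request version is an assumption. The threshold is a parameter because the post states both 90% and 95%.

```python
def score(predicted: list[int], expected: list[int]) -> dict[str, float]:
    """Set-based precision, recall and F1 over chunk IDs for one request."""
    pred, gold = set(predicted), set(expected)
    true_pos = len(pred & gold)
    # Convention here: an empty prediction (or empty gold set) scores 0.
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


def is_valid(predicted: list[int], expected: list[int], cost_usd: float,
             threshold: float = 0.95, budget_usd: float = 1.0) -> bool:
    """All three metrics must reach the threshold and cost must fit the budget."""
    metrics = score(predicted, expected)
    return all(v >= threshold for v in metrics.values()) and cost_usd <= budget_usd


gold = [215, 256, 295, 353, 358, 364, 369, 374, 414, 433, 480]
guess = gold[:-1] + [999]  # one chunk missed, one wrong chunk added
print(score(guess, gold))
```

With one miss and one false positive out of 11 chunks, every metric is 10/11 (about 0.909). That passes a 90% bar but fails a 95% one, so the discrepancy between the two thresholds matters.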

Prize

I will grant:

  • $5k to the fastest valid solution
  • $5k to the cheapest valid solution

Note that I expect this problem to be easy to solve (it should be); thus, the real challenge is to craft the fastest and cheapest solution. A single entry could win both categories.

Deadline

Your solution must be published by January 15, 2025.

Rules

  • To participate, you must join our Discord first.

  • Your solution must be open source (OSS) and posted on this Gist.

  • Every technique is allowed:

    • It can be just a prompt to an existing AI model;
    • You could make parallel AI calls, as I suggested;
    • You could fine-tune a model, like GPT-4o-mini;
    • You could dismiss LLMs and use any other technique;
    • Anything that passes validation is considered valid.
  • Training on HVM3's codebase is allowed, but:

    • Total training cost must not exceed $10
    • Total training time must not exceed 1h

Tips

While the example input/output above can be easily solved (by just searching for "λ"), real refactor requests require a real understanding of the codebase. For example, if I ask the AI to "make CTRs store only the CID in the Lab field, and move the arity to a global static object in C", determining which chunks need updating requires logical reasoning about the codebase and about how different chunks interact. This cannot be replicated with traditional RAG techniques, I believe, so I'm personally bearish on them.
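The trivial baseline for the λ example is a plain substring search over the chunks. In this sketch the chunked snapshot is assumed to be a dict from chunk ID to chunk text, since the actual snapshot format isn't described here. It is useful as a sanity check, but as noted above it won't handle requests that need reasoning about how chunks interact.

```python
def grep_chunks(chunks: dict[int, str], needle: str) -> list[int]:
    """Return the IDs of chunks whose text contains `needle`, sorted ascending."""
    return sorted(cid for cid, text in chunks.items() if needle in text)


# Toy chunks (hypothetical contents, not taken from HVM3).
toy = {
    10: 'show (Lam x b) = "λ" ++ x',
    11: "-- numeric ops",
    12: "parseLam = char 'λ'",
}
print(grep_chunks(toy, "λ"))
```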

I currently think the best bet is to "summarize" the codebase to reduce token count (i.e., to prevent the AI's "large context dumbing" effect), and then send it a single target chunk to evaluate, doing this in parallel for each chunk with a fine-tuned small LLM. But that's just my take, and I'm sure others will have much better ideas, right?
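The summarize-then-classify idea above could be wired up roughly as follows. This is a sketch, not a known-good solution. The `ask` function (prompt in, raw answer out) is a hypothetical stand-in for whatever small or fine-tuned model you call, the prompt wording is made up, and producing the summary itself is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical signature: send a prompt to some LLM, return its raw text answer.
AskFn = Callable[[str], str]


def build_prompt(summary: str, request: str, chunk_id: int, chunk: str) -> str:
    """One prompt per chunk: condensed codebase + request + a single target chunk."""
    return (
        f"Codebase summary:\n{summary}\n\n"
        f"Refactor request: {request}\n\n"
        f"Chunk {chunk_id}:\n{chunk}\n\n"
        "Must this chunk be edited to fulfill the request? Answer YES or NO."
    )


def classify_all(summary: str, request: str, chunks: dict[int, str],
                 ask: AskFn, max_workers: int = 16) -> list[int]:
    """Ask about every chunk in parallel threads; return the IDs answered YES."""
    def needs_edit(item: tuple[int, str]) -> tuple[int, bool]:
        cid, text = item
        answer = ask(build_prompt(summary, request, cid, text))
        return cid, answer.strip().upper().startswith("YES")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(needs_edit, chunks.items()))
    return sorted(cid for cid, hit in results if hit)
```

Threads fit here because each call mostly waits on the network. Keeping the summary identical across calls may also let a provider's prompt caching (if available) cut cost.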

@umaplehurst

The AI tool must use at most $1 to complete this task, while achieving a recall of at least 90% and a precision of at least 90%.
[...]
If it passes the 95% recall, precision, F1 and $1 thresholds, it will be considered valid.

There's a mismatch between these two statements -- one says 90%, the other says 95%.

@russedavid

@CiberNin

For the given example input, is the given example output correct? I think the wording is ambiguous: it could be read as two unrelated examples of a possible input and a possible output.

@bjsi

bjsi commented Jan 14, 2025

@Lytes

Lytes commented Jan 15, 2025

@Zaffer

Zaffer commented Jan 15, 2025

@benjamin-asdf

The Discord link is invalid, but I am benjaminasdf (user ID 240081690058424320) on Discord and have joined HOC.
My attempt: https://github.com/benjamin-asdf/chunk-analyzer

@drbh

drbh commented Jan 17, 2025

Unofficial submission (post the deadline) but thought I'd share anyway 🙂

https://github.com/drbh/code-chunk

@VictorTaelin
Author

Hey, just letting you know that we'll review and benchmark the submissions above and announce a winner soon (hopefully next week or shortly after).

@Lorenzobattistela

@bjsi Hey, how is it going? I'm trying to run your code to evaluate it, but I keep hitting RateLimit errors, even though I'm using a valid (normal) Gemini API key.

 File "/Users/lorenzobattistela/work/aoe/hvm3-refactor-bounty/src/prompts/classify_symbols.py", line 81, in classify_symbols
    if result.is_related and convert_confidence_to_num(result.confidence) >= 0.75]
       ^^^^^^^^^^^^^^^^^
AttributeError: 'RateLimitError' object has no attribute 'is_related'

Does it need a specific setup? Thanks

@bjsi

bjsi commented Jan 24, 2025

@Lorenzobattistela Hey! Sorry about that. To get around the issue, I set ASYNC_MAX_WORKERS in main.py to a lower number. It will be slow, but it should avoid rate limits.
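The traceback above shows a RateLimitError object reaching code that expected a classification result. A guess, not checked against the repository: results are collected with something like `asyncio.gather(..., return_exceptions=True)`, and failed calls aren't filtered out before `.is_related` is read. Below is a minimal sketch of capping concurrency (roughly what a setting like ASYNC_MAX_WORKERS presumably controls) plus retrying with exponential backoff. `RateLimitError` here is a local stand-in for the provider's exception, and `fn` is any hypothetical async call.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Local stand-in for a provider's rate-limit exception (hypothetical)."""


async def call_with_retry(fn, item, retries: int = 5, base_delay: float = 1.0):
    """Call fn(item); on RateLimitError, back off exponentially and retry."""
    for attempt in range(retries):
        try:
            return await fn(item)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: let the caller see the failure
            # Exponential backoff with jitter before the next attempt.
            await asyncio.sleep(base_delay * 2 ** attempt * (1 + random.random()))


async def run_all(fn, items, max_workers: int = 4, base_delay: float = 1.0):
    """Run fn over items with at most max_workers calls in flight."""
    sem = asyncio.Semaphore(max_workers)

    async def guarded(item):
        async with sem:
            return await call_with_retry(fn, item, base_delay=base_delay)

    results = await asyncio.gather(*(guarded(i) for i in items),
                                   return_exceptions=True)
    # With return_exceptions=True, failures come back as exception objects,
    # so split them out before reading attributes such as .is_related.
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed
```

If the error persists even with a single worker, as reported below, the cause is more likely a quota or billing limit on the key than concurrency.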

@Lorenzobattistela

@bjsi Yeah, I read that comment, but I still get that error even with ASYNC_MAX_WORKERS set to 1. I'll try again; perhaps the key isn't adequate?

@bjsi

bjsi commented Jan 24, 2025

@Lorenzobattistela Huh, that is weird. Did you set up billing? I could DM you my key if you let me know your Discord ID.

@Lorenzobattistela

My Discord profile is lorenzo_wb, please send me a message there. Thanks!
