Develop an AI tool that, given the current snapshot of HVM3's codebase and a refactor request, correctly assigns which chunks must be updated, in order to fulfill that request. The AI tool must use at most $1 to complete this task, while achieving a recall of at least 90% and a precision of at least 90%.
- Input: HVM3's chunked codebase snapshot (29-12-2024). (static, won't change)
- Input: an arbitrary refactor request, such as: replace the 'λx body' syntax by '\x body'
- Output: a list of chunks that require update (see the interface sketch after this list). Example: [215,256,295,353,358,364,369,374,414,433,480]
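To make the expected shape concrete, here is a minimal, hypothetical interface sketch; the function name, types, and 0-based indexing are illustrative assumptions, not part of the spec.

```python
# Hypothetical interface sketch: the tool takes the chunked snapshot and a
# refactor request, and returns the indices of the chunks it believes must
# be edited. Names, types, and 0-based indexing are assumptions, not a spec.
from typing import List

def select_chunks(chunks: List[str], request: str) -> List[int]:
    """Return indices of chunks that must change to fulfill `request`."""
    raise NotImplementedError  # your solution goes here

# Expected shape of a call and its result:
# select_chunks(snapshot_chunks, "replace the 'λx body' syntax by '\\x body'")
# -> [215, 256, 295, 353, 358, 364, 369, 374, 414, 433, 480]
```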
I will evaluate it on a set of private refactor requests.
If it passes the 90% recall, 90% precision (and thus F1), and $1 thresholds, it will be considered valid.
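For reference, a minimal sketch of the scoring math, assuming precision and recall are computed over sets of chunk indices per request; how the scores are aggregated across the private request set is an assumption here, not stated above.

```python
# Precision/recall over chunk index sets, per refactor request. Aggregation
# across the private request set is left to the judge.
from typing import Set, Tuple

def precision_recall(predicted: Set[int], expected: Set[int]) -> Tuple[float, float]:
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall

# Valid if precision >= 0.90, recall >= 0.90, and the run costs at most $1.
```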
I will grant:
- $5k to the fastest valid solution
- $5k to the cheapest valid solution
Note that I expect this problem to be easy to solve (it should be); the real challenge, then, is to craft the fastest and cheapest solution. A single entry could win both categories.
Your solution must be published by January 15, 2025.
- To participate, you must join our Discord first.
- Your solution must be open source, and posted on this Gist.
- Every technique is allowed:
  - It can be just a prompt to an existing AI model;
  - You could make parallel AI calls, as I suggested;
  - You could fine-tune a model, like GPT-4o-mini;
  - You could dismiss LLMs and use any other technique;
  - Anything that passes validation is considered valid.
- Training on HVM3's codebase is allowed, but:
  - Total training cost must not exceed $10
  - Total training time must not exceed 1h
While the example input/output above can be easily solved (by just searching "λ"), real refactor requests require a real understanding of the codebase. For example, if I ask the AI to "make CTRs store only the CID in the Lab field, and move the arity to a global static object in C", defining which chunks need update requires logical reasoning about the codebase and how different chunks interact. This cannot be replicated with traditional RAG techniques, I believe, so I'm personally bearish on these.
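For the syntactic example above, a naive keyword baseline is literally a one-liner; the point is that nothing this simple survives a semantic request like the CTR/Lab one. The sketch below assumes chunks are indexed in snapshot order.

```python
# Naive baseline for the λ-syntax request: flag every chunk containing "λ".
# Works only because the request is purely syntactic; a semantic request
# (e.g. moving arity out of the Lab field) defeats any keyword search.
from typing import List

def lambda_baseline(chunks: List[str]) -> List[int]:
    return [i for i, chunk in enumerate(chunks) if "λ" in chunk]
```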
I currently think the best bet is to "summarize" the codebase to reduce the token count (i.e., preventing the AI's "large context dumbing" effect), and then send it a single target chunk for it to evaluate. Then, do this in parallel for each chunk, using a fine-tuned small LLM. But that's just my take, and I'm sure others will have much better ideas, right?
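As a rough sketch of that idea, assuming a precomputed codebase summary and a hypothetical `ask_model` wrapper around whatever (possibly fine-tuned) small LLM you pick, the per-chunk fan-out could look like this:

```python
# Per-chunk classification, fanned out in parallel. `ask_model` is a
# hypothetical wrapper around your chosen LLM; it sees the codebase summary,
# one target chunk, and the refactor request, and answers yes/no.
from concurrent.futures import ThreadPoolExecutor
from typing import List

def ask_model(summary: str, chunk: str, request: str) -> bool:
    """Return True if the model says this chunk must be edited (stub)."""
    raise NotImplementedError

def select_chunks_parallel(chunks: List[str], summary: str, request: str) -> List[int]:
    with ThreadPoolExecutor(max_workers=32) as pool:
        votes = list(pool.map(lambda c: ask_model(summary, c, request), chunks))
    return [i for i, needs_edit in enumerate(votes) if needs_edit]
```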