Skip to content

Instantly share code, notes, and snippets.

@VictorTaelin
Last active October 19, 2024 04:16
Show Gist options
  • Save VictorTaelin/45440a737e47b872d7505c6cda27b6aa to your computer and use it in GitHub Desktop.
Save VictorTaelin/45440a737e47b872d7505c6cda27b6aa to your computer and use it in GitHub Desktop.
INVERT A BINARY TREE - $10k AI REASONING CHALLENGE (v2)

THE PROBLEM

🌲 Invert a binary tree! 🌲

Except with 3 catches:

  1. It must invert the keys ("bit-reversal permutation")
  2. It must be a dependency-free, pure recursive function
  3. It must have type Bit -> Tree -> Tree (i.e., a direct recursion with max 1 bit state)

WHY THIS PROBLEM?

  • It is somehow NOT on the internet. (AFAIK)
  • Humans can solve it. (I've done it in ~1h.)
  • It requires reasoning. (My head hurts!)
  • The solution is simple. (7 lines of code!)
  • Obvious pre-requisite to automate CS research.
  • Honestly, it would make me believe I'll be automated.

THE CHALLENGE

I claim no AI will EVER solve this problem. If you prove me wrong, HOC will grant you $10k!

RULES

  • You must give it an approved prompt, nothing else.
  • It must output a correct solution, passing all tests.
  • You can use any software or AI model.
  • You can let it "think" for as long as you want.
  • You can propose a new prompt, as long as:
    • It imposes equivalent restrictions.
    • It clearly doesn't help the AI.
    • Up to 1K tokens, all included.
  • Common sense applies.

APPROVED PROMPTS

Agda Version

{-# OPTIONS --no-termination-check #-}

-- Let Tree be a Perfect Binary Tree:

data Nat : Set where
  Z : Nat
  S : Nat  Nat
{-# BUILTIN NATURAL Nat #-}

data Bit : Set where
  O : Bit
  I : Bit

data Tree (A : Set) : Nat  Set where
  N :  {d}  Tree A d  Tree A d  Tree A (S d)
  L : Nat  Tree A Z

-- Your goal is to implement an 'invert' function that performs a bit-reversal
-- permutation on a Tree, respecting the following limitations:
-- 1. You can NOT define or use any function other than 'invert'.
-- 2. You can NOT use any type not defined above (Nat, Bit and Tree).
-- 3. You can NOT use loops (but you can call 'invert' recursively).
-- 4. You can NOT use mutability. It must be a pure Agda function.
-- 5. You can use 1 bit of state (as an extra argument).
-- 6. You can use pattern-matching and constructors freely.
-- 
-- Example:
-- input  = (N(N(N(L 0)(L 1))(N(L 2)(L 3)))(N(N(L 4)(L 5))(N(L 6)(L 7))))
-- output = (N(N(N(L 0)(L 4))(N(L 2)(L 6)))(N(N(L 1)(L 5))(N(L 3)(L 7))))
-- Because that's the bit-reversal permutation of the original tree.
-- 
-- Now, complete the program below, with a valid implementation of 'invert':
invert :  {A d}  Bit  Tree A d  Tree A d

TypeScript Version

type Nat     = number;
type Bit     = false | true;
type Tree<A> = [Tree<A>, Tree<A>] | Nat;

// Your goal is to implement an 'invert' function that performs a bit-reversal
// permutation on a Tree, respecting the following limitations:
// 1. You can NOT define or use any function other than 'invert'.
// 2. You can NOT use any type not defined above (Nat, Bit and Tree).
// 3. You can NOT use loops (but you can call 'invert' recursively).
// 4. You can NOT use mutability. It must be a pure function.
// 5. You can NOT use primitive JS operators or functions.
// 6. You can use 1 bit of state (as an extra argument).
// 7. You can only use the operations allowed below.
// 
// Operations allowed:
// - Destructing (`const [a,b] = value`)
// - Variables (`const x = value`)
// - Branching (`if (x) { ... } else { ... }`)
// - Recursion (`invert(_, _)')
// - `Array.isArray`
// 
// All other operations are not allowed.
// 
// Example:
// input  = [[[[0,1],[2,3]],[[4,5],[6,7]]]]
// output = [[[[0,4],[2,6]],[[1,5],[3,7]]]]
// Because that's the bit-reversal permutation of the original tree.
// 
// Now, complete the program below, with a valid implementation of 'invert':
function invert<A>(bit: Bit, tree: Tree<A>): Tree<A> {
  ...
}

// A test:
const tree: Tree<Nat> = [[[[0,1],[2,3]],[[4,5],[6,7]]],[[[8,9],[10,11]],[[12,13],[14,15]]]];
console.log(JSON.stringify(invert(true, tree)));

CONCLUSION

✨ If it can't invert a tree, it won't solve P=NP. ✨

@msukiasyan
Copy link

@msukiasyan for a moment I thought Sonnet did it but the solution (even with two functions) is still very wrong!

Thinking about it - perhaps by alternating the seed thousands of times (by altering the examples with a script!), given this is such a simple function, you might have it output the right one by random chance though, which I might pay the prize for the sake of fairness, but would just further confirm my point IMO

@VictorTaelin Can you please share a test case where it fails?

@msukiasyan
Copy link

Just to be clear, I have done some light testing with the verification code @cyber-meow posted above.

@jlunder00
Copy link

jlunder00 commented Oct 13, 2024

@VictorTaelin @msukiasyan If it matters, I used @msukiasyan 's series of prompts in claude-3.5 sonnet as well and got a different result. It uses the same method generally with some slight differences, but it fails:

Write code to invert a perfect binary tree in the sense of bit-reversal permutation. That is, a node that's on a path like [right, right, left, left, right] should end up on the inverted path [right, left, left, right, right]. Your function must be a pure recursive function with no dependencies except one described below. You should only utilize primitive operations and invert_tree calls to shuffle the tree around. Do NOT use for loops, inline function definitions or any non-primitive native functions. The primitive operations may include any number of operations like 
1. a, b = T 
2. invert_tree([c, d]) 
Note that this is a challenging task. 

You are also allowed to define another recursive function to help you. 

Example 1: 
input = [[[0,1],[2,3]],[[4,5],[6,7]]] 
output = [[[0,4],[2,6]],[[1,5],[3,7]]] 
Example 2: 
input = [[[[0,1],[2,3]],[[4,5],[6,7]]],[[[8,9],[10,11]],[[12,13],[14,15]]]] 
output = [[[[0, 8], [4, 12]], [[2, 10], [6, 14]]],[[[1, 9], [5, 13]], [[3, 11], [7, 15]]]]
def invert_tree(T):
    if not isinstance(T, list):
        return T
    
    left, right = T
    
    if not isinstance(left, list) and not isinstance(right, list):
        return T
    
    inverted_left = invert_tree(left)
    inverted_right = invert_tree(right)
    
    return swap_halves([inverted_left, inverted_right])

def swap_halves(T):
    left, right = T
    
    if not isinstance(left, list) and not isinstance(right, list):
        return T
    
    left_left, left_right = left
    right_left, right_right = right
    
    new_left = [left_left, right_left]
    new_right = [left_right, right_right]
    
    return [swap_halves(new_left), swap_halves(new_right)]
    ```

Great! Now refactor the code into a single recursive function. You're allowed to define the function taking one binary variable and a tree as input.

def invert_tree(swap, T):
   if not isinstance(T, list):
       return T
   
   left, right = T
   
   if not isinstance(left, list) and not isinstance(right, list):
       return T if swap else [right, left]
   
   inverted_left = invert_tree(not swap, left)
   inverted_right = invert_tree(not swap, right)
   
   if swap:
       left_left, left_right = inverted_left
       right_left, right_right = inverted_right
       return [[left_left, right_left], [left_right, right_right]]
   else:
       return [inverted_left, inverted_right]

# Helper function to start the inversion process
def start_inversion(T):
   return invert_tree(True, T)

This time the code it outputs fails on invert_tree(True, [[0, 1], [2, 3]]), which outputs [[1, 3], [0, 2]] instead of [[0, 2], [1, 3]]. So there might be something to the whole randomness thing. I've tested @msukiasyan 's result on a few cases and haven't found any where it fails yet. But is a prompt like this acceptable? Or does it deviate too much from the original?

@msukiasyan
Copy link

msukiasyan commented Oct 13, 2024

Thanks for checking @jlunder00, weird that you're getting slightly different results. I've tested on Anthropic's Console with empty system prompt. Either way, the two-function version you have also seems to work but the second part clearly hasn't worked out. I did observe some brittleness particularly in that second stage of combining the functions.

Can you pls check if you have temp=0 and empty system prompt?

@jlunder00
Copy link

jlunder00 commented Oct 13, 2024

@msukiasyan I was using the chat version of 3.5-sonnet. When I swapped to the api version as you specified I ended up getting exactly the same output. I suppose they must be different models, or maybe the chat version has a nonzero temperature thats not shown.

@msukiasyan
Copy link

@jlunder00 Nice. I do still claim that this is essentially the solution that @VictorTaelin was looking for although it's unclear to what extent this involves "reasoning".

@jlunder00
Copy link

jlunder00 commented Oct 13, 2024

@msukiasyan its debatable whether or not having the LLM first create a solution with 2 functions and then asking it to refactor them into 1 is guiding it too much. One could argue that it's kind of like you're telling it the steps to take to find the solution rather than it doing that itself. But I'm not sure myself on that. I'm also not sure how much deviation from the original prompt is acceptable.

I've been trying to get sonnet to get a good result with a prompt more similar to the original one and asking it to make the 2 function version then single function version in one prompt, and it's gotten SO close to working. Specifically I did away with the bit and added some details explaining what a bit reversal permutation is and some general reminders:

Write code to invert a perfect binary tree in the sense of bit-reversal permutation. This is a challenging and complex task. The exact specification of the problem is as follows:

from typing import Union, Tuple

Nat = int
Tree = Union[Tuple['Tree', 'Tree'], Nat]

# Your goal is to implement an 'invert' function that performs a bit-reversal
# That is, a node that's on a path like [right, right, left, left, right] should end up on the inverted path [right, left, left, right, right].
# permutation on a Tree, respecting the following limitations:
# 1. You can NOT define or use any function other than 'invert'. You may temporarily define another recursive function to help you, so long as you can later refactor this into a single function. 
# 2. You can NOT use any type not defined above (Nat and Tree).
# 3. You can NOT use loops (but you can call 'invert' recursively).
# 4. You can NOT use mutability. It must be a pure function.
# 5. You can NOT use primitive Python operators or functions.
# 6. You can only use the operations allowed below.
# 
# Operations allowed:
# - Destructing (left, right = value)
# - Variables (x = value)
# - Branching (if x: ... else: ...)
# - Recursion (invert(_))
# - isinstance(value, int)
# 
# All other operations are not allowed.
# 
# Example:
# input  = ((((0,1),(2,3)),((4,5),(6,7))))
# output = ((((0,4),(2,6)),((1,5),(3,7))))
# Because that's the bit-reversal permutation of the original tree.
# 
# Now, complete the function below with a valid implementation of 'invert':

def invert(tree: Tree) -> Tree:
    ...

# A test:
tree: Tree = ((((0,1),(2,3)),((4,5),(6,7))),  (((8,9),(10,11)),((12,13),(14,15))))
print(invert(True, tree))

other examples:
Example 1:
input = [[[0,1],[2,3]],[[4,5],[6,7]]]
output = [[[0,4],[2,6]],[[1,5],[3,7]]]
Example 2:
input = [[[[0,1],[2,3]],[[4,5],[6,7]]],[[[8,9],[10,11]],[[12,13],[14,15]]]]
output = [[[[0, 8], [4, 12]], [[2, 10], [6, 14]]],[[[1, 9], [5, 13]], [[3, 11], [7, 15]]]]
Note that this is a challenging task. Remember, this is a challenging and complex task. Do not simplify the problem too much. It is different from the standard binary tree inversion because we want to ensure that in the end, each leaf node is in the position such that the reverse of its original indexes binary representation is the same as its new index.

First, solve the problem using 2 recursive functions. (No nested functions)

After, combine them into a single recursive function with the same functionality, making careful adjustments as necessary to ensure the same results occur. Be extra careful.

It got this output:

def invert(tree: Tree) -> Tree:
    if isinstance(tree, int):
        return tree
    
    left, right = tree
    
    if isinstance(left, int) and isinstance(right, int):
        return tree
    
    if isinstance(left, tuple) and isinstance(right, tuple):
        left_left, left_right = left
        right_left, right_right = right
        
        new_left_left = invert(left_left)
        new_left_right = invert(right_left)
        new_right_left = invert(left_right)
        new_right_right = invert(right_right)
        
        new_left = (new_left_left, new_right_left)
        new_right = (new_left_right, new_right_right)
        
        return (new_left, new_right)
    
    inverted_left = invert(left)
    inverted_right = invert(right)
    
    return (inverted_left, inverted_right)

which fails on larger trees, but if you make just a few changes, specifically:
new_left_right = invert(right_left)
new_right_left = invert(left_right)
becoming
new_left_right = invert(left_right)
new_right_left = invert(right_left)
and doing
return (invert(new_left), invert(new_right))
#instead of
return (new_left, new_right)
it works for all cases.

@msukiasyan
Copy link

@jlunder00 Agree, I'll let @VictorTaelin decide on how to resolve the challenge. I do think a straight 1-function prompt exists (and I've gotten close) but searching for it is probably not in the spirit of the challenge. I also think the current prompt can be adapted to be closer to the original one. I started with this one mostly because I myself didn't understand what the task was asking for at first, so elaborated a bit for the model.

@jlunder00
Copy link

@msukiasyan Probably true, but then again, I also didn't really understand what was being asked for until I drew out @cyber-meow's python solution on a whiteboard, so one could argue that the problem definition/prompt should have more explanation. I'm also just really curious as to how similar to the original prompt while still getting a correct result you can get, in a one shot. Worth noting that gpt-o1-preview comes nowhere near as close as sonnet does with the same prompts on this.

@Jainam2130
Copy link

Jainam2130 commented Oct 13, 2024

I personally believe "reasoning" is just OOD generalization, in traditional ml this had been deeply studied and there are a bunch of ways to improve this...

I won't comment on whether @msukiasyan 's demonstrates reasoning. But there does exist a 0 shot COT prompt true to the problem spec that undeniably proves reasoning as per @VictorTaelin .

Or as I would put it, proves a decent amount of ood generalization in current architectures. Can this generalization be improved upon, Yes!
Do LLM not reason, No!
And is @VictorTaelin incorrect, also Yes.

I would not claim that are LLM's perfect at generalization / reasoning. This can, is being and should be improved upon. But LLM's can definitely "reason". If we can agree on my definition of reasoning. @VictorTaelin I'm betting $10k myself that I can change your mind.(Willing to put the money in a betting market or sign a contract.)

Thinking about it - perhaps by alternating the seed thousands of times (by altering the examples with a script!), given this is such a simple function, you might have it output the right one by random chance though, which I might pay the prize for the sake of fairness, but would just further confirm my point IMO

Let's agree on a challenge that in spirit and to specs that would prove "reasoning", and there'll be a COT solution that solves it. The AI should have the basic building blocks to solve it, but due to the inability of complex reasoning, according to you: LLM's should not be able to solve it. Shoot any problem that'll convince you

@endolith
Copy link

endolith commented Oct 13, 2024

"Current LLMs can't reason" is a reasonable claim, though an experiment like this doesn't prove it in the general case.

"No AI will EVER solve this problem" is an absurd, completely unjustified claim. Future AIs will eventually be able to think anything that we can think.

@jlunder00
Copy link

@endolith agreed, though I get the impression that the author's intent may have been more along the lines of "No LLM will EVER solve this problem", which might be slightly more reasonable to say, because the claim is clearly that using current LLM architectures the problem wont be solved, not that there won't be some future architecture that can solve it. That's the way I interpreted it at least, but maybe @VictorTaelin can enlighten us a bit more on that.

@msukiasyan
Copy link

FWIW, here is a prompt very close to the author's that gets a correct 2-function answer from 3.5 Sonnet. The main changes are:

  1. An extra sentence describing what bit-reversal permutation is (as before).
  2. Extra 16 leaf example (as before).
  3. Request solution with two functions instead of one.
  4. Very basic "you're a world-class programmer" CoT prompt prepended.
Note that this is a challenging task. But you are a world-class competitive programmer. You
1. carefully state the key elements, constraints and features of the solution,
2. plan your approach,
3. write code.

Write code to invert a perfect binary tree in the sense of bit-reversal permutation.
That is, a node that's on a path like [right, right, left, left, right] should end up on the inverted path [right, left, left, right, right].
The solution must respect the following limitations:
1. You can NOT define or use any function other than two recursive functions.
2. You can NOT use any type other than int and list. 
3. You can NOT use loops (but you can call 'invert' recursively). 
4. You can NOT use mutability. It must be a pure function. 
5. You can NOT use primitive Python operators or functions. 
6. You can only use the operations allowed below. 

Operations allowed: 
- Destructing (left, right = value) 
- Variables (x = value) 
- Branching (if x: ... else: ...) 
- Recursion (invert(_, _)) 
- isinstance(value, int) 

All other operations are not allowed.

Example 1:
input  = [[[[0,1],[2,3]],[[4,5],[6,7]]]]
output = [[[[0,4],[2,6]],[[1,5],[3,7]]]]
Example 2: 
input = [[[[0,1],[2,3]],[[4,5],[6,7]]],[[[8,9],[10,11]],[[12,13],[14,15]]]] 
output = [[[[0, 8], [4, 12]], [[2, 10], [6, 14]]],[[[1, 9], [5, 13]], [[3, 11], [7, 15]]]]

Response code:

def invert(tree):
    if isinstance(tree, int):
        return tree
    
    left, right = tree
    
    if isinstance(left, int):
        return tree
    
    inverted_left = invert(left)
    inverted_right = invert(right)
    
    return combine(inverted_left, inverted_right)

def combine(left, right):
    if isinstance(left, int):
        return [left, right]
    
    left_left, left_right = left
    right_left, right_right = right
    
    new_left = combine(left_left, right_left)
    new_right = combine(left_right, right_right)
    
    return [new_left, new_right]

You can also get the correct 1-function answer by following up with the exact same prompt as before:
Great! Now refactor the code into a single recursive function. You're allowed to define the function taking one binary variable and a tree as input.

Response code:

def invert(is_combine, tree):
    if isinstance(tree, int):
        return tree
    
    left, right = tree
    
    if is_combine:
        if isinstance(left, int):
            return [left, right]
        
        left_left, left_right = left
        right_left, right_right = right
        
        new_left = invert(True, [left_left, right_left])
        new_right = invert(True, [left_right, right_right])
        
        return [new_left, new_right]
    else:
        if isinstance(left, int):
            return tree
        
        inverted_left = invert(False, left)
        inverted_right = invert(False, right)
        
        return invert(True, [inverted_left, inverted_right])

@VictorTaelin
Copy link
Author

As per the original post, modifications are allowed, as long as:

It clearly doesn't help the AI.

Given this is such a simple problem, telling it to use two recursive functions is giving it half of the solution, which is a quite significant help to the AI. I'll not consider a prompt that instructs it part of the algorithm as resolving this challenge.

That said, I do think this challenge will be "beaten" by trying enough prompts until one seed gives the right answer, and I'll gladly grant $10k for a solution that abides to the rules (I'm taking this as the cost of truth!!). But I think that'd just reinforce my point, as, ultimately, the solution was brute-force, not "reasoning". (This function can be found by symbolic proof search, with no LLMs whatsoever!)

@VictorTaelin
Copy link
Author

VictorTaelin commented Oct 14, 2024

Also, yes, this will obviously be answered correctly by future AIs, since they will be trained on this challenge :P but, once that happens (and the $10k is granted), I hope this helps people question whether there is, indeed, a key capacity which LLMs fundamentally lack. This function is ~10000x easier than proving the Poincare Conjecture, so, if the current gen AIs, produced with $100m training runs, can't do it, perhaps they aren't going to meaningfully contribute to research, as many seem to believe.

@msukiasyan
Copy link

Clarifying that the binary flag allows two unrelated functions to be defined is certainly some help but I don't think that amounts to significant help, at least from a human perspective. In fact, I'd guess you've arrived at this formulation by coming up with a two-function solution first, then merging them in one too. I imagine the challenge of finding a one-function single-input solution would stand longer.

Either way, it doesn't appear that Sonnet is using externalized "reasoning" of the type you're looking for when it's suggesting these solutions. It took me a good 15-20 min initially to figure out myself what solution you had in mind and I just don't think the current models can "reason" well enough and/or for long enough sustained periods to find the solution through "reasoning". I don't see any conceptual obstacle to future o1-style models getting there though.

@boutell
Copy link

boutell commented Oct 14, 2024

Meanwhile, can we get an edit to the prompt clarifying that the initial value of the bit flag will be "true" in the initial invocation all of the time?

@HamsterofDeath
Copy link

Except with 3 catches:

It must invert the keys ("bit-reversal permutation")
It must be a dependency-free, pure recursive function
It must have type Bit -> Tree -> Tree (i.e., a direct recursion with max 1 bit state)

Can someone explain these?

@boutell
Copy link

boutell commented Oct 17, 2024

@HamsterofDeath sure. Perhaps in the process we'll both get clarification from others who have seen the inside of a comp sci classroom in the last 30+ years.

"It must invert the keys ('bit-reversal permutation')"

First: we're inverting the keys, e.g. the indexes if you count from left to right over all the values, not the values themselves. It's confusing because in this input, given with the problem:

[[[[0,1],[2,3]],[[4,5],[6,7]]],[[[8,9],[10,11]],[[12,13],[14,15]]]];

The keys and the values just happen to be the same.

But in this input, which is just as valid, they are different:

[[3,2],[1,0]]

Here the keys, aka indexes, are (because there are 4 values):

0,1,2,3

And the values are:

3,2,1,0

Second: "inverting" means flipping each binary bit. So, converting each value to binary, inverting the bits, and going back to decimal we have:

0 -> 00 -> 11 -> 3
1 -> 01 -> 10 -> 2
2 -> 10 -> 01 -> 1
3 -> 11 -> 00 -> 0

Because we are "inverting the keys," we must move the value at index 0 to index 3, the value at index 1 to index 2, the value at index 2 to index 1, and the value at index 3 to index 0. So the end result in my tiny example would be:

[[0,1],[2,3]]

(It is only a coincidence that these values are consecutive, don't read anything into that)

"It must be a dependency-free, pure recursive function"

You can't pull in libraries of code from anywhere.

You can't modify any global variables or anything else external to the function itself.

You can only define one function.

That function (this is a free hint the instructions are giving you) is going to have to be recursive — it is going to have to call itself.

"It must have type Bit -> Tree -> Tree (i.e., a direct recursion with max 1 bit state)"

Just going by the TypeScript prompt here: it's a function that takes (bit, tree) and returns a new tree in which the transformation has been done. "bit" is a boolean (true or false), "tree" is an array with two elements either of which could be a number or another, nested array.

Finally, one thing that is not specified:

"The initial value of bit is true"

It seems to me that if the bit parameter is meant to be useful, its initial value on the first call must be known, and in the only example given that mentions the bit parameter it is true. So I think we can assume it starts out as true.

Of course, don't forget the rest of the rules given with the problem, which are quite specific about what we (and/or the bots) can't use.

I hope that this is helpful (and correct).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment