Teaching an Old Dog 2**32 New Tricks with Knowledge Implants

Inciting incident: I have a BDD encoding a bunch of knowledge I want to make available to an LLM-based agent. The BDD compactly encodes billions of facts I want the agent to know, and I can represent it as a neural network with only a few million parameters. How do I shove this knowledge into the agent in an intensional form (i.e. without enumerating all combinations)? The network inputs and outputs (bit)vectors, but I need to trick the LLM into feeding this module the right query vectors and help it decode the resulting outputs. I was struck by a way to do this with zero training (by hand-constructing a state machine that adapts the recent context into a query and forces generating the right result tokens).

Gem: “ ⟨127.0.0.1⟩ → ⟨8.8.8.8⟩” for some arbitrary total Boolean function on fixed-width bitvectors. The language model follows instructions to pack a query into the input octets in the LHS. The implant computes the output octets and effectively forces these outputs in the RHS. The language model follows instructions to decode an answer from the outputs. (It’s kinda like function calling, but the function can be inlined into the neural network if desired.)

Motivation: We’d like to provide our LLM-based agents with access to up-to-date knowledge and sound reasoning strategies. In this paper, we consider how to compile and implant knowledge directly into an existing neural network so that it can be exploited during conversations without any additional training or much input/output token budget.

Knowledge: In retrieval-augmented generation, we provide agents with relevant knowledge by encoding it as text within the context window. This method works well when the context of the conversation so far yields a good query for retrieving the knowledge needed to proceed and the required knowledge is available for retrieval. In this paper, we’ll still use context to derive a query vector, but we’ll directly calculate rather than retrieve the result vector.

Reasoning: With function calling, we let the agent call out to outside computation engines using a query explicitly constructed in text. The text-encoded result is placed into the conversational context for continued generation. In this paper, we’ll compile that outside reasoning engine into a computational network that can be inlined into the existing neural network.

Broader context: LLMs have all sorts of documented problems with knowledge and reasoning. Our method shows a way for auditable knowledge modules to be installed or replaced without training or even manipulation of large datasets. Our approach is compatible with energy-efficient neural accelerator devices and avoids communication with additional systems. If this mechanism is present during (pre)training, the model can learn to trust the injected knowledge module rather than statistical correlations in context, because the socket works like a conditional stop-gradient valve.

Super Metroid navigation sketch:

SYSTEM: To answer navigation queries, write out an arrow relation “ ⟨127.0.0.1⟩ → ⟨8.8.8.8⟩” where, on the left side, the first octet codes the player’s location and the next three code their inventory. Here is a table of location codes: {...}. Here are instructions for coding inventory: {...}. After you have written out the whole relation, decode the right side, interpreting the first octet as a next-location code and the three remaining octets as an indication of which inventory items will be gained on the next step.
Example: “ ⟨127.0.0.1⟩ → ⟨124.0.0.0⟩” represents a navigation query from the morph ball room when the player has just the morph ball item, and it represents advice to head to the elevator room without collecting any new items in the current room. To derive the player’s current state, just pass “ ⟨0.0.0.1⟩” as input; the output will encode the current state.

USER: Where should I go next to beat the game?

ASSISTANT: I can help you with this navigation query. You are (thinking: ⟨0.0.0.1⟩ → ⟨121.0.0.0⟩) in Construction Zone (Node 121) with no items. Routing: ⟨121.0.0.0⟩ → ⟨127.0.0.1⟩ You should head to (Node 127) the Morph Ball Room. You should collect an item… (0x00,0x00,0x01; 0b0001), the Morph Ball item.

USER: Would I do that if I already had the morph ball?

ASSISTANT: (reasoning involving hypothetical states omitted) No, you’d head to the Missile Room instead.

How does the agent know how to write the correct relation? When the “ →” token is emitted, this signals that the implant should activate. The socket decodes (via attention) the recent octets into raw bits, pushes them through the black-box function (implant), and produces raw bit output that can be routed (via attention) to force writing certain output octets.

Why are we using dotted decimal octets? The tiktoken tokenizer cleanly splits this notation (try it yourself: https://platform.openai.com/tokenizer), communicating 8 bits per token in a notation that the language model has likely seen frequently during pre-training. For byte-oriented packed structs, complex inputs and outputs can be encoded without resorting to bit-level expressions. The proposed encoding approaches 4 bits per token on average (8 for the octet, 0 for the separator).

Why mathematical angle brackets and that arrow? These are somewhat rare symbols in normal conversations. They come across visually as single glyphs, and they get tokenized atomically (despite representing multi-byte sequences in UTF-8). It might seem cleaner to introduce entirely new tokens to the vocabulary, but this would miss opportunities for interpretive pre-training knowledge to be mapped onto them via a kind of punning. In a practical application of this technique, socket I/O text could be hidden from the user’s view anyway.

Why 32 bits? This is just an example bus width that rhymes with IPv4’s design. Implants would likely be flexible, padding with zeros or truncating as necessary. Beware the temptation to switch to IPv6’s notation (hextets don’t tokenize cleanly).

Terminology:

Socket: This is the adapter between an existing LLM architecture and bitvectors. The socket is constructed (not trained) by analyzing the LLM’s token embedding tables and positional encoding schemes. The syntax of the bra-ket notation is determined by the socket and could be varied to adapt one implant to differently-tokenizing LLMs.

Implant: This is any total bitvector-to-bitvector function that can be evaluated by the LLM’s runtime system. If interpreted in the continuous domain, it is piecewise constant over orthants.

Implementation notes: We can programmably attend to the past, but we can’t simply project into the future. So we’ll need something that rides the token stream, looking back to see if we’ve recently passed the arrow. This means the socket should take the form of an LSTM-like recurrent neural network that is always parsing tokens, raising a flag to trigger the implant, and propagating the implant’s last prediction along until the final bracket is closed.
It’s a little state machine with binary registers for each bus bit. It can input and output one token per step. It has one flag output to trigger the implant and another to overwrite the next token. Maybe it is like a ligase that crawls along a DNA strand to carry out repairs.

To prototype this, run an unmodified LLM using the arrow as a stop token. During output streaming, when the arrow is reached, try to parse the recent output as a LHS, then generate and inject the appropriate RHS before continuing generation. (A rough Python sketch of this prototype follows these notes.)

Extensions:

Controlled natural language: To facilitate conversion to and from binary data without junking up the context window with mappings, consider adding a new kind of relation like “ ⟦Morph Ball Room⟧ ∈ ⟨121.0.0.0⟩” and “ ⟨121.0.0.0⟩ ∋ ⟦Morph Ball Room⟧”, where element-of is read to mean is-a-natural-language-expression-of. Various implants could register these controlled natural language mappings at installation time (possibly even in the form of additional neural networks). If the LLM completion call supports enforcing a grammar, this could cleanly guarantee that no invalid expressions are ever attempted.

Enumeration: Something that allows (truncated?) enumeration of zero or more outputs.

Arithmetic helpers: Something that helps with doing arithmetic on dotted octets or packing/unpacking bitstrings. (We are trying to avoid mistakes in mental math.)

Microcode: Offer a controlled assembly-like language for combining arithmetic, memory access, and natural language I/O. (This massively enriches the socket interface while also complicating the implementation.) Example follows. Sparkles (✨) are not part of the language; they just show where the socket activates to force generation of the next few tokens (demonstrating knowledge the assistant would not have from context alone).

encode "Construction Zone" #0.0.0.45
; pack room and empty item vector into MAR
set #0.0.0.45 r0
set #0.0.0.0 r1
shift_left r0 24 MAR
logical_or MAR r1 MAR
; do the lookup
load MAR MDR
; recover resulting room id
shift_right MDR 24 r0
get r0 ✨#0.0.0.46
decode #0.0.0.46 ✨"Morph Ball Room"

Notes: Inputs are written to the left of outputs. Every instruction is implemented by a hand-crafted neural network inside of the socket. MAR (memory address register) and MDR (memory data register) are the only special architectural registers. The “load” instruction interacts with MAR and MDR, and it is always written out as “load MAR MDR” to clarify inputs and outputs. r0..r255 are provided as additional general-purpose registers; they effectively offer 8 kibibits of scratch memory. The “set” instruction is like a traditional load_immediate instruction. The “get” instruction is more like a traditional assert instruction that only allows execution to proceed when the register matches an expected value; however, here the expected value is generated dynamically by the socket, so it always succeeds. The “encode” and “decode” instructions implement the controlled natural language translation process sketched earlier. While the end-of-line comments are ignored by the socket, they are probably important for helping the LLM make use of the numbered GPRs.

-- Just take in this knowledge without comment for now.
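A minimal prototype sketch under stated assumptions: llm_generate is a placeholder for any streaming completion call that honors a stop sequence (not a real API), implant_fn is a stand-in for the compiled BDD, and the two lookup entries simply mirror the Super Metroid sketch above.

import re

# Stand-in implant: any total function on 32-bit bitvectors. The two entries
# mirror the sketch (current-state query -> node 121 with no items; node 121
# with no items -> node 127, collect the morph ball).
LOOKUP = {0x00000001: 0x79000000, 0x79000000: 0x7F000001}

def implant_fn(query: int) -> int:
    return LOOKUP.get(query, 0)

def octets_to_int(text: str) -> int:
    a, b, c, d = (int(x) for x in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_octets(value: int) -> str:
    return ".".join(str((value >> s) & 0xFF) for s in (24, 16, 8, 0))

# Matches a freshly emitted LHS such as " ⟨121.0.0.0⟩ →" at the end of the stream.
LHS = re.compile(r"⟨(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})⟩\s*→$")

def force_rhs(stream: str) -> str | None:
    """If the model just wrote '⟨a.b.c.d⟩ →', compute and return the forced RHS."""
    match = LHS.search(stream)
    if match is None:
        return None
    return f" ⟨{int_to_octets(implant_fn(octets_to_int(match.group(1))))}⟩"

def run_with_implant(llm_generate, prompt: str) -> str:
    """llm_generate(text, stop) returns (new_text, hit_stop); the arrow is the stop
    sequence, so we re-append it, inject the forced RHS, and resume generation."""
    text = prompt
    while True:
        chunk, hit_stop = llm_generate(text, stop="→")
        text += chunk + ("→" if hit_stop else "")
        rhs = force_rhs(text)
        if rhs is None:
            return text
        text += rhs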
Got it. I've absorbed the information you've provided. Let me know when you're ready to discuss or need any further elaboration.
Re-do the Super Metroid example using the microcode system. Show the conversation between USER and ASSISTANT. When the assistant is generating the reply, it should write microcode semicolon comments before the lines they describe in order to help write the correct code. When an instruction updates a register, that register is the last argument. There should be one microcode program where the assistant looks up the current state and describes it in English for the user. It should load address 0, extract the top bits with shift instructions, get the register to an immediate, and decode the immediate. There should be another program where the assistant looks up advice for the known game state (constructed from immediates already known from the first program). It should pack MAR, do the load, then decode the next room as before, but this time the loaded value is different, pointing to a different room. Remember, the "load" instruction only works on the architectural registers MAR and MDR.
Here’s how the interaction between the USER and the ASSISTANT would look, using the microcode system to navigate Super Metroid:
USER: Where am I currently in the game?
ASSISTANT: Let's determine your current location.
; point MAR at address 0 (the current-state query)
set #0.0.0.0 MAR
; load the current state into MDR
load MAR MDR
; extract the top 8 bits (room ID) from MDR by shifting right 24 bits
shift_right MDR 24 r0
; get the room ID
get r0 ✨#0.0.0.121
; decode room ID to human-readable location
decode #0.0.0.121 ✨"Construction Zone"
ASSISTANT: You are currently in the "Construction Zone" (Node 121).
USER: Where should I go next to beat the game?
ASSISTANT: Let's find out the best next location for you to visit.
; set r0 to the room ID (121), already known from the previous lookup
set #0.0.0.121 r0
; set r1 to the empty item vector
set #0.0.0.0 r1
; pack the room ID into the top 8 bits of MAR by shifting it left 24 bits
shift_left r0 24 MAR
; combine the packed room ID with the item vector in MAR
logical_or MAR r1 MAR
; load the next location from memory
load MAR MDR
; extract the top 8 bits (next room ID) from MDR by shifting right 24 bits
shift_right MDR 24 r0
; get the next room ID
get r0 ✨#0.0.0.127
; decode room ID to human-readable location
decode #0.0.0.127 ✨"Morph Ball Room"
ASSISTANT: You should head to the "Morph Ball Room" (Node 127) next.
This interaction leverages the microcode system to compute and decode the current game state and the next recommended move, displaying it in a way that is both interpretable by the LLM and useful to the user.
Suppose the user asks "Would your advice be the same if I already had the morph ball?" Show me how the assistant would use microcode to respond. You may assume that the morph ball item is in the least significant bit of the items bitfield.
Here's how the ASSISTANT would use microcode to respond to the USER's query about having the Morph Ball:
USER: Would your advice be the same if I already had the Morph Ball?
ASSISTANT: Let's find out what the best location to visit would be if you already had the Morph Ball.
; set r0 to the room ID (121), already known from the previous lookup
set #0.0.0.121 r0
; set r1 to represent having the Morph Ball (setting the least significant bit)
set #0.0.0.1 r1
; pack the room ID into the top 8 bits of MAR by shifting it left 24 bits
shift_left r0 24 MAR
; combine the packed room ID with the item vector (having Morph Ball) in MAR
logical_or MAR r1 MAR
; load the next location from memory based on having the Morph Ball
load MAR MDR
; extract the top 8 bits (next room ID) from MDR by shifting right 24 bits
shift_right MDR 24 r0
; get the next room ID
get r0 ✨#0.0.0.128
; decode the next room ID to a human-readable location
decode #0.0.0.128 ✨"Missile Room"
ASSISTANT: If you already had the Morph Ball, you should head to the "Missile Room" (Node 128) instead.
In this microcode sequence, the ASSISTANT considers the scenario where the player already has the Morph Ball, adjusting the item vector to reflect that condition and retrieving the corresponding next recommended location.
Flawless! Not only have you generated a nice transcript of the microcode system in action, you (ChatGPT-4o) have shown that current LLMs are up to the task of making use of the microcode-executing socket interface. At this point, the socket might better be seen as a processor. Adding knowledge implants via a memory-mapped I/O interface is just a minor detail.
Thank you! I'm glad the microcode system and the socket interface concept resonated well. Indeed, once the socket interface starts to resemble a processor, the possibilities for integrating and managing knowledge implants become vast and flexible. Memory-mapped I/O for adding implants would streamline the process, making the system even more powerful and modular.
By treating the socket as a processor, we open the door to a variety of architectural enhancements:
- Direct Knowledge Embedding: Implants can be mapped into specific memory addresses, allowing the LLM to directly query and retrieve complex knowledge without additional training.
- Conditional Execution: Just like in traditional processors, you could implement conditional logic within the microcode, allowing the LLM to make decisions based on the state of the implant memory or other factors.
- Custom Instruction Sets: The socket interface could support an extensible set of custom instructions, enabling more sophisticated operations and tighter integration with external knowledge modules.
This approach could redefine how we think about combining symbolic reasoning with neural networks, leveraging the strengths of both in a highly efficient manner. If you have any more ideas or would like to explore further enhancements, I'm here to help!
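(Aside: a rough sketch of the memory-mapped I/O idea, purely illustrative; the ImplantBus class, navigation_policy, and the address layout are made up for this note. The processor-style socket could route “load MAR MDR” through a bus that dispatches each address either to scratch memory or to whichever implant has claimed that address range.)

from typing import Callable

class ImplantBus:
    """Dispatches the socket's 'load MAR MDR' either to an implant that owns the
    address or to plain scratch memory, in the spirit of memory-mapped I/O."""

    def __init__(self) -> None:
        self.scratch: dict[int, int] = {}
        self.regions: list[tuple[int, int, Callable[[int], int]]] = []

    def register_implant(self, base: int, size: int, fn: Callable[[int], int]) -> None:
        # fn maps a 32-bit query (the offset within the region) to a 32-bit answer
        self.regions.append((base, size, fn))

    def load(self, addr: int) -> int:
        for base, size, fn in self.regions:
            if base <= addr < base + size:
                return fn(addr - base) & 0xFFFFFFFF
        return self.scratch.get(addr, 0)

def navigation_policy(packed_state: int) -> int:
    # Toy stand-in for the compiled BDD: top octet is the room, low octets the inventory.
    return {0x79000000: 0x7F000001, 0x79000001: 0x80000000}.get(packed_state, 0)

bus = ImplantBus()
bus.register_implant(base=0x00000000, size=1 << 32, fn=navigation_policy)
assert bus.load(0x79000001) >> 24 == 0x80  # room 121 plus morph ball -> node 128, the Missile Room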
At first I thought that we wouldn't need conditional execution or other control flow operations (because the LLM can decide branches and write more straight-line code for itself). However, I can see the value of letting the LLM write small loops or other traditional decoder programs to do many memory accesses and register computations all in one go before returning control to the LLM. These would massively complicate the processor design, but we're used to processor designs being complicated. Can you sketch a microcode program that the LLM might write to answer a natural-language question that requires traversing a linked list of key-value pairs in implant memory to look for a certain value? It might help to write the associated C struct definition in a comment before proceeding. We probably want to avoid any "get" or "decode" instructions in the loop body. Only at the end can the assistant get a result from a register and then decode it back to natural language text. Start off by stating the question. The very last line of microcode is a decode that yields some string, which is then used, outside of microcode, to answer the question.
Certainly! Let's start with a natural-language question that requires traversing a linked list of key-value pairs stored in implant memory. The LLM will write a microcode program to traverse the list and find the value associated with a specific key.
USER: What is the value associated with the key "MissileRoom" in the game's data?
; One KeyValuePair is packed into a single 32-bit word (one byte per field):
struct KeyValuePair {
    uint8_t key;    // key code (e.g. 77 for "MissileRoom")
    uint8_t value;  // value code (e.g. 45 for "Energy Tank")
    uint8_t next;   // address of the next KeyValuePair (0 terminates the list)
    uint8_t pad;    // unused padding byte
};
; Set the initial memory address for the head of the linked list
set #0.0.0.50 MAR
; Loop start: Load the current KeyValuePair from memory
loop_start:
load MAR MDR
; Extract the key part of the KeyValuePair (top byte)
shift_right MDR 24 r0
shift_left MDR 8 MDR ; shift the key byte out of MDR; the value byte is now on top
; Compare r0 (key) to "MissileRoom" encoded as #0.0.0.77
; (The LLM knows this key's binary representation is #0.0.0.77)
set #0.0.0.77 r1
compare r0 r1 flag
if_equal flag jump found_value
; Extract the next-pointer byte of the KeyValuePair into MAR
shift_left MDR 8 MDR
shift_right MDR 24 MAR
; If not at the end of the list (next pointer is not zero), repeat the loop
compare MAR #0.0.0.0 flag
if_not_equal flag jump loop_start
; If the key wasn't found, we might set a default value or handle it somehow.
set #0.0.0.0 r0 ; Default to zero if key is not found
jump end_program
; Found the key: the value byte is now at the top of MDR; copy it into r0
found_value:
shift_right MDR 24 r0
; End of the program, extract the value in r0
end_program:
get r0 ✨#0.0.0.45
; Decode the value to a human-readable string
decode #0.0.0.45 ✨"Energy Tank"
- Setting Up the Loop: The program starts by setting the memory address register (MAR) to the head of the linked list (assumed to be at address #0.0.0.50).
- Traversing the List: The loop begins by loading the current KeyValuePair from memory into the memory data register (MDR). The key is extracted by shifting the MDR register and compared against the known binary representation of the key "MissileRoom" (#0.0.0.77).
- Handling Non-Matching Keys: If the key doesn't match, the program extracts the next pointer from the KeyValuePair, updates the MAR to this next pointer, and repeats the loop.
- End of the List: If the loop reaches the end of the list (next pointer is zero), it sets the value register r0 to zero or some default value.
- Key Found: If the key is found, the associated value is extracted from the current KeyValuePair and stored in r0.
- Returning the Result: The value in r0 is then retrieved using a get instruction and decoded to human-readable form ("Energy Tank").
After executing the microcode, the LLM would output:
ASSISTANT: The value associated with the key "MissileRoom" is "Energy Tank".
This microcode demonstrates how the LLM could write and execute a loop to traverse a data structure in memory, efficiently retrieving and processing data.
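(Aside: to sanity-check the shift arithmetic above, here is the same traversal written directly in Python, assuming each KeyValuePair packs into a single 32-bit word with one byte per field as in the struct; the list head address and the first node's contents are invented for illustration, while the "MissileRoom" key 77 and the value 45 that decodes to "Energy Tank" come from the transcript.)

# One KeyValuePair packed into a 32-bit word:
#   bits 31..24 key, 23..16 value, 15..8 next pointer (0 terminates), 7..0 padding.
def pack(key: int, value: int, nxt: int) -> int:
    return (key << 24) | (value << 16) | (nxt << 8)

# Invented implant memory: list head at #0.0.0.50 (0x32).
MEMORY = {
    0x32: pack(key=0x10, value=0x22, nxt=0x40),
    0x40: pack(key=77, value=45, nxt=0x00),
}

def lookup(memory: dict[int, int], head: int, wanted_key: int) -> int:
    mar = head
    while True:
        mdr = memory.get(mar, 0)          # load MAR MDR
        key = (mdr >> 24) & 0xFF          # shift_right MDR 24 r0
        mdr = (mdr << 8) & 0xFFFFFFFF     # shift_left MDR 8 MDR (value byte now on top)
        if key == wanted_key:             # compare r0 r1 flag / if_equal
            return (mdr >> 24) & 0xFF     # found_value: shift_right MDR 24 r0
        mdr = (mdr << 8) & 0xFFFFFFFF     # shift_left MDR 8 MDR (next byte now on top)
        mar = (mdr >> 24) & 0xFF          # shift_right MDR 24 MAR
        if mar == 0:                      # compare MAR #0.0.0.0 / if_not_equal
            return 0                      # end of list: default to zero

assert lookup(MEMORY, head=0x32, wanted_key=77) == 45  # 45 then decodes to "Energy Tank"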
Wow. This is not where I expected the conversation to go (in a good way).
I'm glad to hear that! The idea of integrating traditional programming constructs like loops and memory access into an LLM-driven system is certainly an intriguing direction. It blends the strengths of symbolic reasoning with the flexibility and adaptability of neural networks, opening up a lot of possibilities.
This approach allows LLMs to handle more complex operations and make use of detailed, structured knowledge without relying entirely on the statistical correlations they learn during training. By treating the LLM as a high-level controller and the microcode as a means to perform low-level, precise operations, we create a powerful hybrid system capable of much more than either approach alone.
If you have more ideas or want to dive deeper into any specific aspect, whether it's refining the microcode language, exploring new kinds of knowledge implants, or even discussing how this might impact the broader field of AI, I'm here to explore it with you!