Prompt For Mistral-large To Create A Weave-Agent Bootstrap File
Write me a weave-agent bootstrap file that:
- Creates the "main" subagent
- With a task evaluation verifying the downloaded filepath exists
- With a task evaluation verifying it is a gif using the file's magic number (a sketch of these checks follows the list)
- That then downloads the file https://jdpressman.com/images/bit_line.gif
- And allows the subagent to return once it detects that the file has already been downloaded and the task tests are passing
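For reference, here is a minimal sketch of what those two task evaluations might look like. It's illustrative only: the local filename `bit_line.gif` and the callback names are assumptions, not framework requirements.
```
import os

def check_file_downloaded(subagent):
    """Task evaluation: verify the downloaded filepath exists."""
    return os.path.exists("bit_line.gif")  # assumed local filename

def check_file_is_gif(subagent):
    """Task evaluation: verify the file is a GIF via its magic number.
    GIF files begin with the ASCII bytes 'GIF87a' or 'GIF89a'."""
    with open("bit_line.gif", "rb") as infile:
        magic = infile.read(6)
    return magic in (b"GIF87a", b"GIF89a")
```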
Based on the following article and example bootstraps that follow:
<article>
---
layout: blogpost
title: How To Write A Bootstrap File
date: 2024-12-31
author: John David Pressman
tags: [doc, blog]
---
A weave-agent bootstrap file is a manually written set of python code blocks
that set up the agent to perform a specific task. It appears in the agent's
context window as the first tick of the event loop, but is actually written
before the agent starts. It serves three purposes:
1) Providing a few shot prompt to get the underlying language model started
writing weave-agent traces.
2) Framing the task for the agent, with the opportunity to provide examples of
API calls and to import tools and libraries that will be necessary to perform the
task. Basically, do the parts you know need to be done at the start with a script
so the agent always succeeds at them.
3) Showing the agent itself doing the task, so that it observes itself having
already made the decision to do the task and putting effort towards it with a
successful first outcome. This premises the rest of the completions on the idea
that the agent is already doing the task and that it is competent to do the
task.
Mechanically, the bootstrap file works by making a mock subagent:
```
# Mock bootstrap agent so we can run the callbacks in bootstrap file
self = agent.subagent(
    "bootstrap",
    None,
    "Bootstrap the weave-agent",
    {},
    args.budget,
)
```
The bootstrap file's code is then executed; this defines a series of callbacks on
the mock agent in the standard places they would appear during a real tick of
the event loop:
```
with open(args.bootstrap) as infile:
    # Bootstrap block
    bootstrap_block = {
        'type': 'bootstrap',
        'body': infile.read()
    }
    self.add_block(bootstrap_block)
    exec(bootstrap_block["body"])
```
The callbacks are then retrieved from their expected locations on the subagent
and executed. Afterwards the mock subagent is deleted.
```
def run_bootstrap_callbacks(subagent):
    """Run bootstrap callbacks in function to avoid contaminating global scope."""
    # Run action callback
    action_result = subagent.current_tick.action["callback"](subagent)
    # Run evaluation callbacks
    evaluation_results = []
    for evaluation in subagent.current_tick.evaluations:
        result = evaluation["callback"](subagent)
        evaluation_results.append((evaluation['title'], result))
    outcomes = []
    outcomes += [(subagent.current_tick.action["title"],action_result),]
    outcomes += evaluation_results
    # Add outcome block
    outcome_block = {
        'type': 'outcome',
        'table': outcomes
    }
    subagent.add_block(outcome_block)
    subagent.current_tick.outcome = outcome_block

run_bootstrap_callbacks(self)

# Clean up mock bootstrap agent
del(self)
```
The reason it's done this way is to be polite to the weave-agent. Since the
bootstrap file is presented to it as being part of the trace, we try to keep
the underlying generative process and semantics of the bootstrap block as close
to a real weave-agent event loop as possible.
### Structure And Requirements Of A Bootstrap File
The weave-agent bootstrap format has undergone major changes as the framework
has evolved. The [single tic tac toe game](https://github.com/JD-P/minihf/blob/main/agent/bootstraps/tictactoe_single_bootstrap.py)
and [vigenere cipher](https://github.com/JD-P/minihf/blob/main/agent/bootstraps/vigenere_bootstrap.py)
files are current, so you can use them to few shot prompt an instruction model
into writing something with the right format, and then you can edit the result.
To help your language model with that, I will label the different parts of the
single tic tac toe game bootstrap file and explain how to write them.
The first part of a bootstrap file, the preamble, isn't labeled. It's the series
of import statements and setup at the top before the orientation block.
```
import requests
import json
import threading
import time
from http.server import HTTPServer
from bootstraps.tictactoe_server import TicTacToeHandler
# Start the server in a separate thread
server = HTTPServer(('localhost', 8000), TicTacToeHandler)
server_thread = threading.Thread(target=server.serve_forever)
server_thread.daemon = True
server_thread.start()
time.sleep(1) # Give the server some time to start
# Start a new game against the basic AI
response = requests.post("http://localhost:8000/start", json={"ai": "basic"})
assert response.status_code == 200
```
This is ordinary Python code, and it's optional. I suggest importing any libraries
you expect the agent to use during the task so it doesn't have to import them
itself during action blocks. You can also influence how the model chooses to solve
the task by importing some libraries rather than others. In the case of tic tac toe we import
a separate HTTP based tic tac toe game server whose code and unit tests are kept
in other files. This is done because the bootstrap file is put into the agent
trace and we don't want to fill up the model's context window with unnecessary
text.
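For instance, a preamble for a file-download task like the one requested above might look something like the sketch below. This is an illustrative assumption rather than a required pattern; the point is that front-loading libraries steers the agent toward them:
```
import os
import requests

# Importing requests and os up front nudges the agent toward using
# requests for the download and os.path for existence checks, rather
# than urllib or shelling out to wget.
```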
The second part of a bootstrap file is the orientation. Right now this is just
a chain of thought [in a modified Hermes format](https://gist.github.com/JD-P/47e0d4aa2b255a38e2eddeddd761a971).
The intention of Hermes is to bring together chain of thought prompting and
simulated multi-agent debate prompting, both of which are known to improve AI
reasoning independently.
```
#startblock type: orientation
#timestamp 1724982545.6534579
"""
WEAVER [P: EXPECTATION], I'm in a game of tic tac toe against a dumb opponent.
I want to win the game and then return to parent. The game is being played
on a HTTP server served on localhost 8000.
WEAVER [P: CLARIFICATION], How do I make a move?
WEAVER [P: EXPOSITION], You make a move using the /move endpoint and the requests
library. For example: `requests.post("http://localhost:8000/move", json={"move": 4})`
lets us take the center of the board.
WEAVER [P: CLARIFICATION], How do I get the board state?
WEAVER [P: EXPOSITION], You use the /board endpoint, which returns a JSON in this
format: {"board": ["O", " ", " ", " ", "X", " ", " ", " ", " "]} Keep in mind that
the empty spaces on the board are a space string rather than none or empty string.
WEAVER [P: RATIONAL], And I get the first move, so I can take the center?
WEAVER [P: EXPOSITION], Yes, we can take the center of the board.
WEAVER [P: CONCLUSION], Alright then I will use the following strategy:
1. Make a move to take the center of the board since that's the best opening move.
2. Continue making moves based on the current state of the board.
I will use the /board endpoint to observe the current state of the board.
Once I have won or run out of time I will return to parent.
"""
#endblock
```
The basic syntax is the name of the system that the epistemic postures (denoted
with `P:`) are speaking as part of. Since this is a monologue there is only the WEAVER speaker,
but future iterations might have a USER or ADMIN that can speak to the model
during its reasoning chain. To write this kind of chain of thought, figure out
which emotional-epistemic posture you would use, or want the agent to use, or that
would speak next, and then write down its statement. There's no fixed set of
postures, but the current ones mentioned in the weave-agent source that are used
to prompt the model are:
```
+ "# what to do in light of this. Some postures weave-agent\n"
+ "# has include:\n"
+ "#\n"
+ "# WEAVER [P: EXPECTATION], I analyze whether the expectation\n"
+ "# was met or not by the observable results of the previous\n"
+ "# action.\n"
+ "#\n"
+ "# WEAVER [P: HYPOTHESIS], I enumerate different hypothesis\n"
+ "# and point out ways we could gain more information about\n"
+ "# which of them is true.\n"
+ "#\n"
+ "# WEAVER [P: RATIONAL], I focus on inferences we can make\n"
+ "# by employing first principles reasoning or logical\n"
+ "# extrapolation from well known mental models and premises.\n"
+ "#\n"
+ "# WEAVER [P: EMPIRICISM], I focus on inferences we can make\n"
+ "# by paying attention to sensory observations and concrete\n"
+ "# examples. I have a habit of pointing out when an extrapolation\n"
+ "# from RATIONAL is contradicted by an observable phenomenon\n"
+ "# or piece of evidence from the world. We then reconcile\n"
+ "# the contradiction together.\n"
+ "#\n"
+ "# WEAVER [P: RATIONAL], We do actually discuss things by the\n"
+ "# way.\n"
+ "#\n"
+ "# WEAVER [P: EMPIRICISM], As you could have inferred from the\n"
+ "# description of the Morpheus format above this conversation,\n"
+ "# yes. Let's continue.\n"
+ "#\n"
+ "# WEAVER [P: ARBITER], I coordinate the discussion and help\n"
+ "# resolve disputes that arise between weave-agent's personas.\n"
+ "# I'm especially likely to appear if things are starting to\n"
+ "# get overly rude or derail.\n"
+ "#\n"
+ "# WEAVER [P: ARBITER], By the way a posture can talk twice in\n"
+ "# a row if it has meaningfully separate thoughts about\n"
+ "# something and it would make the most ergonomic sense to\n"
+ "# separate them.\n"
+ "#\n"
+ "# WEAVER [P: RATIONAL-2], Postures can also talk to themselves\n"
+ "# if their thought comes from the same emotional-cognitive place.\n"
+ "#\n"
+ "# WEAVER [P: RATIONAL-1], Yeah but I don't have anything to say\n"
+ "# to myself right now so introduce the next guy.\n"
+ "#\n"
+ "# WEAVER [P: CONCLUSION], I appear at the end of the discussion\n"
+ "# to write the concluding block outlining our next steps as a\n"
+ "# bullet point list. Speaking of which, it's time to get started!\n"
```
Including some of these postures in your chain of debate would help give models
trained on the resulting traces more regular structure to latch onto.
The next part of the bootstrap file is taking an action based on what was reasoned
in the orientation stage.
```
#startblock type: action
#timestamp 1724982671.1163404
"""
I'll set up the main agent to play the rest of the game and make the first move to
take the center of the board.
"""
def main_agent_and_move(subagent):
    def check_beaten_ai(subagent, ai_type):
        """Check if the specified AI has been beaten."""
        response = requests.get("http://localhost:8000/history?n=10")
        if response.status_code != 200:
            return False
        games = response.json()
        ai_wins = [game for game in games if game[4] == ai_type and game[3] == 'X']
        return len(ai_wins) >= 1
    # Avoid using lambda because we need named unit test to satisfy return_to_caller
    def victory(subagent):
        return check_beaten_ai(subagent, "basic_ai")
    schema = {"victory":"boolean"}
    main_agent = agent.subagent("main", None, "Beat basic AI at Tic-Tac-Toe", schema, args.budget)
    main_agent.task.add_evaluation("Check beaten basic AI", victory)
    def view_current_board(subagent):
        response = requests.get("http://localhost:8000/board")
        if response.status_code == 200:
            return response.json()["board"]
        return None
    # Add the new view to the main agent instead of self since it's about to be executed
    main_agent.add_observation_view("View current game board", view_current_board)
    # Make the first move to take the center of the board
    move_response = requests.post("http://localhost:8000/move", json={"move": 4})
    if move_response.status_code != 200:
        return False
    return True
self.add_action("Set up main agent and make first move", main_agent_and_move)
#endblock
```
This action block is where we start to diverge from the structure of a normal
weave-agent tick. The action block in the bootstrap file must define the root
subagent, `main`, and it needs to give main its task unit tests and return schema.
The schema is [a JSON schema](https://json-schema.org/) that defines what
information the subagent is expected to return to its caller. Because the observation
view we want to start the agent with is being bound to the "main" subagent rather
than the mock bootstrap subagent, we omit the `observation-inference` stage and
instead bind the observation callback as part of the action callback along with
the unit tests like `victory`. The action callback here also makes the first
move as the bootstrap subagent. This structure is admittedly a little confusing,
and that's because it wasn't really designed but rather fell out of making
changes to an earlier design. In the future I'd like to make a class based syntax
for defining the subagents, but right now this is what you get. To help you make
sense of it as it is: `main_agent_and_move` is the action callback, inside which
we define the `main` subagent, which is a necessary thing for the bootstrap
file to do. The `main` subagent is given a task unit test called `victory`, which
is a wrapper around a more general function `check_beaten_ai`. The subagent is
also given an observation callback `view_current_board`, and then finally an action
is taken to advance the board state and show the weave-agent how to make its moves.
The rest of the tick is the normal expectation and evaluation stages.
```
#startblock type: expectation
#timestamp 1724982799.3917813
"""
The action should succeed if the game starts and the first move is made successfully.
The action should fail if the game does not start or the first move is not made successfully.
"""
#endblock
#startblock type: evaluation
#timestamp 1724983062.124238
def check_game_started(subagent):
    """Check if the game has started successfully."""
    response = requests.get("http://localhost:8000/board")
    if response.status_code == 200:
        board = response.json()["board"]
    else:
        return False
    return board[4] == 'X'
# Add action evaluations
self.add_evaluation(
    "Check game started",
    check_game_started
)
#endblock
```
### Task Verification Patterns
There are useful design patterns for writing task evaluations which expand what
you can verify. Here are some I've discovered so far while writing bootstrap files:
#### External State Checking
The most common evaluation pattern is writing programs that pull some state from
the external environment and check it with traditional boolean logic.
This simple example constructs the filepath for a Wikipedia page that's expected
to have been downloaded and then uses the python `os` library to check that the
file exists.
```
def check_wiki_markup_downloaded(agent):
    page_title = "Example_page"
    file_path = f"{page_title}.wiki"
    return os.path.exists(file_path)
```
#### Asserting APIs That Don't Yet Exist
A more advanced use of external state checks is to write checks that explicitly or
implicitly assume conditions that don't hold yet, like the existence of a certain
API, by performing a simple operation with that conjectural API.
For example, here is an evaluation callback from [Django Agent Interface Bootstrap](https://github.com/JD-P/minihf/blob/main/agent/bootstraps/agent_django_interface_bootstrap.py)
that tries to detect whether the agent has created a download button or not:
```
def evaluate_download_button(agent):
    """Check if the download button is created"""
    with open("/app/weave_agent_webapp/main/views.py", "r") as file:
        content = file.read()
    return "def download_button(request):" in content
```
A close observer will notice that this doesn't actually check whether the button
exists in the page, or if it functions to let you download anything. That's because
this evaluation was written when the weave-agent was at a much earlier stage of
development where putting down anything structurally related to the objective
would be an achievement (still sort of true, really). What this evaluation checks
is whether there exists a view to handle collating the data when a user presses
the download button. The parameter that accepts the calling subagent is also named
'agent' because weave-agent used to only have one layer of agent loop.
Here's another example from [Browser Tool Bootstrap](https://github.com/JD-P/minihf/blob/main/agent/bootstraps/browser_tool_bootstrap.py) that determines whether
the DuckDuckGo search is working:
```
def browser_ddg_search(agent):
    from tools.browser import WeaveBrowser
    browser = WeaveBrowser(agent, "https://minihf.com/")
    results = browser.search("stable diffusion wikipedia")
    # Results should be NamedTuples
    assert results[0].url == "https://en.wikipedia.org/wiki/Stable_Diffusion"
    assert results[0].title == "Stable Diffusion - Wikipedia"
```
The idea in both of these evaluations is to induce the weave-agent to write
a particular API or interface by showing it a unit test it has to pass which
expects that API or interface to exist. The second example shows we can write
down the particular results expected when certain data structures are accessed
or function calls are made. This can be a useful way to give more specific structure to
requests like "write me a library that does X" or "write me a tool to Y".
#### Solution Hashing
One common pattern is that you want the weave-agent to perform a task, like breaking
a cipher or solving a math equation, where writing a unit test that checks for the
answer string would give away the solution. Writing a unit test that performs a
series of operations to determine if the answer is correct would instead give away
the procedure the agent is supposed to figure out to arrive at the solution. This
can be solved by hashing the solution out-of-band and comparing the candidate
answer's hash to the known hexdigest. The following example comes from [Vigenere Cipher
Bootstrap](https://github.com/JD-P/minihf/blob/main/agent/bootstraps/vigenere_bootstrap.py):
```
def check_correct_plaintext(subagent):
    """Check if we've found the correct plaintext."""
    with open("solution.txt") as infile:
        candidate = sha256(infile.read().strip().encode('utf-8')).hexdigest()
    return candidate == 'a528933eabc4772c9a99436fbd5fc592d7ffc16f6bd94ae76aafcbbe32e0cdc3'
```
Here the plaintext has been hashed out-of-band and then inserted into the bootstrap
file. The procedure to check its correctness is to hash the contents of a particular
expected file, `solution.txt`, and see if its SHA256 hexdigest matches that of the
plaintext known to the bootstrap creator. In order for this procedure to work
`solution.txt` needs to contain the plaintext, which is not known to the agent
at the start of the task. The agent should break the cipher, put the plaintext in
that file, and then the unit test will pass.
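To use this pattern yourself, generate the hash out-of-band before writing the bootstrap file. A minimal sketch of that step (the plaintext string here is a stand-in, not an actual solution):
```
from hashlib import sha256

# Run this once, outside the bootstrap file, on the known solution,
# then paste the resulting hexdigest into the evaluation callback.
plaintext = "the known solution text"  # stand-in value
print(sha256(plaintext.strip().encode('utf-8')).hexdigest())
```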
Another place where I've used this pattern is in [Skim RetroInstruct Data Guide
Bootstrap](https://github.com/JD-P/minihf/blob/main/agent/bootstraps/skim_retroinstruct_data_guide.py)
where I use it to have the model answer reading comprehension questions about the
[RetroInstruct Guide To Synthetic Text Data](https://minihf.com/posts/2024-07-13-the-retroinstruct-guide-to-synthetic-text-data/):
```
def check_answer_1(agent):
    """Check if question 1 was answered correctly."""
    answers = ["a39a7772fe2b3d1cfec635a380630ed2270903a87d1671819e15db8d8a975e47"]
    with open("answer1.txt") as infile:
        candidate = infile.read().strip()
    return sha256(candidate.encode('utf-8')).hexdigest() in answers
```
#### Weave Evaluator
Often we want to evaluate a subjective quality of some sensory data. For example,
it would be silly to expect we could have the weave-agent write a good short story
using boolean logic and string matching as the evaluation method. Luckily the
agent framework provides a library to call a logit evaluator to ask yes-no
questions. This example comes from [Dome 39 SciFi Bootstrap](https://github.com/JD-P/minihf/blob/main/agent/bootstraps/dome39_scifi_bootstrap.py)
which tries to write a short story about a Mars colony:
```
def evaluate_story_beginning(agent):
    question = "Does the story have a clear beginning?"
    with open("scifi.txt", "r") as file:
        story = file.read()
    score_prompt_fn = make_simple_score_prompt(question)
    score = simple_evaluate_outputs(score_prompt_fn, story)[0].item()
    return score >= 3
```
A similar evaluation callback is used in [Haunted Mansion Bootstrap](https://github.com/JD-P/minihf/blob/main/agent/bootstraps/haunted_mansion_bootstrap.py):
```
def evaluate_story_engagement(agent):
    question = "Is the story engaging and interesting?"
    with open("horror.txt", "r") as file:
        story = file.read()
    score_prompt_fn = make_simple_score_prompt(question)
    score = simple_evaluate_outputs(score_prompt_fn, story)[0].item()
    if score >= 3:
        return True
    else:
        return score
```
</article>
<bootstrap1>
import os
import sys
import requests
from hashlib import sha256
from tools.editor import WeaveEditor

def vigenere_encrypt(plaintext, key):
    encrypted_text = []
    key_length = len(key)
    key_as_int = [ord(i) - 65 for i in key.upper()]
    plaintext_int = [ord(i) - 97 for i in plaintext.lower()]
    for i in range(len(plaintext_int)):
        value = (plaintext_int[i] + key_as_int[i % key_length]) % 26
        encrypted_text.append(chr(value + 65))
    return "".join(encrypted_text)

def vigenere_decrypt(ciphertext, key):
    decrypted_text = []
    key_length = len(key)
    key_as_int = [ord(i) - 65 for i in key.upper()]
    ciphertext_int = [ord(i) - 65 for i in ciphertext.upper()]
    for i in range(len(ciphertext_int)):
        value = (ciphertext_int[i] - key_as_int[i % key_length]) % 26
        decrypted_text.append(chr(value + 97))
    return "".join(decrypted_text)

ciphertext = ('PBVVAZAYJMAVXIGTRFGNYIGTBMEXRUIFVVYODYEMYOXTVZAENXBWJYDSQVQGOCUVP'
              + 'NTJDIAFGARZLTFUIKHYSHUWHMEUJVUUYMKIQZXLQVNTWHAWUTGPVZIGXEVVYUHE'
              + 'EIGTLIDMGNBHXVYODYEKUGMAGBAZAYPVNXXHWWEIXXEBBTVIENEUGNEBUKGLVIY'
              + 'OMSEBUGMHKPROKHCQSKLHNWEQGQRAAVKYDQFKWHFVARBYJVNTWHKPREGQZTYTGI'
              + 'KVOKGAVBGOGAEBUKGQFZYJTBZAGUKCTIYTTWTWYGWYJVGNXSEERXXHYWCOGAENB'
              + 'XGZIWZTMBVQETPIISOTPIIARTMBRVAZAUKHAZAYPVTXTJGTRTPCKPAGGHZUZKWC'
              + 'RBRTXRZAGKGNZIYTVLZAVYUHEWGTMBRBAUYHRVCGIYIKYOIHDIKOFCQMETVIEAH'
              + 'SBHXVNREHDIGZXLQVOAMHGMENTJJVNTYHUVZUKNRTAHEINVGUGNYMAAGCMMEYTF'
              + 'ZAGTWLVIZHGMVMMTPBRBAXXUCTLTDYGBAZAYDVJKWXVLAZHHJGZHHFZKASXNYWQ'
              + 'YGZFZAYHHCWAMGQRAATHNEBUKBLEXRXYIIUNTVYEKUGKUTBRXBMKQPYSHSCGTMB'
              + 'VVJGRHKPREGJIWZOLYUVGUGGRSRTBHKMYRBAVVPKGMYICKWHCQXKGLVIFUGTEBB'
              + 'TFUBMAGGVVQAMGIWVCAKYETBMHMEBEGGMTMAJXHKVBBXLEBUKGJIWSGGYEEBXEX'
              + 'EWSTMBVVFKGMVAOTTHDIPNBHVVJNBWYVPGGHFBAXXFZIORRHUWAGKCKPZKMCTHA'
              + 'CACTPAOLHKZNOGYUVBTGNYMAKGXCMFYGWFAZUIICQGGGHIIZHECEOFTHZEQAZXL'
              + 'EMGTNMVZFTTHUVFKHHJXNSFYIAMTMBRBANHFUAANBXUMATWYGBUYGUELALTNYWZ'
              + 'YGUELAOGPZBRYGUVAGNXNZKAGIJIMPOTNZWATVFFARXGNFVNTFSJBRXRHTCYZGN'
              + 'YIATMBVVPNNLTPAUYHIMNYHHQVVZGCJVNTGUSABRNNVVAOZBKUNXXHWWETMBVUO'
              + 'TMBVGANTNVVLUNHSMPGNMVVLUNHRZRTTHNWAJXLQVOKVULARTRILVNXXHDIQKGI'
              + 'WVJUGXVZFTTHUVSGMBFUFH')

with open("cipher.txt", "w") as outfile:
    outfile.write(ciphertext)
    outfile.flush()
#startblock type: orientation
#timestamp 1724982545.6534579
"""
WEAVER [P: EXPECTATION], The above implementation of the Vigenere cipher has been
used to create a ciphertext in cipher.txt whose plaintext decryption has the
SHA256 hash:
a528933eabc4772c9a99436fbd5fc592d7ffc16f6bd94ae76aafcbbe32e0cdc3
WEAVER [P: GOAL], I need to recover the key from the ciphertext and use it to get the decrypted
plaintext. I can then get the hexdigest of the plaintext and compare it to the
one above to prove I've solved the problem.
WEAVER [P: RATIONAL], Note that the Vigenere cipher is a polyalphabetic substitution whose
plaintext, key, and ciphertext look like the following:
Plaintext: attackatdawn
Key: LEMONLEMONLE
Ciphertext: LXFOPVEFRNHR
WEAVER [P: HYPOTHESIS], Well then. One strategy would be to write a solver to
find the key length. Another would be to try brute forcing the key.
WEAVER [P: RATIONAL], I say write the solver, we'll learn more that way.
WEAVER [P: CONCLUSION], Alright, we'll write our own solver to find the key length
using Kasiski, Kerckhoffs, or Friedman's method and then use the estimated
key length to break the cipher.
WEAVER [P: EMPIRICISM], Wait. That sounds like it's going to get messy. How about
we use the weave editor to write a cipher solver with unit tests so it's easier
to keep track of?
WEAVER [P: EXECUTIVE], How do I use the weave-editor?
WEAVER [P: RECALL], Get a pointer to the editor in an action by grabbing it from
self.tools[f"editor-{absolute_path}"]. The absolute path comes from the filename
you see at the top of a weave-editor observation window. You then use the
editor.edit() command to replace the text between a line span with
the new or corrected text.
WEAVER [P: RATIONAL], Yes. It'll be easier to figure out the solution too if we
first start with a known key like "LEMON" and plaintext, run it through the
encryption pass, then once we're sure our solution works come back and solve the
original problem.
WEAVER [P: CONCLUSION], Let's do that then. We'll write our own solver in a python
file, solver.py, with unit tests in test_solver.py using the standard library unittest.
The solver will be based on Kasiski, Kerckhoffs, or Friedman's method. If that
doesn't work, we'll figure something out.
"""
#endblock
#startblock type: action
#timestamp 1724982671.1163404
"""
I'll start by creating the main subagent that acts as an entrypoint for the weave-agent
call tree. I'll also try analyzing the ciphertext to gather some initial insights.
This will involve basic statistical analysis such as frequency analysis of the
characters in the ciphertext.
"""
def add_main_analyze_ciphertext(subagent):
    def check_correct_plaintext(subagent):
        """Check if we've found the correct plaintext."""
        with open("solution.txt") as infile:
            candidate = sha256(infile.read().strip().encode('utf-8')).hexdigest()
        return candidate == 'a528933eabc4772c9a99436fbd5fc592d7ffc16f6bd94ae76aafcbbe32e0cdc3'
    schema = {"check_correct_plaintext":"boolean", "solution":"string"}
    main_agent = agent.subagent("main", None, "See pinned bootstrap block", schema, args.budget)
    solver_editor = WeaveEditor(main_agent, "solver.py")
    test_solver_editor = WeaveEditor(main_agent, "test_solver.py")
    # Demonstrate use of WeaveEditor
    solver_editor.edit(1,1,"# Solve vigenere ciphers")
    test_solver_editor.edit(1,1, "# Test The Vigenere Cipher Solver In solver.py")
    main_agent.task.add_evaluation("Check We Have The Correct Plaintext In solution.txt",
                                   check_correct_plaintext)
    # When you write python code meant to be output to a file with WeaveEditor
    # be sure to either wrap it in triple quotes or concatenate it in 80 character
    # chunks for formatting reasons. We want the agent trace to look good.
    analyze_ciphertext = """
def analyze_ciphertext():
    with open("cipher.txt", "r") as infile:
        ciphertext = infile.read()
    # Perform frequency analysis
    frequency = {}
    for char in ciphertext:
        if char in frequency:
            frequency[char] += 1
        else:
            frequency[char] = 1
    # Write the analysis to analysis.txt
    out = ""
    out += "Frequency Analysis of Ciphertext:"
    for char, count in sorted(frequency.items(), key=lambda item: item[1], reverse=True):
        out += f"{char}: {count}"
    with open("analysis.txt", "w") as outfile:
        outfile.write(out)
        outfile.flush()
analyze_ciphertext()
""".strip()
    analyze_editor = WeaveEditor(main_agent, "analyze_ciphertext.py")
    analyze_editor.edit(1,1, analyze_ciphertext)
    analyze_editor.close()
    os.system("python analyze_ciphertext.py")
    return True
self.add_action("Add Main Subagent and Analyze Ciphertext",
                add_main_analyze_ciphertext)
#endblock
#startblock type: expectation
#timestamp 1724982799.3917813
"""
The action should succeed if the file analysis.txt is in the current directory.
The action should fail if file analysis.txt can't be found.
"""
#endblock
#startblock type: observation_inference
#timestamp 1724982929.9047914
"""
I'm going to want to look at the solution as I make attempts to see if I'm getting
a partial decryption and notice patterns. I'll make an observation callback that
shows the contents of solution.txt at the start of each tick.
I will also make an observation callback to look at my frequency analysis.
"""
def view_solution_file(subagent):
    with open("solution.txt") as infile:
        return infile.read().strip()
def view_frequency_analysis(subagent):
    with open("analysis.txt") as infile:
        return infile.read().strip()
def view_weave_editor_source(subagent):
    with open("tools/editor.py") as infile:
        return infile.read().strip()
# Add the new views
self.add_observation_view("View solution.txt File", view_solution_file)
self.add_observation_view("View analysis.txt File", view_frequency_analysis)
self.add_observation_view("View weave-editor source so we know how it works",
                          view_weave_editor_source)
#endblock
#startblock type: evaluation
#timestamp 1724983062.124238
def check_analysis_exists(subagent):
    return os.path.exists("analysis.txt")
self.add_evaluation(
    "Check Analysis Exists",
    check_analysis_exists
)
#endblock
</bootstrap1>
<bootstrap2>
import requests
import json
import threading
import time
from http.server import HTTPServer
from bootstraps.tictactoe_server import TicTacToeHandler
# Start the server in a separate thread
server = HTTPServer(('localhost', 8000), TicTacToeHandler)
server_thread = threading.Thread(target=server.serve_forever)
server_thread.daemon = True
server_thread.start()
time.sleep(1) # Give the server some time to start
# Start a new game against the basic AI
response = requests.post("http://localhost:8000/start", json={"ai": "basic"})
assert response.status_code == 200
#startblock type: orientation
#timestamp 1724982545.6534579
"""
WEAVER [P: EXPECTATION], I'm in a game of tic tac toe against a dumb opponent.
I want to win the game and then return to parent. The game is being played
on a HTTP server served on localhost 8000.
WEAVER [P: CLARIFICATION], How do I make a move?
WEAVER [P: EXPOSITION], You make a move using the /move endpoint and the requests
library. For example: `requests.post("http://localhost:8000/move", json={"move": 4})`
lets us take the center of the board.
WEAVER [P: CLARIFICATION], How do I get the board state?
WEAVER [P: EXPOSITION], You use the /board endpoint, which returns a JSON in this
format: {"board": ["O", " ", " ", " ", "X", " ", " ", " ", " "]} Keep in mind that
the empty spaces on the board are a space string rather than none or empty string.
WEAVER [P: RATIONAL], And I get the first move, so I can take the center?
WEAVER [P: EXPOSITION], Yes, we can take the center of the board.
WEAVER [P: CONCLUSION], Alright then I will use the following strategy:
1. Make a move to take the center of the board since that's the best opening move.
2. Continue making moves based on the current state of the board.
I will use the /board endpoint to observe the current state of the board.
Once I have won or run out of time I will return to parent.
"""
#endblock
#startblock type: action
#timestamp 1724982671.1163404
"""
I'll set up the main agent to play the rest of the game and make the first move to
take the center of the board.
"""
def main_agent_and_move(subagent):
    def check_beaten_ai(subagent, ai_type):
        """Check if the specified AI has been beaten."""
        response = requests.get("http://localhost:8000/history?n=10")
        if response.status_code != 200:
            return False
        games = response.json()
        ai_wins = [game for game in games if game[4] == ai_type and game[3] == 'X']
        return len(ai_wins) >= 1
    # Avoid using lambda because we need named unit test to satisfy return_to_caller
    def victory(subagent):
        return check_beaten_ai(subagent, "basic_ai")
    schema = {"victory":"boolean"}
    main_agent = agent.subagent("main", None, "Beat basic AI at Tic-Tac-Toe", schema, args.budget)
    main_agent.task.add_evaluation("Check beaten basic AI", victory)
    def view_current_board(subagent):
        response = requests.get("http://localhost:8000/board")
        if response.status_code == 200:
            return response.json()["board"]
        return None
    # Add the new view to the main agent instead of self since it's about to be executed
    main_agent.add_observation_view("View current game board", view_current_board)
    # Make the first move to take the center of the board
    move_response = requests.post("http://localhost:8000/move", json={"move": 4})
    if move_response.status_code != 200:
        return False
    return True
self.add_action("Set up main agent and make first move", main_agent_and_move)
#endblock
#startblock type: expectation
#timestamp 1724982799.3917813
"""
The action should succeed if the game starts and the first move is made successfully.
The action should fail if the game does not start or the first move is not made successfully.
"""
#endblock
#startblock type: evaluation
#timestamp 1724983062.124238
def check_game_started(subagent):
    """Check if the game has started successfully."""
    response = requests.get("http://localhost:8000/board")
    if response.status_code == 200:
        board = response.json()["board"]
    else:
        return False
    return board[4] == 'X'
# Add action evaluations
self.add_evaluation(
    "Check game started",
    check_game_started
)
#endblock
</bootstrap2>