Created
November 8, 2023 13:18
openai-assistant-apple_10Q.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMn2Bdwkhumbomjtw9H8i06",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/virattt/0dd58fb915151981863a231a35921fe9/openai-assistant-apple_10q.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"### Welcome! 👋\n",
"\n",
"This is a quick tutorial on how to use OpenAI's new Assistants API with Knowledge Retrieval.\n",
"\n",
"In this notebook, we:\n",
"1. Create an assistant\n",
"2. Upload Apple's most recent annual report (Form 10-K for the fiscal year ended September 30, 2023) to OpenAI\n",
"3. Attach the report to our assistant\n",
"4. Chat with the assistant about the report!\n",
"\n",
"If you have any questions or issues, please reach out to me [here](https://twitter.com/virattt) 🙂"
],
"metadata": {
"id": "ldnUnKchD2Fe"
}
},
{
"cell_type": "markdown",
"source": [
"### 1. Setup and Installation"
],
"metadata": {
"id": "CxJxmsto9upG"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GGcNvksN2PRY"
},
"outputs": [],
"source": [
"pip install openai # OpenAI's Python library"
]
},
{
"cell_type": "code",
"source": [
"pip install beautifulsoup4 # BeautifulSoup (imported as bs4) parses the raw HTML"
],
"metadata": {
"id": "NGRI3RndPo9g"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from getpass import getpass\n",
"\n",
"openai_api_key = getpass('Enter your OpenAI API key: ')"
],
"metadata": {
"id": "Fr8-nJyC9uw-"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from openai import OpenAI\n",
"\n",
"# Instantiate the OpenAI client\n",
"client = OpenAI(api_key=openai_api_key)"
],
"metadata": {
"id": "fE6S3p0o2Wda"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 2. Create the Assistant"
],
"metadata": {
"id": "X9UtWarZ9Tah"
}
},
{
"cell_type": "code",
"source": [
"# Create the Assistant\n",
"assistant = client.beta.assistants.create(\n",
"    name=\"Financial assistant\",\n",
"    instructions=\"You are a financial assistant. You help users analyze and understand businesses like Warren Buffett does.\",\n",
"    tools=[{\"type\": \"retrieval\"}],\n",
"    model=\"gpt-4-1106-preview\",\n",
")"
],
"metadata": {
"id": "wDJPtd_Y9RoV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 3. Download and parse the file\n",
"For our example, we use Apple's annual report (Form 10-K) for the fiscal year ended September 30, 2023. Because that period ends after the April 2023 training-data cutoff of `gpt-4-1106-preview`, the model has not seen this report during training and has to rely on the uploaded document to answer."
],
"metadata": {
"id": "o0dA9dD5NPRl"
}
},
{
"cell_type": "code",
"source": [
"import requests\n",
"from bs4 import BeautifulSoup\n",
"\n",
"# Apple's Form 10-K for the fiscal year ended September 30, 2023\n",
"file_url = \"https://www.sec.gov/Archives/edgar/data/320193/000032019323000106/aapl-20230930.htm\"\n",
"\n",
"# Download the report. SEC EDGAR asks automated clients to send a descriptive\n",
"# User-Agent with contact details; the value below is a placeholder to replace.\n",
"response = requests.get(file_url, headers={'User-Agent': 'Your Name your.email@example.com'})\n",
"response.raise_for_status()  # stop early if the download failed\n",
"\n",
"# Parse the report, which is originally HTML\n",
"soup = BeautifulSoup(response.content, 'html.parser')\n",
"\n",
"# Extract the text from the parsed HTML, with a newline between elements\n",
"text = soup.get_text(separator='\\n', strip=True)"
],
"metadata": {
"id": "EMJ6kCKSNtmJ"
},
"execution_count": null,
"outputs": []
},
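{
"cell_type": "markdown",
"source": [
"Optionally, we can tidy up the extracted text before saving it. This is a minimal sketch, not part of the original workflow: it collapses runs of spaces, tabs, and non-breaking spaces (common in SEC HTML) and repeated blank lines left over from the page layout, then prints a short preview so you can confirm the download worked. The regular expressions are our own choice; the Assistants API does not require them."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import re\n",
"\n",
"# Collapse runs of spaces, tabs and non-breaking spaces into a single space\n",
"text = re.sub(r\"[ \\t\\xa0]+\", \" \", text)\n",
"# Collapse runs of blank lines into a single blank line\n",
"text = re.sub(r\"\\n\\s*\\n+\", \"\\n\\n\", text).strip()\n",
"\n",
"# Preview the cleaned text\n",
"print(f\"Extracted {len(text):,} characters\")\n",
"print(text[:500])"
],
"metadata": {},
"execution_count": null,
"outputs": []
},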
{
"cell_type": "markdown",
"source": [
"### 4. Upload the file to OpenAI\n",
"The Files API only accepts the [supported file types](https://platform.openai.com/docs/assistants/tools/supported-files), so we need to write our parsed `text` to a file before uploading it. For simplicity, we save it as a `.txt` file, but you could also convert it to another supported format such as `.pdf`."
],
"metadata": {
"id": "GIYcGdck-Zo5"
}
},
{
"cell_type": "code",
"source": [
"if response.status_code == 200:\n",
"    # Save the report to a .txt file\n",
"    with open('aapl_FY2023_10K.txt', 'w', encoding='utf-8') as f:\n",
"        f.write(text)\n",
"\n",
"    # Upload the .txt file to OpenAI's Files endpoint\n",
"    with open('aapl_FY2023_10K.txt', 'rb') as f:\n",
"        file_response = client.files.create(\n",
"            file=f,\n",
"            purpose=\"assistants\",  # the file will be used by our assistant\n",
"        )"
],
"metadata": {
"id": "sPsbfsHY9VVw"
},
"execution_count": null,
"outputs": []
},
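{
"cell_type": "markdown",
"source": [
"To confirm the upload succeeded, we can inspect the file object returned by `client.files.create`. Its `id` is the value we pass to the Assistant in the next step. This is a small sanity check, not a required step."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Inspect the uploaded file object returned by the Files API\n",
"print(f\"File ID:  {file_response.id}\")\n",
"print(f\"Filename: {file_response.filename}\")\n",
"print(f\"Size:     {file_response.bytes:,} bytes\")\n",
"print(f\"Purpose:  {file_response.purpose}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},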
{
"cell_type": "markdown",
"source": [
"### 5. Attach the file to the Assistant\n",
"After uploading the file in step 4, we attach it to our Assistant, which creates an [assistant file](https://platform.openai.com/docs/api-reference/assistants/file-object)."
],
"metadata": {
"id": "ZE_5esJj_M9x"
}
},
{
"cell_type": "code",
"source": [
"assistant_file = client.beta.assistants.files.create(\n",
"    assistant_id=assistant.id,  # our assistant\n",
"    file_id=file_response.id,  # the file we uploaded\n",
")"
],
"metadata": {
"id": "JXbs5iYNPGTU"
},
"execution_count": null,
"outputs": []
},
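{
"cell_type": "markdown",
"source": [
"As a quick check, we can list the files attached to the Assistant. This sketch uses `client.beta.assistants.files.list`, which belongs to the same v1 Assistants beta as the `retrieval` tool used in this notebook; later API versions replaced assistant files with vector stores, so newer releases of the `openai` library may not provide this call."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# List the files currently attached to our Assistant\n",
"attached_files = client.beta.assistants.files.list(assistant_id=assistant.id)\n",
"\n",
"for attached_file in attached_files:\n",
"    print(attached_file.id, attached_file.created_at)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},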
{
"cell_type": "markdown",
"source": [
"### 6. Create a Thread\n",
"A Thread represents a conversation. OpenAI recommends creating one Thread per user as soon as the user starts the conversation. We can add user-specific context and files to the Thread by creating Messages [(learn more)](https://platform.openai.com/docs/assistants/overview/step-2-create-a-thread)."
],
"metadata": {
"id": "5ydDX7RH_4wE"
}
},
{
"cell_type": "code",
"source": [
"thread = client.beta.threads.create(\n",
"    messages=[\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": \"What was Apple's revenue, net income, and free cash flow in fiscal year 2023?\",\n",
"            \"file_ids\": [file_response.id],  # the file we uploaded in step 4\n",
"        }\n",
"    ]\n",
")"
],
"metadata": {
"id": "Ca8l4M3EPVVI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 7. Run the Assistant and Thread\n",
"For the Assistant to respond to the user message, we need to create a `Run`. This makes the `Assistant` read the `Thread` and decide whether to call tools or simply use the model to best answer the user query.\n",
"\n",
"We can optionally pass `instructions` when creating the `Run`; these override the Assistant's default instructions for this `Run` only."
],
"metadata": {
"id": "a55Q-bHlAaD2"
}
},
{
"cell_type": "code",
"source": [
"run = client.beta.threads.runs.create(\n",
"    thread_id=thread.id,\n",
"    assistant_id=assistant.id,\n",
"    instructions=\"Please answer the user's query as Warren Buffett would.\"\n",
")"
],
"metadata": {
"id": "LlMlgcywUfMr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 8. Retrieve the Run's status\n",
"When we create a `Run`, its initial status is `\"queued\"`. We periodically retrieve the `Run` to check its status until it leaves the `\"queued\"` and `\"in_progress\"` states. Ideally it ends as `\"completed\"`, but it can also end as `\"failed\"`, `\"cancelled\"`, or `\"expired\"`, so the loop must not wait for `\"completed\"` alone."
],
"metadata": {
"id": "Y-4fFN_HBBQZ"
}
},
{
"cell_type": "code",
"source": [
"import time\n",
"\n",
"# Poll the Run until it leaves the non-terminal states\n",
"while run.status in (\"queued\", \"in_progress\", \"cancelling\"):\n",
"    # Wait 1 second between status checks\n",
"    time.sleep(1)\n",
"    # Retrieve the latest state of the Run\n",
"    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)\n",
"    # Print the status of the Run\n",
"    print(f\"Run status: {run.status}\")\n",
"\n",
"# Report anything other than a successful completion\n",
"if run.status != \"completed\":\n",
"    print(f\"Run ended with status {run.status!r}: {run.last_error}\")"
],
"metadata": {
"id": "_4dp-rA4Ux8x"
},
"execution_count": null,
"outputs": []
},
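{
"cell_type": "markdown",
"source": [
"Once the `Run` has finished, we can list its steps to see how the Assistant arrived at its answer, for example whether it invoked the `retrieval` tool or replied directly from the model. This is a sketch against the v1 Assistants beta, where each run step has a `type` of either `\"message_creation\"` or `\"tool_calls\"`."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# List the steps the Assistant took during the Run\n",
"run_steps = client.beta.threads.runs.steps.list(thread_id=thread.id, run_id=run.id)\n",
"\n",
"for step in run_steps:\n",
"    print(f\"Step {step.id}: {step.type} ({step.status})\")\n",
"    # Tool-call steps record which tools were used, e.g. \"retrieval\"\n",
"    if step.type == \"tool_calls\":\n",
"        for tool_call in step.step_details.tool_calls:\n",
"            print(f\"  tool used: {tool_call.type}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},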
{
"cell_type": "markdown",
"source": [
"### 9. Print the output"
],
"metadata": {
"id": "AwUwxrRyA6ei"
}
},
{
"cell_type": "code",
"source": [
"# Get the messages from the Thread, oldest first (the API lists newest first by default)\n",
"thread_messages = client.beta.threads.messages.list(thread_id=thread.id, order=\"asc\")\n",
"\n",
"# Loop through the messages in the Thread and print their text content\n",
"for message in thread_messages:\n",
"    for content in message.content:\n",
"        # Skip non-text content such as images\n",
"        if content.type == \"text\":\n",
"            print(f\"{message.role}: {content.text.value}\\n\\n\")"
],
"metadata": {
"id": "vrfYTpj3VznS"
},
"execution_count": null,
"outputs": []
}
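,
{
"cell_type": "markdown",
"source": [
"Because a Thread keeps the whole conversation, we can ask a follow-up question by adding another Message to the same Thread and starting a new `Run`. The question below is only an example, and the polling loop mirrors the one in step 8. We read `.data` from the first page of results so that `limit=1` returns only the newest message instead of auto-paginating through the whole Thread."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import time\n",
"\n",
"# Add a follow-up question (example text) to the existing Thread\n",
"client.beta.threads.messages.create(\n",
"    thread_id=thread.id,\n",
"    role=\"user\",\n",
"    content=\"What are the main risk factors Apple highlights in this report?\",\n",
")\n",
"\n",
"# Start a new Run so the Assistant answers the follow-up\n",
"follow_up_run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)\n",
"\n",
"# Poll until the Run leaves the non-terminal states\n",
"while follow_up_run.status in (\"queued\", \"in_progress\", \"cancelling\"):\n",
"    time.sleep(1)\n",
"    follow_up_run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=follow_up_run.id)\n",
"\n",
"if follow_up_run.status == \"completed\":\n",
"    # Messages are listed newest first by default, so limit=1 gives the latest reply\n",
"    latest_messages = client.beta.threads.messages.list(thread_id=thread.id, limit=1)\n",
"    for message in latest_messages.data:\n",
"        for content in message.content:\n",
"            if content.type == \"text\":\n",
"                print(content.text.value)\n",
"else:\n",
"    print(f\"Run ended with status {follow_up_run.status!r}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
}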
]
}