@simonmesmith
simonmesmith / simple_rag.py
Last active November 28, 2023 14:45
Simple RAG for easily embedding documents and querying embeddings
"""
SIMPLE RAG!
This file provides a class, Collection, that makes it easy to add
retrieval-augmented generation (RAG) to an application.
There are so many overly complex RAG tools out there, such as LlamaIndex and
LangChain. Even Chroma can be overly complex for some use cases, and I've
run into issues (in Streamlit Share) where Chroma's dependency on SQLite
caused a conflict that I couldn't resolve. Argh!
simonmesmith / openai_assistants_api_with_streamlit.py
Created November 7, 2023 19:30
OpenAI Assistants API with Streamlit
"""
This file demonstrates how to use the OpenAI Assistants API with Streamlit.
Users upload a CSV and the assistant writes an article about the data. For
this to work, you'll need to:
1. Install the streamlit and openai packages
2. Get an OpenAI API key
3. Set an environment variable called OPENAI_API_KEY with your API key
4. Create an assistant with Code Interpreter on, and this system message:
"You are an experienced data journalist. You receive a CSV of data from a
simonmesmith / book_modernizer.md
Last active October 10, 2023 13:47
Book modernizer

Book modernization with GPT-3.5

This is a proof of concept that uses OpenAI's GPT-3.5 to modernize books.

For example, given this passage from Mary Shelley's The Last Man:

I visited Naples in the year 1818. On the 8th of December of that year, my companion and I crossed the Bay, to visit the antiquities which are scattered on the shores of Baiæ. The translucent and shining waters of the calm sea covered fragments of old Roman villas, which were

simonmesmith / openai_streaming_with_functions.py
Last active October 18, 2024 19:02
Use OpenAI API streaming with functions
"""
This example shows how to use the streaming feature with functions.
"""
import json
import os
import sys
from collections.abc import Generator
from typing import Any, Dict, List
simonmesmith / smartgpt.py
Created May 9, 2023 14:19
"SmartGPT" experiment inspired by "GPT 4 is Smarter than You Think" video
import openai
from tqdm import tqdm
import os
def gpt(prompt: str) -> str:
    """
    This function uses the OpenAI API to generate a response to a given prompt.
    Args:
simonmesmith / fine_tune_with_t5.md
Last active May 14, 2024 18:57
Fine-tuning T5 with Hugging Face

Recently, I had to fine-tune a T5 model using Hugging Face's libraries.

Unfortunately, there was a lot of outdated information and many conflicting examples online.

If you just want to get going quickly, you can:

  1. Copy this code
  2. Swap in your own dataframe (ensure it has "source_text" and "target_text" columns, or modify parts of the code that depend on them accordingly)
simonmesmith / colorgram.md
Last active October 10, 2023 13:45
Colorgram

Extract colors from images and attach them as swatches. See an example here.

Takes one argument: number_of_color_clusters (int), which is pretty self-explanatory. No? Okay, it means how many color clusters (e.g. reddish colors, bluish colors, etc.) to return and add to the swatches.
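To make "color clusters" concrete, here is a minimal sketch of grouping pixels into a fixed number of clusters. It is separate from the gist's own code, which is not shown here, and it assumes a plain k-means-style approach. The function name `cluster_colors`, the `iterations` parameter, and the sample pixels are all hypothetical.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]


def squared_distance(a: Color, b: Color) -> int:
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_colors(
    pixels: List[Color], number_of_color_clusters: int, iterations: int = 10
) -> List[Color]:
    """Return one representative color per cluster (hypothetical helper).

    Assumption: simple k-means. The real Colorgram code may work differently.
    """
    # Deterministic start: the first distinct colors become the initial centers.
    centers = list(dict.fromkeys(pixels))[:number_of_color_clusters]
    for _ in range(iterations):
        # Assign every pixel to its nearest center.
        groups: List[List[Color]] = [[] for _ in centers]
        for pixel in pixels:
            nearest = min(
                range(len(centers)),
                key=lambda i: squared_distance(pixel, centers[i]),
            )
            groups[nearest].append(pixel)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(round(sum(channel) / len(group)) for channel in zip(*group))
            if group
            else center
            for group, center in zip(groups, centers)
        ]
    return centers


# Two reddish and two bluish pixels, clustered into two swatches.
sample_pixels = [(250, 10, 10), (10, 10, 250), (240, 20, 20), (20, 20, 240)]
swatches = cluster_colors(sample_pixels, number_of_color_clusters=2)
print(swatches)
```

With these sample pixels, the reddish pair and the bluish pair each collapse into one averaged swatch.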

Usage:

input_path = "input_image.jpg"
simonmesmith / pytesseract_tablereader.md
Last active October 10, 2023 13:45
Pytesseract table reader

Here's some code that tries to solve a problem I think I've identified in converting tables in images into dataframes that we can work with programmatically.

Right now, I see two main solutions on the market. One is to use third-party APIs, such as Amazon's and Google's products in this area, which can get expensive at scale. The other is to use or build on complex code that relies on image-processing libraries like OpenCV to find gridlines and uses them to determine table rows and columns.

My hypothesis is that we can use only Pytesseract to read tables, since it provides coordinates of text in images, and tables follow a standard structure (rows and columns). I've been working on the code here accordingly.
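The hypothesis above can be sketched without running OCR at all. The example below is a minimal illustration, not the gist's actual code. It assumes each word arrives as a dict with "text", "left", and "top" keys, mirroring the per-word fields that `pytesseract.image_to_data` reports. The helper name `words_to_rows`, the `row_tolerance` parameter, and the sample data are hypothetical.

```python
from typing import Dict, List


def words_to_rows(words: List[Dict], row_tolerance: int = 10) -> List[List[str]]:
    """Group OCR word boxes into table rows, ordered left to right.

    Assumption: words whose "top" values lie within `row_tolerance` pixels
    of the first word in a row belong to that row.
    """
    rows: List[List[Dict]] = []
    # Walk the words from top to bottom so rows are built in reading order.
    for word in sorted(words, key=lambda w: w["top"]):
        if not word["text"].strip():
            continue  # OCR output often includes empty boxes; skip them
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= row_tolerance:
            rows[-1].append(word)  # close enough vertically: same row
        else:
            rows.append([word])  # otherwise start a new row
    # Within each row, the horizontal position gives the column order.
    return [
        [w["text"] for w in sorted(row, key=lambda w: w["left"])] for row in rows
    ]


# Hypothetical word boxes for a two-column, two-row table.
sample_words = [
    {"text": "Name", "left": 10, "top": 12},
    {"text": "Age", "left": 120, "top": 10},
    {"text": "Ada", "left": 11, "top": 41},
    {"text": "36", "left": 121, "top": 43},
    {"text": "", "left": 0, "top": 0},
]
table = words_to_rows(sample_words)
print(table)
```

This sketch treats every word as its own cell. Handling multi-word cells would need an extra step that merges horizontally adjacent words, which is left out here.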

Usage is very simple: