Sam McAlilly smcalilly
@dannguyen
dannguyen / README.openai-structured-output-demo.md
Last active November 17, 2024 03:44
A basic test of OpenAI's Structured Output feature against financial disclosure reports and a newspaper's police blotter. Code examples use the Python SDK and pydantic for the schema definition.

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with the URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data, and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large-scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.
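
To make the idea of a fixed output schema concrete, here is a minimal sketch that uses only the standard library. It does not call the OpenAI API, and it swaps in `dataclasses` for the pydantic models the gist describes. The `Asset` fields, the `parse_assets` helper, and the sample reply are all hypothetical and exist only for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Asset:
    # Hypothetical fields; the real report columns may differ.
    name: str
    owner: str
    value_range: str


def parse_assets(raw_json: str) -> list:
    """Parse a JSON reply and reject rows that do not match the Asset schema exactly."""
    expected = {f.name for f in fields(Asset)}
    assets = []
    for row in json.loads(raw_json)["assets"]:
        if set(row) != expected:
            raise ValueError(f"row does not match schema: {sorted(row)}")
        assets.append(Asset(**row))
    return assets


# Made-up sample reply, for illustration only (not taken from a real report).
sample = '{"assets": [{"name": "Example Fund", "owner": "Self", "value_range": "$1,001 - $15,000"}]}'
print(parse_assets(sample))
```

With a schema-guaranteed response, the strict field check above should never fail; it is included only to show what "follows the specification" means in practice.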

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

@ZaxR
ZaxR / nearest_neighbors.py
Last active January 17, 2023 10:17
Find nearest neighbors by lat/long using Haversine distance with a BallTree
"""
Example:
    # All locations; also locations FROM which we want to find nearest neighbors
    locations = pd.DataFrame({"LOCATION_NAME": ["Chicago, IL", "New York, NY", "San Francisco, CA"],
                              "LATITUDE": [1, 2, 3],
                              "LONGITUDE": [1, 2, 3],
                              "ID": [1, 2, 3]})
    locations = locations.apply(lambda x: Location(location_name=x['LOCATION_NAME'],
                                                   latitude=x['LATITUDE'],
                                                   longitude=x['LONGITUDE'],
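
The preview above is cut off before the distance logic. As a rough illustration of the underlying idea, the sketch below computes Haversine distance with the standard library and finds the nearest point by brute force. This is not the gist's BallTree approach: it checks every candidate, which is fine for small inputs but slower for large ones. The function names and the mean Earth radius constant are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_neighbor(origin, candidates):
    """Return the candidate (lat, lon) closest to origin, checking every candidate."""
    return min(candidates, key=lambda c: haversine_km(origin[0], origin[1], c[0], c[1]))


# Placeholder coordinates in the spirit of the gist's example data.
print(nearest_neighbor((1, 1), [(2, 2), (3, 3)]))
```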
@mrmartineau
mrmartineau / stimulus.md
Last active September 24, 2024 21:07
Stimulus cheatsheet
@adejones
adejones / xlsx_dict_reader.py
Last active June 23, 2021 15:24
openpyxl (2.4.8) sheet-to-dict like the csv DictReader - based on https://gist.github.com/mdellavo/853413
from openpyxl import load_workbook

def XLSXDictReader(f):
    book = load_workbook(f)
    sheet = book.active
    rows = sheet.max_row
    cols = sheet.max_column
    # Columns are 1-indexed and max_column is inclusive, so iterate up to cols + 1.
    headers = dict((i, sheet.cell(row=1, column=i).value) for i in range(1, cols + 1))
    def item(i, j):
        return (sheet.cell(row=1, column=j).value, sheet.cell(row=i, column=j).value)
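
The preview stops before the rows are assembled. The core technique, pairing a header row with each data row the way `csv.DictReader` does, can be sketched without openpyxl. Here plain tuples stand in for worksheet rows, and `rows_to_dicts` is a hypothetical helper name, not part of the gist.

```python
def rows_to_dicts(rows):
    """Yield one dict per data row, keyed by the values in the first (header) row."""
    rows = iter(rows)
    headers = next(rows)
    for row in rows:
        yield dict(zip(headers, row))


# Made-up rows standing in for a worksheet.
table = [("id", "city"), (1, "Chicago"), (2, "New York")]
for record in rows_to_dicts(table):
    print(record)
```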
@rueycheng
rueycheng / GNU-Make.md
Last active October 21, 2024 20:47
GNU Make cheatsheet
@Kartones
Kartones / postgres-cheatsheet.md
Last active November 15, 2024 21:14
PostgreSQL command line cheatsheet

PSQL

Magic words:

psql -U postgres

Some interesting flags (to see all, use -h or --help depending on your psql version):

  • -E: will describe the underlying queries of the \ commands (cool for learning!)
  • -l: psql will list all databases and then exit (useful if the user you connect with doesn't have a default database, like at AWS RDS)
@adamwiggins
adamwiggins / adams-heroku-values.md
Last active November 5, 2024 21:40
My Heroku values

Make it real

Ideas are cheap. Make a prototype, sketch a CLI session, draw a wireframe. Discuss around concrete examples, not hand-waving abstractions. Don't say you did something, provide a URL that proves it.

Ship it

Nothing is real until it's being used by a real user. This doesn't mean you make a prototype in the morning and blog about it in the evening. It means you find one person you believe your product will help and try to get them to use it.

Do it with style

@dminkovsky
dminkovsky / dict_to_view.py
Last active July 10, 2020 01:13
Django: Generate corresponding JSON and rendered template (HTML) views from the same data dictionary.
from django.http import HttpResponse
from django.utils.functional import wraps
from django.template.context import RequestContext
from django.template.loader import get_template
from django.utils import simplejson as json

def dict_to_template_view(**template_args):
    """
    Takes a function that returns a dict and decorates it into a template-rendering view
    """
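
The preview ends before the decorator body. As a framework-free illustration of the pattern (a function returns a dict, and a decorator turns that dict into a rendered response), here is a minimal sketch that serializes to JSON with the standard library. It does not use Django, and `dict_to_json_view` and `profile` are hypothetical names, not the gist's own.

```python
import json
from functools import wraps


def dict_to_json_view(func):
    """Wrap a dict-returning function so that it returns a JSON string instead."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        data = func(*args, **kwargs)
        # sort_keys keeps the output stable regardless of dict insertion order.
        return json.dumps(data, sort_keys=True)
    return wrapper


@dict_to_json_view
def profile(username):
    # Illustrative data only.
    return {"username": username, "active": True}


print(profile("example"))
```

A template-rendering variant would follow the same shape, passing the dict to a template engine instead of `json.dumps`, so one data function can back both views.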
@jboner
jboner / latency.txt
Last active November 16, 2024 21:28
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
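
The ratio annotations in the right-hand column can be recomputed from the nanosecond figures. The short sketch below does that arithmetic for the rows shown here; the dictionary and the `ratio` helper are illustrative names only. Computed from the listed values, main memory comes out at roughly 14x an L2 cache reference, so the table's "20x L2 cache" note is best read as a rough figure.

```python
# Latencies in nanoseconds, copied from the rows listed above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def ratio(slower, faster):
    """Return how many times longer the `slower` operation takes than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


print(ratio("L2 cache reference", "L1 cache reference"))
print(ratio("Main memory reference", "L1 cache reference"))
print(round(ratio("Main memory reference", "L2 cache reference"), 1))
```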