Skip to content

Instantly share code, notes, and snippets.

View rodrigoalvesvieira's full-sized avatar
📚

Rodrigo Alves Vieira rodrigoalvesvieira

📚
View GitHub Profile
@rodrigoalvesvieira
rodrigoalvesvieira / README.openai-structured-output-demo.md
Created November 5, 2024 01:58 — forked from dannguyen/README.openai-structured-output-demo.md
A basic test of OpenAI's Structured Output feature against financial disclosure reports and a newspaper's police blotter. Code examples use the Python SDK and pydantic for the schema definition.

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data — and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

@rodrigoalvesvieira
rodrigoalvesvieira / linux-setup.sh
Created June 19, 2024 00:30 — forked from dhh/linux-setup.sh
linux-setup.sh
# THIS LINUX SETUP SCRIPT HAS MORPHED INTO A WHOLE PROJECT: HTTPS://OMAKUB.ORG
# PLEASE CHECKOUT THAT PROJECT INSTEAD OF THIS OUTDATED SETUP SCRIPT.
#
#
# Libraries and infrastructure
sudo apt update -y
sudo apt install -y \
docker.io docker-buildx \
build-essential pkg-config autoconf bison rustc cargo clang \