Skip to content

Instantly share code, notes, and snippets.

@dannguyen
dannguyen / README.openai-structured-output-demo.md
Last active November 17, 2024 03:44
A basic test of OpenAI's Structured Output feature against financial disclosure reports and a newspaper's police blotter. Code examples use the Python SDK and pydantic for the schema definition.

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data — and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

@siliconvallaeys
siliconvallaeys / PMax Search Terms and Categories in a Spreadsheet
Last active October 5, 2024 10:48
Add search terms and category labels from Performance Max campaigns on Google Ads to a Google spreadsheet automatically
function main() {
/******************************************
* PMax Search Terms Report
* @version: 1.0
* @authors: Frederick Vallaeys (Optmyzr)
* -------------------------------
* Install this script in your Google Ads account (not an MCC account)
* to generate a spreadsheet containing the search terms in your Performance Max campaigns.
* The spreadsheet also includes data about category labels (groupings of search terms).
* Metrics include conversion value, conversions, clicks, and impressions