Tammo van Lessen vanto

Wahoo Elemnt - Tips, tricks and custom images

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Pre-Transformer Models

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data — and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

	Dear soon-to-be-former user,

	We've got some fantastic news! Well, it's great news for us anyway. You, on
	the other hand, are fucked.

	We've just been acquired by:

	[ ] Facebook
	[ ] Google
	[ ] Twitter

	FROM traefik:camembert
	ADD traefik.toml .
	EXPOSE 80
	EXPOSE 8080
	EXPOSE 443