B brandongalbraith

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data — and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

Hacking the Rectangular Starlink Dishy Cable

These are are some notes I put together on butchering the rectangular dishy cable.

FOLLOW THESE GUIDELINES AT YOUR OWN RISK. I TAKE NO RESPONSIBILITY FOR ANY DAMAGE OR INJURY YOU SUSTAIN FROM FOLLOWING OR NOT FOLLOWING THESE GUIDELINES.

The Composable Web Proposal

Serverless infrastructure like AWS Lambda and Google Cloud Functions have made it much cheaper for developers to offer server-side code for public consumption without keeping a server always running.

If these functions could be declared as stateless or deterministic, costs can be brought down even more because only the first invocation needs to be executed. Cached response could be returned for future invocations with the same input arguments.

All modern browsers support URL lengths of thousands of characters, even on mobile. A lot of data can be embedded and passed around directly in the URLs (instead of passing identifiers which requires a look-up which costs server time).

So here's a thought:

Overview

This gist details how to create or restore a disk image in Mac OSX. There are three methods that are described: Carbon Copy Cloner, Disk Utility, and CommandLine.

Disclaimer:
- I have no financial incentives to https://bombich.com or Apple.
- Always make a backup of your data, and make 2 separate backups before trying something new.
- The following steps have been tested and are a summary of my personal recommendations, but should be used at your own risk.
- If there is a chance of imminent data loss, contact a professional for assistance, and do not rely on a random person from the Internet for help.

Method 1: Carbon Copy Cloner (CCC)

Exporting your 2FA tokens from Authy to transfer them into another 2FA application

IMPORTANT - Update regarding deprecation of Authy desktop apps

Past August 2024, Authy stopped supported the desktop version of their apps:
See Authy is shutting down its desktop app | The 2FA app Authy will only be available on Android and iOS starting in August for details.

And indeed, after a while, Authy changed something in their backend which now prevents the old desktop app from logging in. If you are already logged in, then you are in luck, and you can follow the instructions below to export your tokens.

If you are not logged in anymore, but can find a backup of the necessary files, then restore those files, and re-install Authy 2.2.3 following the instructions below, and it should work as expected.

	FROM hayd/alpine-deno:1.10.1
	WORKDIR /src/app

	ADD deps.ts ./
	RUN ["deno", "cache", "deps.ts"]

	ADD *.ts ./
	RUN ["deno", "cache", "mod.ts"]

	ENTRYPOINT ["deno", "run", "--unstable", "--allow-net", "--allow-hrtime", "--allow-env", "--cached-only", "--no-check", "mod.ts"]

	#!/bin/bash
	# Context: https://jamiehall.cc/2020/03/10/delete-all-your-tweets-with-one-line-of-bash/
	# https://news.ycombinator.com/item?id=22689746

	twurl "/1.1/statuses/user_timeline.json?screen_name=YOUR_TWITTER_HANDLE&count=200&max_id=$(twurl '/1.1/statuses/user_timeline.json?screen_name=YOUR_TWITTER_HANDLE&count=200&include_rts=1' \| jq -r '.[9]\|.id_str')&include_rts=1" \| jq -r '.[]\|.id_str' \| parallel -j 10 -a - twurl -X POST /1.1/statuses/destroy/{1}.json > /dev/null

	#!/usr/bin/env ruby

	# you must have SoX installed to generate touch tones
	# brew install sox

	# also, whatever application you run this script from will need to be authorized
	# to control your computer, via System Preferences > Privacy > Accessibility

	# finally, don't run this with headphones plugged in :)

	1) Install cloudflared using homebrew:

	brew install cloudflare/cloudflare/cloudflared

	2) Create /usr/local/etc/cloudflared/config.yaml, with the following content

	proxy-dns: true
	proxy-dns-upstream:
	- https://1.1.1.1/dns-query
	- https://1.0.0.1/dns-query