- Shift + tab + tab
- Use “opus” for planning and Sonnet for everything else, /model
Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output
tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data — and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large scale data gathering projects involving public documents.
OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.
For example, given a Congressional financial disclosure report, with assets defined in a table like this:
Note: I have moved this list to a proper repository. I'll leave this gist up, but it won't be updated. To submit an idea, open a PR on the repo.
Note that I have not tried all of these personally, and cannot and do not vouch for all of the tools listed here. In most cases, the descriptions here are copied directly from their code repos. Some may have been abandoned. Investigate before installing/using.
The ones I use regularly include: bat, dust, fd, fend, hyperfine, miniserve, ripgrep, just, cargo-audit and cargo-wipe.
| #!/usr/bin/awk -f | |
| # This program is a copy of guff, a plot device. https://github.com/silentbicycle/guff | |
| # My copy here is written in awk instead of C, has no compelling benefit. | |
| # Public domain. @thingskatedid | |
| # Run as awk -v x=xyz ... or env variables for stuff? | |
| # Assumptions: the data is evenly spaced along the x-axis | |
| # TODO: moving average |
GraphQL:
mutation ($executionParams: ExecutionParams!) {
startPipelineExecution(executionParams: $executionParams) {
...startPipelineExecutionResultFragment
}
}
fragment startPipelineExecutionResultFragment on StartPipelineExecutionResult {
| #!/bin/sh | |
| if [ $# -eq 0 ]; then | |
| echo "usage: $0 [cargo options]" >&2 | |
| exit 64 | |
| fi | |
| if [ ! -r Cargo.lock ]; then | |
| echo "Not a cargo project: missing Cargo.lock" >&2 | |
| exit 1 |
| # Copyright (c) 2021 Zecong Hu | |
| # | |
| # Permission to use, copy, modify, and/or distribute this software for any | |
| # purpose with or without fee is hereby granted. | |
| # | |
| # THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH | |
| # REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY | |
| # AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, | |
| # INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM | |
| # LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR |
| # The following is based on a fresh Ubuntu 18.04.2 LTS with gcc, make and perl already | |
| # installed (not that perl has anything to do with any of this..) | |
| # | |
| # This is based on JDK 11 but for JDK 12 I've confirmed it builds ok with binutils-2.29 | |
| # and binutils-2.32 | |
| apt install mercurial | |
| # Be nice if this didn't check out the universe | |
| hg clone http://hg.openjdk.java.net/jdk/jdk11/ | |
| cd jdk11/src/utils/hsdis/ |
| default['sshd']['sshd_config']['AuthenticationMethods'] = 'publickey,keyboard-interactive:pam' | |
| default['sshd']['sshd_config']['ChallengeResponseAuthentication'] = 'yes' | |
| default['sshd']['sshd_config']['PasswordAuthentication'] = 'no' |