B brandongalbraith

People

`:bowtie:`	😄 `:smile:`	😆 `:laughing:`
😊 `:blush:`	😃 `:smiley:`	☺️ `:relaxed:`
😏 `:smirk:`	😍 `:heart_eyes:`	😘 `:kissing_heart:`
😚 `:kissing_closed_eyes:`	😳 `:flushed:`	😌 `:relieved:`
😆 `:satisfied:`	😁 `:grin:`	😉 `:wink:`
😜 `:stuck_out_tongue_winking_eye:`	😝 `:stuck_out_tongue_closed_eyes:`	😀 `:grinning:`
😗 `:kissing:`	😙 `:kissing_smiling_eyes:`	😛 `:stuck_out_tongue:`

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data — and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

Overview

This gist details how to create or restore a disk image in Mac OSX. There are three methods that are described: Carbon Copy Cloner, Disk Utility, and CommandLine.

Disclaimer:
- I have no financial incentives to https://bombich.com or Apple.
- Always make a backup of your data, and make 2 separate backups before trying something new.
- The following steps have been tested and are a summary of my personal recommendations, but should be used at your own risk.
- If there is a chance of imminent data loss, contact a professional for assistance, and do not rely on a random person from the Internet for help.

Method 1: Carbon Copy Cloner (CCC)

GENERAL TODO:

The examples are all over the place. They need to be more consistent.
Check that x-archive-queue-derive header. I just skimmed it and it doesn't seem right.
Investigate getting an "[email protected]" address for support requests
Some of the standard metadata fields are repeatable, some are not. State this in the descriptions.
Excellent Hank idea: Quick Start (TL;DR) section to avoid all the gory details
Dang, but this damn thing is hard to read. Will that get better when it gets converted to the PHP wrapper? I have my doubts. May need a some quick George love to give tips for better readability.
All the other 'foo' (read: green) bits below

hashlookup.circl.lu

CIRCL hash lookup is a public API to lookup hash values against known database of files. NSRL RDS database is included. More database will be included in the future. The API is accessible via HTTP ReST API and the API is also described as an OpenAPI.

Get information about the hash lookup database (via ReST)

curl -X 'GET' \
  'https://hashlookup.circl.lu/info' \
 -H 'accept: application/json'

MicroService Proxy Gateway Solutions

Kong, Traefik, Caddy, Linkerd, Fabio, Vulcand, and Netflix Zuul seem to be the most common in microservice proxy/gateway solutions. Kubernetes Ingress is often a simple Ngnix, which is difficult to separate the popularity from other things.

Github Star Trend:

This is just a picture of this link from March 2, 2019

Originally, I had included some other solution

	FROM hayd/alpine-deno:1.10.1
	WORKDIR /src/app

	ADD deps.ts ./
	RUN ["deno", "cache", "deps.ts"]

	ADD *.ts ./
	RUN ["deno", "cache", "mod.ts"]

	ENTRYPOINT ["deno", "run", "--unstable", "--allow-net", "--allow-hrtime", "--allow-env", "--cached-only", "--no-check", "mod.ts"]

	#!/usr/bin/env bash
	# Takes a YouTube URI to a playlist (fairly liberal, it's fine as long
	# as the playlist id can be extracted), and prints a list of URIs in a
	# YouTube playlist.
	#
	# Requires youtube-dl 2014.10.24, tested on youtube-dl
	# 2014.11.02.1. Feature subject to change.
	youtube-dl -j --flat-playlist "$1" \| jq -r '.id' \| sed 's_^_https://youtube.com/v/_'

	function retry(isDone, next) {
	var current_trial = 0, max_retry = 50, interval = 10, is_timeout = false;
	var id = window.setInterval(
	function() {
	if (isDone()) {
	window.clearInterval(id);
	next(is_timeout);
	}
	if (current_trial++ > max_retry) {
	window.clearInterval(id);

	#!/usr/bin/env python

	import subprocess
	import sys

	# pip install flickrapi
	# project home: http://stuvel.eu/flickrapi
	import flickrapi

	api_key = '00000000000000000000000000000000' # obtain your api key at http://www.flickr.com/services