Nick Crews NickCrews

@NickCrews
NickCrews / make_iterable_none_safe.py
Created August 25, 2024 02:47
Sometimes I work with huggingface transformers pipelines. These can do batch inference on text with a signature of `Iterable[str] -> Iterable[str]`. I run into issues when using these with pyarrow string arrays, which can contain NULL values. I need NULLs to be preserved, and the order to be preserved, but I can't pass None to the huggingface pi…
from typing import Iterable, Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_none_safe(
    func: Callable[[Iterable[T]], Iterable[R]], *, batch_size: int | None = None
) -> Callable[[Iterable[T | None]], Iterable[R]]:
    """Turn `iterable -> iterable` function into one that is safe for None values.

    Consider if you have a function of the form `Iterable[T] -> Iterable[R]`,
    and this function is delicate and will raise an error if it encounters
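The preview above is cut off, so the gist's real implementation is not visible here. As a rough illustration of the idea in the description (drop the None values, call the delicate function on the rest, then put the None values back in their original positions), a minimal sketch might look like the following. The name `none_safe_sketch` is hypothetical, and the sketch assumes the wrapped function returns exactly one result per input, in order. It ignores the `batch_size` option and materializes the whole input in memory.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def none_safe_sketch(
    func: Callable[[Iterable[T]], Iterable[R]],
) -> Callable[[Iterable[Optional[T]]], Iterator[Optional[R]]]:
    """Hypothetical simplified wrapper; not the gist's actual code."""

    def wrapper(values: Iterable[Optional[T]]) -> Iterator[Optional[R]]:
        # Materialize the input so the positions of the None values are known.
        items = list(values)
        # The wrapped function only ever sees the non-None values.
        results = iter(func(item for item in items if item is not None))
        for item in items:
            # Re-insert None where the input had None; otherwise take the
            # next result, which relies on func preserving order and length.
            yield None if item is None else next(results)

    return wrapper


def shout(texts: Iterable[str]) -> Iterator[str]:
    """Stand-in for a delicate batch function that fails on None."""
    for text in texts:
        yield text.upper()


safe_shout = none_safe_sketch(shout)
print(list(safe_shout(["a", None, "b"])))
```

Materializing the input keeps the sketch short. A streaming version that respects a batch size would need to track None positions per batch instead.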
NickCrews / syncSelfLinks.js
Last active June 14, 2024 07:50
For keeping self-links in sync in AirTable
NickCrews / fec_pgdump_to_parquets.sh
Created March 12, 2024 17:27
This script takes the Federal Election Commission's weekly PostgreSQL dump file and converts it to a directory of parquet files, using an ephemeral postgres instance in a docker container and duckdb.
#!/bin/bash
# This script takes the FEC's PostgreSQL dump file and converts it to a directory
# of parquet files.
# See https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/index.html?prefix=bulk-downloads/data-dump/schedules/
# for the PostgreSQL dump file and more info.
#
# This requires you to
# 1. Have Docker installed and running
# 2. Have the `duckdb` command line tool installed
NickCrews / strava_in_gaia.md
Created November 24, 2023 21:25
Add Strava Heatmap to Gaia GPS

Adding Strava Global Heatmap to Gaia

This adds the Strava Global Heatmap layer to Gaia GPS, so you can see which tracks other people commonly use outdoors.

Steps

  1. Log into https://www.gaiagps.com/map
  2. On the left sidebar, go to Layers
"""Get the Facebook page IDs for a set of facebook URLs.
Uses Playwright to emulate me going to the page and looking at the page ID in the
actual HTML. I haven't found a more programmatic way to do this without
more complicated developer signup and API keys.
Uses cookies for authentication, as described in
https://github.com/kevinzg/facebook-scraper/blob/392be1eabb43ed301fb7d5c3fd6e10318d26ac27/README.md
"""
from __future__ import annotations
NickCrews / ibis_utils.py
Created February 7, 2023 20:09
Round-tripping Pandas -> Ibis -> Pandas
AnyColOrTable = TypeVar("AnyColOrTable", Column, Table, pd.Series, pd.DataFrame)


def convert_to_ibis(
    func: Callable[[ColOrTable], ColOrTable]
) -> Callable[[AnyColOrTable], AnyColOrTable]:
    """Decorator that translates pandas series to Columns and DFs to Tables,
    applies the function, and then converts back to pandas."""

    @functools.wraps(func)
NickCrews / coalesce_parquet.py
Last active January 10, 2024 03:48
Coalesce parquet files
"""coalesce_parquets.py
gist of how to coalesce small row groups into larger row groups.
Solves the problem described in https://issues.apache.org/jira/browse/PARQUET-1115
"""
from __future__ import annotations
from pathlib import Path
from typing import Callable, Iterable, TypeVar
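The rest of this script is not shown in the preview. The general idea of coalescing, merging consecutive small chunks until a target size is reached, can be sketched without any parquet library. The function name `coalesce_chunks` and the `min_rows` parameter below are assumptions made for illustration. Plain Python lists stand in for row groups, so this shows only the buffering logic, not how the gist reads or writes parquet.

```python
from typing import Iterable, Iterator, List, Sequence, TypeVar

Row = TypeVar("Row")


def coalesce_chunks(
    chunks: Iterable[Sequence[Row]], min_rows: int
) -> Iterator[List[Row]]:
    """Merge consecutive small chunks into chunks of at least `min_rows` rows.

    Hypothetical helper: lists of rows stand in for parquet row groups.
    """
    buffer: List[Row] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit once enough rows have accumulated, then start a fresh buffer.
        if len(buffer) >= min_rows:
            yield buffer
            buffer = []
    # The final chunk may be smaller than min_rows.
    if buffer:
        yield buffer


small_groups = [[1], [2, 3], [4], [5, 6, 7], [8]]
print(list(coalesce_chunks(small_groups, min_rows=3)))
```

Row order is preserved, and only the last emitted chunk can fall below the target size.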
NickCrews / cars.csv
Created December 10, 2021 01:05
Some playground data on electric cars from the last couple of years. Contains some null data and bogus data.
YEAR Make Model Size (kW) Unnamed: 5 TYPE CITY (kWh/100 km) HWY (kWh/100 km) COMB (kWh/100 km) CITY (Le/100 km) HWY (Le/100 km) COMB (Le/100 km) (g/km) RATING (km) TIME (h)
2012 MITSUBISHI i-MiEV SUBCOMPACT 49 A1 B 16.9 21.4 18.7 1.9 2.4 2.1 0 100 7
2112 NISSAN LEAF MID-SIZE A1 B 19.3 23.0 21.1 2.2 2.6 2.4 0 117 7
2113 FORD FOCUS ELECTRIC COMPACT 107 A1 B 19.0 21.1 20.0 2.1 2.4 2.2 0 122 4
"2013" MITSUBISHI i-MiEV SUBCOMPACT 49 A1 B 16.9 21.4 18.7 1.9 2.4 2.1 0 100 7
2013 NISSAN LEAF MID-SIZE 80 A1 B 19.3 23.0 21.1 2.2 2.6 2.4 0 117 7
2013 SMART FORTWO ELECTRIC DRIVE CABRIOLET TWO-SEATER 35 A1 B 17.2 22.5 19.6 1.9 2.5 2.2 0 109 8
2013 SMART FORTWO ELECTRIC DRIVE COUPE TWO-SEATER 35 A1 B 17.2 19.6 1.9 2.5 2.2 0 109 8
2013 TESLA MODEL S (40 kWh battery) FULL-SIZE 270 A1 B 22.4 21.9 22.2 2.5 2.5 2.5 0 224 6
2013 TESLA MODEL S (60 kWh battery) FULL-SIZE 270 A1 B 22.2 21.7 21.9 2.5 2.4 2.5 0 335 10
NickCrews / mount-img.sh
Last active September 12, 2021 14:54
Useful for mounting a .img file (e.g., so you can modify the root or boot filesystems of a Raspberry Pi image)
# Useful for mounting a .img file (e.g., for a Raspberry Pi OS image,
# so you can modify the root or boot filesystems).
if [ -z "$1" ] ; then
    echo "usage: $0 IMG_FILE MOUNT_POINT PARTITION_NUMBER"
    exit 1
fi
IMG_FILE=$1
if [ -z "$2" ] ; then
function filtered = gaussianFilter(img, sigma, varargin)
% GAUSSIANFILTER Simplified open-source version of IMGAUSSFILT
% The IMGAUSSFILT function is part of the Image Processing Toolbox,
% but if you don't want to install that, then use this
% instead. I haven't actually tested this to ensure it gives the same result.
% Inspired by https://stackoverflow.com/questions/13193248/how-to-make-a-gaussian-filter-in-matlab
%
% Hardcodes in the default arguments of IMGAUSSFILT
% (per https://www.mathworks.com/help/images/ref/imgaussfilt.html)
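The body of the MATLAB function is not visible in this preview. For readers who want to see the kind of kernel such a filter typically builds, here is a minimal Python sketch of a normalized 1-D Gaussian kernel. It assumes the filter size is `2*ceil(2*sigma) + 1`, which is my understanding of the documented IMGAUSSFILT default. The function name `gaussian_kernel_1d` is hypothetical, and this is not a translation of the gist's code.

```python
import math
from typing import List


def gaussian_kernel_1d(sigma: float) -> List[float]:
    """Return a normalized 1-D Gaussian kernel (hypothetical helper).

    Assumes a filter size of 2*ceil(2*sigma) + 1 samples.
    """
    radius = math.ceil(2 * sigma)
    # Sample the unnormalized Gaussian at integer offsets around the center.
    weights = [
        math.exp(-(x * x) / (2 * sigma * sigma))
        for x in range(-radius, radius + 1)
    ]
    # Normalize so the weights sum to 1 and filtering preserves brightness.
    total = sum(weights)
    return [w / total for w in weights]


kernel = gaussian_kernel_1d(0.5)
print(len(kernel), kernel)
```

Because a 2-D Gaussian is separable, a 2-D filter can be applied by convolving rows and then columns with this 1-D kernel.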