Skip to content

Instantly share code, notes, and snippets.

View tdhopper's full-sized avatar
©️
𝔀𝓸𝓻𝓴𝓲𝓷𝓰 𝓱𝓪𝓻𝓭

Tim Hopper tdhopper

©️
𝔀𝓸𝓻𝓴𝓲𝓷𝓰 𝓱𝓪𝓻𝓭
View GitHub Profile
@johnynek
johnynek / gist:8961994
Last active August 29, 2015 13:56
Some Questions with Sketch Monoids

Unifying Sketch Monoids

As I discussed in Algebra for Analytics, many sketch monoids, such as Bloom filters, HyperLogLog, and Count-min sketch, can be described as a hashing (projection) of items into a sparse space, then using two different commutative monoids to read and write respectively. Finally, the read monoids always have the property that (a + b) <= a, b and the write monoids has the property that (a + b) >= a, b.

##Some questions:

  1. Note how similar CMS and Bloom filters are. The difference: bloom hashes k times onto the same space, CMS hashes k times onto a k orthogonal subspaces. Why the difference? Imagine a fixed space bloom that hashes onto k orthogonal spaces, or an overlapping CMS that hashes onto k * m length space. How do the error asymptotics change?
  2. CMS has many query modes (dot product, etc...) can those generalize to other sketchs (HLL, Bloom)?
  3. What other sketch or non-sketch algorithms can be expressed in this dual mo
# alias to edit commit messages without using rebase interactive
# example: git reword commithash message
reword = "!f() {\n GIT_SEQUENCE_EDITOR=\"sed -i 1s/^pick/reword/\" GIT_EDITOR=\"printf \\\"%s\\n\\\" \\\"$2\\\" >\" git rebase -i \"$1^\";\n git push -f;\n}; f"
# aliases to change a git repo from private to public, and public to private using gh-cli
alias gitpublic="gh repo edit --accept-visibility-change-consequences --visibility public"
alias gitprivate="gh repo edit --accept-visibility-change-consequences --visibility private"
# delete all your repos using gh-cli (please do not run this unless you want to delete all your repos)
gh repo list --limit 300 --json url -q '.[].url' | xargs -n1 gh repo delete --yes
@lisamelton
lisamelton / transcode-video.sh
Last active April 29, 2025 20:17
Transcode video file (works best with Blu-ray or DVD rip) into MP4 (or optionally Matroska) format, with configuration and at bitrate similar to popular online downloads.
#!/bin/bash
#
# transcode-video.sh
#
# Copyright (c) 2013-2015 Don Melton
#
about() {
cat <<EOF
$program 5.13 of April 8, 2015
@macdiva
macdiva / tmux-named
Created November 19, 2013 15:42 — forked from indirect/tmux-named
#!/bin/bash
# Set up paths and whatnot
test -e ~/.bashrc && source ~/.bashrc
# We need tmux. Obvs.
if [[ -z `which tmux` ]]; then echo "You need tmux first!"; exit 1; fi
# Named variables are much more flexible
name="$1"
@minrk
minrk / nbstripout
Last active March 12, 2025 18:41
git pre-commit hook for stripping output from IPython notebooks
#!/usr/bin/env python
"""strip outputs from an IPython Notebook
Opens a notebook, strips its output, and writes the outputless version to the original file.
Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.
This does mostly the same thing as the `Clear All Output` command in the notebook UI.
LICENSE: Public Domain
@iamatypeofwalrus
iamatypeofwalrus / roll_ipython_in_aws.md
Last active February 21, 2025 18:39
Create an iPython HTML Notebook on Amazon's AWS Free Tier from scratch.

What

Roll your own iPython Notebook server with Amazon Web Services (EC2) using their Free Tier.

What are we using? What do you need?

  • An active AWS account. First time sign-ups are eligible for the free tier for a year
  • One Micro Tier EC2 Instance
  • With AWS we will use the stock Ubuntu Server AMI and customize it.
  • Anaconda for Python.
  • Coffee/Beer/Time
@jehiah
jehiah / iphone_messages_dump.py
Last active September 28, 2020 03:53
Script to dump out messages to csv from an iPhone Backup sqlite file
# Copyright Jehiah Czebotar 2013
# http://jehiah.cz/
import tornado.options
import glob
import os
import sqlite3
import logging
import datetime
import csv
@piscisaureus
piscisaureus / pr.md
Created August 13, 2012 16:12
Checkout github pull requests locally

Locate the section for your github remote in the .git/config file. It looks like this:

[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = [email protected]:joyent/node.git

Now add the line fetch = +refs/pull/*/head:refs/remotes/origin/pr/* to this section. Obviously, change the github url to match your project's URL. It ends up looking like this:

@dbu
dbu / git-status-recursive
Created May 31, 2012 14:14
recursive git status
#!/usr/bin/php
<?php
$repos = array();
exec('find -type d -name .git | sed -e "s/\.git//"', $repos);
foreach ($repos as $repo) {
$status = shell_exec("cd $repo && git status");
if (false == strpos($status, 'nothing to commit (working directory clean)')) {
echo "$repo\n" . str_repeat('-', strlen($repo)) . "\n$status\n\n";
}
}
#!/usr/bin/env python
#
# Converts any integer into a base [BASE] number. I have chosen 62
# as it is meant to represent the integers using all the alphanumeric
# characters, [no special characters] = {0..9}, {A..Z}, {a..z}
#
# I plan on using this to shorten the representation of possibly long ids,
# a la url shortenters
#