Dan Chudnov dchud

@dchud
dchud / followers.py
Created October 20, 2016 04:16
repeats fetching of follower lists from twitter for a small number of users
#!/usr/bin/env python
"""
Simple tool to fetch follower lists every n seconds and store them
with time-based filenames. Can be later merged, deduped, and fed to
users/lookup method to extract full user info.
"""
import argparse
import datetime
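
The preview above is cut off after its imports, so the following is only a minimal sketch of the idea the docstring describes, not the gist's actual code. It assumes a caller-supplied `fetch_follower_ids` function (hypothetical; it stands in for whatever Twitter client call the script uses) and an assumed filename pattern of `followers-<user>-<UTC timestamp>.json`.

```python
import datetime
import json
import os
import time


def timestamped_filename(screen_name, now=None):
    """Build a time-based filename (the pattern is an assumption, not the gist's)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return "followers-{}-{}.json".format(screen_name, now.strftime("%Y%m%dT%H%M%SZ"))


def poll_followers(screen_names, fetch_follower_ids, out_dir,
                   interval=60, rounds=1, sleep=time.sleep):
    """Fetch follower ids for each user, `rounds` times, pausing `interval` seconds.

    `fetch_follower_ids` is a hypothetical callable: screen name in, list of ids out.
    Each result is written to its own timestamped JSON file under `out_dir`.
    """
    written = []
    for i in range(rounds):
        for name in screen_names:
            ids = fetch_follower_ids(name)
            path = os.path.join(out_dir, timestamped_filename(name))
            with open(path, "w") as fh:
                json.dump(ids, fh)
            written.append(path)
        if i < rounds - 1:
            sleep(interval)  # wait n seconds before the next round
    return written
```

Passing the fetch function in keeps the polling and file-naming logic testable without network access; the stored files can later be merged and deduped as the docstring suggests.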
dchud / keybase.md
Created April 25, 2016 16:12
keybase.md

Keybase proof

I hereby claim:

  • I am dchud on github.
  • I am dchud (https://keybase.io/dchud) on keybase.
  • I have a public key ASCMTFzlMORcg1rB4eJb0Pb2JvL15aBDlpFvkJ10gHD8swo

To claim this, I am signing this object:

dchud / config.json
Last active December 21, 2017 22:21
sample of deduped businesses
{
"field_names": ["estab_name", "site_address", "site_city", "site_state",
"site_zip", "nr_in_estab", "owner_type"],
"field_definitions": [{"field": "estab_name", "type": "String"},
{"field": "site_address", "type": "Address"},
{"field": "site_city", "type": "ShortString",
"Has Missing": true},
{"field": "site_state", "type": "ShortString",
"Has Missing": true},
{"field": "site_zip", "type": "ShortString",
dchud / README.txt
Last active November 9, 2018 12:29
Testing dbplus VM for analytics class
This describes installing a virtual machine configured for use in a data warehousing
for analytics course. Students will be working with Jupyter notebooks (Python, R, Spark),
the unix (ubuntu-14.04) command line, MySQL, Spyder, PostgreSQL, and a few other things
as they come up.
The box contains a lot of stuff, and is rather big compared to a standard Ubuntu ISO, say.
You will need at least 3 GB free on your host machine to download it, and probably at least
double that to run it. Because the download file is big, you will want to be on a network
with a fat pipe.
dchud / README.md
Last active August 29, 2015 14:09
animated Anscombe's Quartet regression diagnostics

What makes Anscombe's Quartet of datasets useful, as Wikipedia explains, is their near-equivalent summary statistics: the x and y sets share the same mean, sample variance, correlation, and simple linear regression model. It's instructive as a clear example of what to watch out for when developing simple linear regressions, and the issues each dataset highlights become clear in the different diagnostic plots.
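
To make the "near-equivalent summary stats" concrete, here is a small sketch that computes them with plain Python. The data values are the commonly published Anscombe (1973) table, typed in here rather than taken from this gist, and the function name `summarize` is only illustrative.

```python
from math import sqrt

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x values for sets I-III
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, correlation, and the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


summaries = {name: summarize(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows should agree to about two decimal places, which is exactly why the diagnostic plots, not the summary numbers, are what distinguish the datasets.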

The animation allows visual tracking of each data point through the

dchud / 20141001-filecounts.min.json
Last active August 29, 2015 14:07
A year's worth of files
[["2014-09-22 22:00:00Z", 119], ["2014-09-22 09:00:00Z", 120], ["2014-09-22 03:00:00Z", 120], ["2014-09-22 08:00:00Z", 120], ["2014-09-22 21:00:00Z", 120], ["2014-09-22 14:00:00Z", 120], ["2014-09-22 06:00:00Z", 120], ["2014-09-22 18:00:00Z", 120], ["2014-09-22 01:00:00Z", 120], ["2014-09-22 00:00:00Z", 120], ["2014-09-22 02:00:00Z", 120], ["2014-09-22 10:00:00Z", 119], ["2014-09-22 11:00:00Z", 120], ["2014-09-22 12:00:00Z", 120], ["2014-09-22 16:00:00Z", 119], ["2014-09-22 20:00:00Z", 120], ["2014-09-22 07:00:00Z", 120], ["2014-09-22 17:00:00Z", 118], ["2014-09-22 15:00:00Z", 120], ["2014-09-22 04:00:00Z", 120], ["2014-09-22 05:00:00Z", 120], ["2014-09-22 23:00:00Z", 120], ["2014-09-22 13:00:00Z", 120], ["2014-09-22 19:00:00Z", 120], ["2014-09-09 22:00:00Z", 119], ["2014-09-09 09:00:00Z", 120], ["2014-09-09 03:00:00Z", 120], ["2014-09-09 08:00:00Z", 120], ["2014-09-09 21:00:00Z", 120], ["2014-09-09 14:00:00Z", 120], ["2014-09-09 06:00:00Z", 120], ["2014-09-09 18:00:00Z", 120], ["2014-09-09 01:00:00Z", 120],
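
The preview above is truncated, but each visible entry appears to be a `[timestamp, count]` pair at hourly resolution. Under that assumption, a short sketch for rolling the hourly counts up to daily totals might look like the following; the `sample` string copies three entries from the preview, and `daily_totals` is an illustrative name rather than anything in the gist.

```python
import json
from collections import defaultdict

# Three entries copied from the preview; the full file is not shown here.
sample = ('[["2014-09-22 22:00:00Z", 119], ["2014-09-22 09:00:00Z", 120], '
          '["2014-09-09 22:00:00Z", 119]]')


def daily_totals(rows):
    """Sum counts per calendar day, assuming rows of [\"YYYY-MM-DD HH:MM:SSZ\", count]."""
    totals = defaultdict(int)
    for stamp, count in rows:
        day = stamp.split(" ")[0]  # keep only the date part of the timestamp
        totals[day] += count
    return dict(sorted(totals.items()))


totals = daily_totals(json.loads(sample))
print(totals)
```

Because the entries are not in chronological order in the file, the sketch sorts by date after aggregating rather than relying on input order.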
dchud / dspace-4.0-bin-install-guide
Last active August 29, 2015 13:56
20-minute guide to installing DSpace-4.0 from binary release on Ubuntu 12.04 LTS
# dchud's 20 minute guide to installing DSpace 4.0 from the binary release
# on a clean ubuntu 12.04 server running on aws ec2 or where-have-you
#
# this guide assumes you are already comfortable with *nix system administration
#
# this guide leaves out anything to do with the many details of configuring
# DSpace itself; it just gets you to the "it's up and running" step
#
# to start: get your clean ubuntu-12.04 system up to date
$ sudo apt-get update && sudo apt-get upgrade
dchud / ol-cover-identifiers.txt
Last active December 27, 2015 15:28
unique categories of identifiers found in open library cover identifier mapping file
1sbn
alecso
alexandriava.gov
alibris_id
almedina
amazon
amazon_asin#
amazon.ca_asin
amazon.co.uk_asin
amazon.de
dchud / gist:6497887
Created September 9, 2013 16:11
archivesspace rc1 install log. ubuntu 12.04.
$ sudo apt-get update
$ sudo apt-get install default-jre
$ java -version
java version "1.6.0_27"
download archivesspace-1.0.0RC1.tar.gz from archivesspace
$ gunzip archivesspace-1.0.0RC1.tar.gz
$ tar -xf archivesspace-1.0.0RC1.tar
dchud / gist:5911546
Created July 2, 2013 17:57
diff to comment out all invoking of solr without apache2-level errors
dchud@gwdev-dchud12:~/public_html/ncsu-quicksearch (master *)$ git diff
diff --git a/bestbets/bestbets.php b/bestbets/bestbets.php
index ca547bd..703258a 100644
--- a/bestbets/bestbets.php
+++ b/bestbets/bestbets.php
@@ -45,7 +45,7 @@ class BestBet {
// instantiate new SolrPhpClient service with connection
// to best bets solr index
- $solr = new Apache_Solr_Service('HOST', PORT, 'SOLRULR'); // EDIT