Igor Brigadir (igorbrigadir)

rufuspollock / pdf2xxx.md
Last active November 15, 2016 15:58
PDF 2 XXX. Tools, libraries and tutorials for converting PDFs to something more machine usable

Additions wanted - please just fork and add.

Tutorials

  • Parsing PDFs by Thomas Levine
  • Get Started With Scraping – Extracting Simple Tables from PDF Documents

Generic (PDF -> text)

willurd / web-servers.md
Last active November 10, 2025 16:13
Big list of HTTP static server one-liners

Each of these commands will run an ad hoc static HTTP server in your current (or specified) directory, available at http://localhost:8000. Use this power wisely.

Discussion on Reddit.

Python 2.x

$ python -m SimpleHTTPServer 8000
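On Python 3, SimpleHTTPServer became part of the standard-library http.server module, and the one-liner becomes python3 -m http.server 8000. You can also start the same server from a script, for example inside tests. The following is a minimal sketch, and the helper name serve_directory is hypothetical:

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000, host="127.0.0.1"):
    """Serve `directory` over HTTP from a background thread (hypothetical helper).

    Returns the server object so the caller can read its address and stop it.
    Pass port=0 to let the operating system pick a free port.
    """
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+);
    # functools.partial binds it so the server can build handlers itself.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # serve_forever blocks, so run it in a daemon thread.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, server = serve_directory(".") serves the current directory at http://localhost:8000 until server.shutdown() is called. The sketch binds to 127.0.0.1 by default. The python3 -m http.server one-liner, by contrast, listens on all interfaces, so this choice keeps the files off the network.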
mathiasbynens / md5-collision.js
Last active October 22, 2021 23:36
Verify the most famous MD5 collision example in JavaScript, using nothing but built-in Node libraries.
#!/usr/bin/env node
// Verify the most famous MD5 collision example in JavaScript, using nothing but
// built-in Node modules.
var crypto = require('crypto');
// Note: Node's built-in punycode module is deprecated in current Node releases.
var ucs2encode = require('punycode').ucs2.encode;
var assert = require('assert');
var md5 = function(string) {
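Python's standard hashlib module can sketch the check this script performs. It decodes two hex-encoded messages into raw bytes, confirms they differ, and compares their MD5 digests. The helper names are hypothetical. The famous colliding message pair is not reproduced here, so pass its two hex strings to is_md5_collision to run the check on it.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def is_md5_collision(hex_a: str, hex_b: str) -> bool:
    """True if two *different* hex-encoded messages share one MD5 digest.

    The inputs are decoded from hex into raw bytes first: hashing the hex
    text itself would compare the wrong messages.
    """
    a, b = bytes.fromhex(hex_a), bytes.fromhex(hex_b)
    return a != b and md5_hex(a) == md5_hex(b)
```

For reference, md5_hex(b"abc") returns "900150983cd24fb0d6963f7d28e17f72", which is one of the RFC 1321 test vectors.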
emaadmanzoor / ExpandEdinburghFSDCorpus.md
Last active October 31, 2020 20:30
Expand the Edinburgh Twitter FSD corpus

The attached Python scripts take care of the following tedious work and should help you get started quickly with real work on the corpus:

  • Respect the Twitter API rate limits and throttle API hits.
  • Don't hit the API for tweet IDs that are already expanded, so you can resume expansion after stopping midway.
  • Parse the API response and dump it into the correct column in the sqlite3 database.
  • Gracefully handle exceptions while acquiring tweets from the API.
  • Wrap version 1.1 of the Twitter API.
  • Start from a specified tweet ID, assuming the input file is sorted in increasing order of tweet ID.
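Here is a minimal sketch of the resume, throttle, start-ID, and error-handling behaviour listed above, built on Python's standard sqlite3 module. It is not the attached scripts' code. The fetch function is a hypothetical stand-in for the Twitter API call, and the table layout is an assumption. The min_interval value is an assumed spacing between hits, not Twitter's actual rate limit.

```python
import sqlite3
import time


def expand_tweets(conn, tweet_ids, fetch, min_interval=1.0, start_id=None,
                  sleep=time.sleep, clock=time.monotonic):
    """Store fetch(tweet_id) results in SQLite, one API hit per missing ID.

    fetch is hypothetical; tweet_ids is assumed sorted in increasing order.
    sleep and clock are injectable so the throttle can be tested.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS tweets "
                 "(id INTEGER PRIMARY KEY, payload TEXT)")
    last_hit = None
    for tid in tweet_ids:
        if start_id is not None and tid < start_id:
            continue  # start from a specified tweet ID
        if conn.execute("SELECT 1 FROM tweets WHERE id = ?", (tid,)).fetchone():
            continue  # already expanded: resume without re-hitting the API
        if last_hit is not None:
            # Throttle: wait out whatever remains of min_interval.
            remaining = min_interval - (clock() - last_hit)
            if remaining > 0:
                sleep(remaining)
        last_hit = clock()
        try:
            payload = fetch(tid)
        except Exception as exc:
            # Handle failures gracefully: log and move on to the next ID.
            print(f"skipping tweet {tid}: {exc}")
            continue
        conn.execute("INSERT INTO tweets (id, payload) VALUES (?, ?)",
                     (tid, payload))
        conn.commit()  # commit per tweet so progress survives a crash
```

Committing after every tweet is slower than batching. The payoff is that a crash or Ctrl+C loses at most one response, and a rerun skips everything already stored.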
biovisualize / README.md
Last active July 29, 2019 14:42
Test for attaching a PNG to a gist
  1. Create a new public gist on https://gist.github.com/
  2. Under "Clone this gist", copy the link (e.g., https://gist.github.com/4415518.git)
  3. If you have the command-line Git tools, clone this gist to a local folder: git clone https://gist.github.com/4415518.git
  4. This creates a folder named after the gist ID (e.g., 4415518) in the current working directory. Navigate to it on the command line: cd 4415518 (the same command works on Windows)
  5. Navigate to this folder in your file explorer and add an image (e.g., test.png)
  6. Add it to Git from the command line: git add test.png
  7. Commit it: git commit -m "I just added a file!"
  8. Push this commit to your remote gist with git push. You will need your GitHub user name and a personal access token, because GitHub no longer accepts account passwords for Git over HTTPS.
  9. Go back and refresh your gist on https://gist.github.com/ to confirm that it worked
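You can script the command-line part of these steps (cloning, copying the image in, then add, commit, and push). The following is a minimal Python sketch. It assumes Git is installed and on the PATH and that your Git credentials are already configured. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def gist_add_file_commands(gist_id, filename, message="I just added a file!"):
    """Return (argv, cwd) pairs for the Git steps (hypothetical helper).

    The clone runs in the current directory (cwd=None); the remaining
    commands run inside the folder the clone creates, named after the gist ID.
    """
    folder = Path(str(gist_id))
    return [
        (["git", "clone", f"https://gist.github.com/{gist_id}.git"], None),
        (["git", "add", filename], folder),
        (["git", "commit", "-m", message], folder),
        (["git", "push"], folder),
    ]


def add_file_to_gist(gist_id, image_path, message="I just added a file!"):
    """Clone the gist, copy the image into the clone, then commit and push."""
    name = Path(image_path).name
    commands = gist_add_file_commands(gist_id, name, message)
    clone_argv, _ = commands[0]
    subprocess.run(clone_argv, check=True)               # git clone
    shutil.copy(image_path, Path(str(gist_id)) / name)   # add the image
    for argv, cwd in commands[1:]:                        # add, commit, push
        subprocess.run(argv, cwd=cwd, check=True)
```

add_file_to_gist(4415518, "test.png") would clone the example gist, copy test.png into it, and push the commit. The push only succeeds for a gist you own.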
dsparks / nominate_example.R
Created December 3, 2012 13:36
Running NOMINATE
doInstall <- TRUE  # set to FALSE once the packages are installed
toInstall <- c("wnominate", "ggplot2")
if (doInstall) {
  install.packages(toInstall, repos = "http://cran.us.r-project.org")
}
lapply(toInstall, library, character.only = TRUE)
# Load the 112th Senate roll call data (the most recent when this was written).
# readKH() is provided by the pscl package, which wnominate loads as a dependency.
rollCall <- readKH("http://amypond.sscnet.ucla.edu/rollcall/static/S112.ord")
# Run wnominate on the roll call object
nDims <- 3
jdunck / redis_leaky_bucket.py
Created November 17, 2012 08:16
leaky bucket queue - redis 2.6 + lua + python
# Cribbed from http://vimeo.com/52569901 (Twilio carrier call origination moderation)
# The idea is that many fan-in queues can enqueue at any rate, but
# dequeue needs to happen in a rate-controlled manner without allowing
# any individual input queue to starve other queues.
# http://en.wikipedia.org/wiki/Leaky_bucket (second sense, "This version is referred to here as the leaky bucket as a queue.")
#
# requires:
# redis 2.6+
# redis-py>=2.7.0
# anyjson
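The gist implements this with Redis 2.6 and a Lua script. The scheduling idea on its own can be sketched in memory in Python, assuming a single process and a fixed drain rate. This is not the gist's Redis implementation, and the class and method names are hypothetical. Each input queue gets its own FIFO. The dequeue step visits non-empty queues round-robin, so a busy producer cannot starve a quiet one, and at most `rate` items leave per second.

```python
import collections
import time


class FairLeakyBucket:
    """In-memory sketch: many fan-in queues, one rate-limited round-robin drain."""

    def __init__(self, rate, clock=time.monotonic):
        self.interval = 1.0 / rate        # seconds between dequeues
        self.clock = clock
        self.queues = collections.OrderedDict()  # queue name -> deque of items
        self.next_allowed = None          # earliest time of the next dequeue

    def enqueue(self, queue_name, item):
        # Producers may enqueue at any rate; items wait in their own FIFO.
        self.queues.setdefault(queue_name, collections.deque()).append(item)

    def dequeue(self):
        """Return (queue_name, item), or None if rate-limited or all empty."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            return None  # the bucket leaks at a fixed rate; try again later
        # Round-robin: serve the front queue, then move it to the back.
        for name in list(self.queues):
            queue = self.queues[name]
            if not queue:
                del self.queues[name]  # drop drained queues
                continue
            item = queue.popleft()
            self.queues.move_to_end(name)
            self.next_allowed = now + self.interval
            return name, item
        return None
```

Callers poll dequeue(). A None result means either nothing is waiting or the fixed leak rate has not yet allowed the next item.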
amueller / digits_video.py
Created October 5, 2012 19:13
Visualization of iris and digits datasets via random projections
# (c) 2012 Andreas Mueller
# License: BSD 2-Clause
#
# See my blog for details: http://peekaboo-vision.blogspot.com
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
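The gist animates the iris and digits datasets under random projections. A single random-projection step can be sketched with NumPy. The sketch assumes the data is already an (n_samples x n_features) array, such as the 64-feature digits images. The function name is hypothetical. Orthonormalizing the Gaussian matrix with QR is a design choice of this sketch, not necessarily what the gist does.

```python
import numpy as np


def random_projection_2d(X, seed=None):
    """Project the rows of X onto a random 2-D subspace.

    Returns (projected points of shape (n_samples, 2), basis of shape
    (n_features, 2)). QR orthonormalizes the Gaussian matrix, so both
    axes have unit length and are perpendicular.
    """
    rng = np.random.default_rng(seed)
    gaussian = rng.standard_normal((X.shape[1], 2))
    basis, _ = np.linalg.qr(gaussian)  # reduced QR: orthonormal columns
    return X @ basis, basis
```

Drawing a new basis per frame and scatter-plotting the projected points is one way to build an animation like the gist's with FuncAnimation. That step is not shown here.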
komiya-atsushi / HttpResponseHelper.java
Created September 26, 2012 14:16
Sample code that calls the Twitter v1.1 API's search/tweets endpoint using Twitter4J 2.2.6
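package twitter4j.internal.http;

// Declared inside Twitter4J's own package so it can read HttpResponse.CONF,
// which is not public.
public class HttpResponseHelper {
    // Returns the HTTP client configuration attached to a response.
    public static HttpClientConfiguration getHttpClientConfiguration(
            HttpResponse res) {
        return res.CONF;
    }
}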
robinsloan / langoliers.rb
Last active February 27, 2025 02:44
The Langoliers, a tweet deletion script
require "rubygems"
require "twitter"
require "json"
# things you must configure
TWITTER_USER = "your_username"
MAX_AGE_IN_DAYS = 1 # anything older than this is deleted
# get these from dev.twitter.com
CONSUMER_KEY = "your_consumer_key"
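The deletion rule set by MAX_AGE_IN_DAYS can be sketched in Python. The sketch covers only the selection step, and the (tweet_id, created_at) input format is an assumption. Fetching the timeline and calling the delete endpoint are left out.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE_IN_DAYS = 1  # mirrors the setting above: anything older is deleted


def tweets_to_delete(tweets, now=None, max_age_days=MAX_AGE_IN_DAYS):
    """Return the IDs of tweets older than max_age_days.

    `tweets` is assumed to be an iterable of (tweet_id, created_at) pairs
    where created_at is a timezone-aware datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [tweet_id for tweet_id, created_at in tweets if created_at < cutoff]
```

Comparing against a single cutoff computed once keeps the rule consistent across a long run, even if the script takes a while to page through the timeline.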