Igor Brigadir igorbrigadir

@willurd
willurd / web-servers.md
Last active November 10, 2025 16:13
Big list of http static server one-liners

Each of these commands will run an ad hoc http static server in your current (or specified) directory, available at http://localhost:8000. Use this power wisely.

Discussion on reddit.

Python 2.x

$ python -m SimpleHTTPServer 8000
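For readers on Python 3, where the `SimpleHTTPServer` module became part of `http.server`, the command-line equivalent is `python3 -m http.server 8000`. The same server can also be built from a script. What follows is only a minimal sketch of that idea, not part of the original list: the function name `make_static_server` and its parameters are made up for illustration, and it assumes Python 3.7 or later. Unlike the one-liner, it binds to the loopback interface only by default.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_static_server(directory=".", port=8000, host="127.0.0.1"):
    """Build (but do not start) a static file server for `directory`.

    `make_static_server` is a hypothetical helper name, not a stdlib function.
    Passing port=0 lets the OS pick a free port.
    """
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+),
    # so the served folder does not have to be the current working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Calling `make_static_server().serve_forever()` then serves the current directory at http://localhost:8000 until interrupted with Ctrl+C.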
@rufuspollock
rufuspollock / pdf2xxx.md
Last active November 15, 2016 15:58
PDF 2 XXX. Tools, libraries and tutorials for converting PDFs to something more machine usable

Additions wanted - please just fork and add.

Tutorials

  • Parsing PDFs by Thomas Levine
  • Get Started With Scraping – Extracting Simple Tables from PDF Documents

Generic (PDF -> text)

@mrflip
mrflip / tuning_storm_trident.asciidoc
Last active October 8, 2024 15:18
Notes on Storm+Trident tuning

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination
@jpetazzo
jpetazzo / gist:6127116
Created July 31, 2013 23:21
Debian/Ubuntu containers protips, thanks to @spahl
# this forces dpkg not to call sync() after package extraction and speeds up install
RUN echo "force-unsafe-io" > /etc/dpkg/dpkg.cfg.d/02apt-speedup
# we don't need an apt cache in a container
RUN echo "Acquire::http {No-Cache=True;};" > /etc/apt/apt.conf.d/no-cache
@jimweirich
jimweirich / eternal_flame.sng
Created August 8, 2013 05:59
Words/Chords to the Eternal Flame
The Eternal Flame (God Wrote in Lisp)
Bob Kanefsky / Julia Ecklar
F G C
I was taught assembler in my second year of school.
F G C
It's kinda like construction work, with a toothpick for a tool.
F G C Em Am
So when I made my senior year, I threw my code away,
@esfand
esfand / 0Overview.md
Last active April 27, 2018 11:09
Thread Management in Java 8

Concurrent Package overview

The concurrent package (java.util.concurrent) provides many utilities for developing concurrent programs in Java: concurrent maps, synchronization strategies, blocking queues, thread management, and more. The last of these, thread management, is our concern here. It starts with the Executors helper class and the ExecutorService interface. Executors provides factory methods that create an ExecutorService as either a cached thread pool, which grows on demand as new threads are needed, or a fixed thread pool, which queues tasks once all of its threads are busy. As a developer, you simply submit tasks, and each one runs in the background once a thread is available. Submitting a task returns a Future. With the Future in hand, you can poll for the task's completion or simply wait for it. With only a few lines of code, we have created a reusable, thread-managed system for querying APIs.

import java.util.con
@mblondel
mblondel / matrix_sketch.py
Last active February 13, 2019 09:26
Frequent directions algorithm for matrix sketching.
# (C) Mathieu Blondel, November 2013
# License: BSD 3 clause
import numpy as np
from scipy.linalg import svd
def frequent_directions(A, ell, verbose=False):
    """
    Return the sketch of matrix A.
import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
# hack to store vocabulary in MARISA Trie
class _MarisaVocabularyMixin(object):
    def fit_transform(self, raw_documents, y=None):
        super(_MarisaVocabularyMixin, self).fit_transform(raw_documents)
        self._freeze_vocabulary()
@benallard
benallard / xml_split.py
Last active September 27, 2025 00:57
Small Python script to split huge XML files into parts. It takes one or two parameters. The first is always the huge XML file; the second is the desired chunk size in KB (defaults to 1 MB; 0 splits wherever possible). The generated files are named like the original, with an index between the filename and the extension, like this: bigxml.…
#!/usr/bin/env python
import os
import xml.parsers.expat
from xml.sax.saxutils import escape
from optparse import OptionParser
from math import log10
# How much data we process at a time
@debasishg
debasishg / gist:8172796
Last active October 19, 2025 00:47
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t