Igor Brigadir igorbrigadir

@casebeer
casebeer / ema_gen.py
Last active July 6, 2022 07:54
Exponential moving average generator example in Python
def consumer(func):
    '''
    Decorator taking care of initial next() call to "sending" generators
    From PEP-342
    http://www.python.org/dev/peps/pep-0342/
    '''
    def wrapper(*args,**kw):
        gen = func(*args, **kw)
        next(gen)
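
The preview above stops at the `next(gen)` call. As a minimal sketch of how such a priming decorator might be completed and used for an exponential moving average, the block below assumes a hypothetical `ema` generator and smoothing factor `alpha`; neither is taken from the gist itself.

```python
def consumer(func):
    """Prime a generator so it is ready to receive values via send()."""
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)  # advance to the first yield
        return gen
    return wrapper


@consumer
def ema(alpha):
    """Hypothetical example: yield the running EMA of the values sent in."""
    average = None
    while True:
        value = yield average
        if average is None:
            # The first observation seeds the average.
            average = value
        else:
            average = alpha * value + (1 - alpha) * average


tracker = ema(0.5)
results = [tracker.send(x) for x in [10.0, 20.0, 30.0]]
print(results)
```

Because the decorator has already advanced the generator to its first `yield`, callers can use `send()` immediately without a separate priming call.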
@debasishg
debasishg / gist:8172796
Last active October 19, 2025 00:47
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet.
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
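
To give a flavour of the frequent-items problem these links cover, here is a minimal sketch of the Misra-Gries summary, one well-known counter-based approach. It is an illustration only and is not taken from any of the linked papers; the function name `misra_gries` and parameter `k` are assumptions.

```python
def misra_gries(stream, k):
    """Keep at most k - 1 counters; any item occurring more than
    len(stream) / k times is guaranteed to survive in the result."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter, dropping zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


summary = misra_gries(list("aaaabbbcd"), k=3)
print(summary)
```

The surviving counts are underestimates of the true frequencies, so a second pass over the data is needed if exact counts matter.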
@benallard
benallard / xml_split.py
Last active September 27, 2025 00:57
Small Python script to split huge XML files into parts. It takes one or two parameters: the first is always the huge XML file, and the second is the desired chunk size in kB (defaults to 1 MB; 0 splits wherever possible). The generated files are named like the original one, with an index between the filename and the extension, like this: bigxml.…
#!/usr/bin/env python
import os
import xml.parsers.expat
from xml.sax.saxutils import escape
from optparse import OptionParser
from math import log10
# How much data we process at a time
import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
# hack to store vocabulary in MARISA Trie
class _MarisaVocabularyMixin(object):
    def fit_transform(self, raw_documents, y=None):
        super(_MarisaVocabularyMixin, self).fit_transform(raw_documents)
        self._freeze_vocabulary()
@mblondel
mblondel / matrix_sketch.py
Last active February 13, 2019 09:26
Frequent directions algorithm for matrix sketching.
# (C) Mathieu Blondel, November 2013
# License: BSD 3 clause
import numpy as np
from scipy.linalg import svd
def frequent_directions(A, ell, verbose=False):
    """
    Return the sketch of matrix A.
@esfand
esfand / 0Overview.md
Last active April 27, 2018 11:09
Thread Management in Java 8

Concurrent Package overview

The java.util.concurrent package provides a large set of utilities for developing concurrent programs in Java: concurrent maps, synchronization strategies, blocking queues, thread management, and more. Thread management is the one we are concerned with here. It all starts with the Executors helper class and the ExecutorService interface. The Executors helper class provides convenient methods to create an ExecutorService, either as a cached thread pool that grows on demand as new threads are needed, or as a fixed thread pool that queues tasks once all of its threads are busy. As a developer, all you need to do is submit new tasks; each task is executed in the background once a thread is available. Submitting a task returns a Future. With the Future in hand, you can poll for completion of the task or simply wait for it. With only a few lines of code, we have created a reusable, thread-managed system for querying APIs.
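
The gist itself is about Java. Purely as a language-neutral illustration of the same submit-then-wait pattern, here is a rough Python analogue using the standard `concurrent.futures` module. The `fetch` function is a hypothetical stand-in for an API query and does not come from the gist.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(query):
    """Hypothetical stand-in for a remote API call."""
    return query.upper()


# A fixed-size pool: extra tasks wait in a queue until a worker is free.
with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns a Future immediately; the work runs in the background.
    futures = {pool.submit(fetch, q): q for q in ["alpha", "beta", "gamma"]}
    # Collect each result as its Future completes.
    results = {futures[f]: f.result() for f in as_completed(futures)}

print(results)
```

Calling `result()` blocks until the task is done, while `done()` can be used to poll instead.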

import java.util.con
@jimweirich
jimweirich / eternal_flame.sng
Created August 8, 2013 05:59
Words/Chords to the Eternal Flame
The Eternal Flame (God Wrote in Lisp)
Bob Kanefsky / Julia Ecklar
F G C
I was taught assembler in my second year of school.
F G C
It's kinda like construction work, with a toothpick for a tool.
F G C Em Am
So when I made my senior year, I threw my code away,
@jpetazzo
jpetazzo / gist:6127116
Created July 31, 2013 23:21
Debian/Ubuntu containers protips, thanks to @spahl
# this forces dpkg not to call sync() after package extraction and speeds up install
RUN echo "force-unsafe-io" > /etc/dpkg/dpkg.cfg.d/02apt-speedup
# we don't need an apt cache in a container
RUN echo "Acquire::http {No-Cache=True;};" > /etc/apt/apt.conf.d/no-cache
@mrflip
mrflip / tuning_storm_trident.asciidoc
Last active October 8, 2024 15:18
Notes on Storm+Trident tuning

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination