- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet.
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
#!/bin/bash
#set -e
#set -x
fail() {
  echo "$1"
  [ -e "${EBS_DEVICE}" ] && [ "$VOLUME_ID" != "" ] && [ "$REGION" != "" ] && {
    ec2-detach-volume --region "$REGION" "$VOLUME_ID"
    ec2-delete-volume --region "$REGION" "$VOLUME_ID"
public class CurrentUserLoggingFilter implements Filter {
    public static final String MDC_CURRENT_ACCOUNT_ID_KEY = "CurrentAccountId";
    @Context
    HttpContext context;
    @Override
    public void init(FilterConfig filterConfig) throws ServletException { /* unused */ }
// SymSpell: 1000x faster through Symmetric Delete spelling correction algorithm
//
// The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup
// for a given Damerau-Levenshtein distance. It is three orders of magnitude faster and language independent.
// Opposite to other algorithms only deletes are required, no transposes + replaces + inserts.
// Transposes + replaces + inserts of the input term are transformed into deletes of the dictionary term.
// Replaces and inserts are expensive and language dependent: e.g. Chinese has 70,000 Unicode Han characters!
//
// Copyright (C) 2012 Wolf Garbe, FAROO Limited
// Version: 1.6
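The delete-only idea described in the comments above can be illustrated with a small Python sketch. This is not the SymSpell source: the function names `deletes`, `build_index`, and `candidates` are hypothetical, and the sketch only generates candidates. A real implementation would still verify each candidate with a true Damerau-Levenshtein distance check, because matching on deletes alone can return words slightly beyond the requested distance.

```python
def deletes(term, max_distance):
    """Return term plus every string obtained by deleting up to max_distance characters."""
    results = {term}
    frontier = {term}
    for _ in range(max_distance):
        # Remove one character at every position of every string in the frontier.
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(words, max_distance):
    """Map each delete variant of a dictionary word back to the words that produce it."""
    index = {}
    for word in words:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def candidates(index, query, max_distance):
    """Look up delete variants of the query; no inserts, replaces, or transposes are generated."""
    found = set()
    for variant in deletes(query, max_distance):
        found |= index.get(variant, set())
    return found


if __name__ == "__main__":
    index = build_index(["hello", "help", "world"], max_distance=1)
    print(sorted(candidates(index, "helo", max_distance=1)))
```

Because both the dictionary words and the query are reduced by deletions, an insert, replace, or transpose in the query meets the dictionary term at a shared shorter string, which is why no alphabet-dependent edits need to be enumerated.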
package correlate
import (
	"dft"
	"fmt"
	"math/cmplx"
)
// Cross-correlation is a measure of similarity of two waveforms as a function
// of a time-lag applied to one of them. Aka the sliding dot product. This is
"""
Get a bunch of Crunchbase data, but respect the API limits.
Author JD Maturen
Apache 2 License
"""
import logging
from random import random
import sys
"""
Implementation of the shifted beta geometric (sBG) model from "How to Project Customer Retention" (Fader and Hardie 2006)
http://www.brucehardie.com/papers/021/sbg_2006-05-30.pdf
Apache 2 License
"""
from math import log
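For readers unfamiliar with the sBG model, the following is a minimal sketch of its churn and survival recursion, written independently of the gist's (truncated) code. The function name `sbg_curves` and its parameters are assumptions made for illustration. It relies on the standard recursion P(T=1) = α/(α+β) and P(T=t) = (β+t-2)/(α+β+t-1) · P(T=t-1), with survival S(t) = S(t-1) - P(T=t).

```python
def sbg_curves(alpha, beta, periods):
    """Return (churn, survival) lists for t = 1..periods under the sBG model.

    churn[t-1] is P(T = t), the probability a customer is lost in period t.
    survival[t-1] is S(t), the probability a customer is still active after period t.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    churn = []
    survival = []
    p = alpha / (alpha + beta)  # P(T = 1)
    s = 1.0 - p                 # S(1)
    churn.append(p)
    survival.append(s)
    for t in range(2, periods + 1):
        # Forward recursion on the churn probability.
        p *= (beta + t - 2) / (alpha + beta + t - 1)
        s -= p
        churn.append(p)
        survival.append(s)
    return churn, survival


if __name__ == "__main__":
    churn, survival = sbg_curves(alpha=1.0, beta=1.0, periods=4)
    for t, (p, s) in enumerate(zip(churn, survival), start=1):
        print(t, round(p, 4), round(s, 4))
```

Fitting α and β to observed retention data (for example by maximizing the log-likelihood) is a separate step that this sketch does not cover.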
package main
import (
	"encoding/json"
	"errors"
	"flag"
	"fmt"
	"net/http"
	"net/url"
	"os"
The MIT License (MIT)
Copyright (c) 2014 Matteo Rinaudo
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
@holman got a request about our deployment system, heaven
I know it's not a high priority, but has there been any activity on open-sourcing the core Heaven gem?
There is. I've been working on extracting the non-GitHub-specific parts into two gems. The first is a CLI portion called hades. The second is an HTTP API portion called heaven.
When you open source something previously used as an internal tool, like Heaven, Hubot, or Boxen, how do you manage and hook in the parts that need to stay internal?
Normally I focus around four questions: