- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of probabilistic data structures and how they are used to implement approximation algorithms.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that covers some of the historical background and how it all began with Flajolet.
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Broker cloudformation template'
Parameters:
  KeyName:
    Description: 'For SSH access'
    Type: 'AWS::EC2::KeyPair::KeyName'
  MinimumInstances:
    Description: Minimum number of instances for autoscaling group
    Type: Number
    AllowedValues:
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Zookeeper cloudformation template'
Parameters:
  MinimumInstances:
    Description: Minimum number of instances for autoscaling group
    Type: Number
    AllowedValues:
      - 3
      - 5
  InstanceType:
package pb

import (
	"fmt"
	"reflect"

	st "github.com/golang/protobuf/ptypes/struct"
)

// ToStruct converts a map[string]interface{} to a ptypes.Struct
from time import sleep
from io import StringIO
import psycopg2

def upsert_df_into_postgres(df, target_table, primary_keys, conn_string,
                            n_trials=5, quoting=None, null_repr=None):
    """
    Uploads data from `df` to `target_table`
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
sudo yum -y install epel-release
sudo yum -y install gcc gcc-c++ python-pip python-devel atlas atlas-devel gcc-gfortran openssl-devel libffi-devel
# use pip or pip3 as you prefer for python or python3
pip install --upgrade virtualenv
virtualenv --system-site-packages ~/venvs/tensorflow
source ~/venvs/tensorflow/bin/activate
pip install --upgrade numpy scipy wheel cryptography  # optional
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.10.0rc0-cp35-cp35m-linux_x86_64.whl
# or use the wheel below if you want GPU support; CUDA and cuDNN are required, see the docs for more install instructions
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0rc0-cp35-cp35m-linux_x86_64.whl
next_xid = 1
active_xids = set()
records = []

def new_transaction():
    global next_xid
    next_xid += 1
    active_xids.add(next_xid)
    return Transaction(next_xid)
class HyperLogLogStoreUDAF extends UserDefinedAggregateFunction {
  override def inputSchema = new StructType()
    .add("stringInput", BinaryType)

  override def update(buffer: MutableAggregationBuffer, input: Row) = {
    // This input Row only has a single column storing the input value in String (or other Binary data).
    // We only update the buffer when the input value is not null.
    if (!input.isNullAt(0)) {
      if (buffer.isNullAt(0)) {
import spark.streaming.{Seconds, StreamingContext}
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird._
import spark.streaming.StreamingContext._
import spark.SparkContext._

/**
 * Example of using CountMinSketch monoid from Twitter's Algebird together with Spark Streaming's
 * TwitterInputDStream