Hưng Vũ Hungsiro506

General Background and Overview

Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet.
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
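
To make the frequent-items problem behind the last two entries concrete, here is a minimal sketch of the Misra-Gries summary, a classic counter-based approach. It is an illustration chosen for brevity, not code taken from any of the listed papers; the function name `misra_gries` and the parameter `k` are assumptions made for this example.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Any item occurring more than len(stream) / k times is guaranteed to
    survive in the result; the stored counts are lower bounds on the true
    frequencies, not exact values.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all of them and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    events = ["a", "b", "a", "c", "a", "d", "a", "e", "a", "b"]
    print(misra_gries(events, k=3))
```

A second pass over the data, where one is possible, can replace the lower-bound counts with exact ones for the surviving candidates.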

	AWSTemplateFormatVersion: '2010-09-09'
	Description: 'Broker cloudformation template'
	Parameters:
	  KeyName:
	    Description: 'For SSH access'
	    Type: 'AWS::EC2::KeyPair::KeyName'
	  MinimumInstances:
	    Description: Minimum number of instances for autoscaling group
	    Type: Number
	    AllowedValues:

	AWSTemplateFormatVersion: '2010-09-09'
	Description: 'Zookeeper cloudformation template'
	Parameters:
	  MinimumInstances:
	    Description: Minimum number of instances for autoscaling group
	    Type: Number
	    AllowedValues:
	      - 3
	      - 5
	  InstanceType:

	package pb

	import (
		"fmt"
		"reflect"

		st "github.com/golang/protobuf/ptypes/struct"
	)

	// ToStruct converts a map[string]interface{} to a ptypes.Struct

	from time import sleep
	from io import StringIO

	import psycopg2


	def upsert_df_into_postgres(df, target_table, primary_keys, conn_string,
	                            n_trials=5, quoting=None, null_repr=None):
	    """
	    Uploads data from `df` to `target_table`

	Welcome to
	      ____              __
	     / __/__  ___ _____/ /__
	    _\ \/ _ \/ _ `/ __/  '_/
	   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
	      /_/

	Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
	Type in expressions to have them evaluated.
	Type :help for more information.

	sudo yum -y install epel-release
	sudo yum -y install gcc gcc-c++ python-pip python-devel atlas atlas-devel gcc-gfortran openssl-devel libffi-devel
	# use pip or pip3 as you prefer for python or python3
	pip install --upgrade virtualenv
	virtualenv --system-site-packages ~/venvs/tensorflow
	source ~/venvs/tensorflow/bin/activate
	pip install --upgrade numpy scipy wheel cryptography #optional
	pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.10.0rc0-cp35-cp35m-linux_x86_64.whl
	# or use the wheel below if you want GPU support; CUDA and cuDNN are required, see the docs for more install instructions
	pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0rc0-cp35-cp35m-linux_x86_64.whl

	class HyperLogLogStoreUDAF extends UserDefinedAggregateFunction {

	  override def inputSchema = new StructType()
	    .add("stringInput", BinaryType)

	  override def update(buffer: MutableAggregationBuffer, input: Row) = {
	    // This input Row has a single column storing the input value as a String (or other binary data).
	    // We only update the buffer when the input value is not null.
	    if (!input.isNullAt(0)) {
	      if (buffer.isNullAt(0)) {

	import spark.streaming.{Seconds, StreamingContext}
	import spark.storage.StorageLevel
	import spark.streaming.examples.twitter.TwitterInputDStream
	import com.twitter.algebird._
	import spark.streaming.StreamingContext._
	import spark.SparkContext._

	/**
	 * Example of using CountMinSketch monoid from Twitter's Algebird together with Spark Streaming's
	 * TwitterInputDStream