


Hưng Vũ Hungsiro506

amundra2016 / brokers.yml
Last active November 30, 2018 02:39
Brokers Cloudformation Template
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Broker cloudformation template'
Parameters:
  KeyName:
    Description: 'For SSH access'
    Type: 'AWS::EC2::KeyPair::KeyName'
  MinimumInstances:
    Description: Minimum number of instances for autoscaling group
    Type: Number
    AllowedValues:
amundra2016 / zookeeper.yml
Last active February 11, 2019 01:11
Zookeeper Cloudformation Template
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Zookeeper cloudformation template'
Parameters:
  MinimumInstances:
    Description: Minimum number of instances for autoscaling group
    Type: Number
    AllowedValues:
      - 3
      - 5
  InstanceType:
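ZooKeeper keeps serving only while a strict majority of the ensemble (a quorum) is up, so an ensemble of n servers tolerates n - (n // 2 + 1) failures. That is a plausible reason the template only allows 3 or 5 instances: an even-sized ensemble adds a server without adding fault tolerance. The arithmetic as a quick Python sketch (the function names are illustrative, not part of the template):

```python
def quorum_size(n):
    # A ZooKeeper quorum is a strict majority of the ensemble.
    return n // 2 + 1


def tolerated_failures(n):
    # Servers that can fail while a majority is still reachable.
    return n - quorum_size(n)


for n in (3, 4, 5, 6):
    print(f"ensemble={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

Four servers tolerate the same single failure as three, and six tolerate the same two failures as five.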
jsmouret / struct.go
Last active January 10, 2025 07:17
Convert map[string]interface{} to a google.protobuf.Struct
package pb

import (
	"fmt"
	"reflect"

	st "github.com/golang/protobuf/ptypes/struct"
)

// ToStruct converts a map[string]interface{} to a ptypes.Struct
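A `google.protobuf.Struct` is a map from string keys to `Value`s, and each `Value` holds exactly one of `null_value`, `number_value`, `string_value`, `bool_value`, `struct_value` or `list_value`, with every number stored as a double. The recursive type dispatch a `ToStruct` helper has to perform can be sketched in Python, assuming plain dicts stand in for the protobuf messages to keep the sketch dependency-free (this is not the gist's Go code, and `to_value`/`to_struct` are hypothetical names). Current Go code can also use `structpb.NewStruct` from `google.golang.org/protobuf/types/known/structpb`.

```python
def to_value(v):
    """Convert a Python value into a dict shaped like google.protobuf.Value.

    Plain dicts stand in for protobuf messages (illustrative only)."""
    if v is None:
        return {"null_value": 0}  # NullValue has a single member, NULL_VALUE = 0
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return {"bool_value": v}
    if isinstance(v, (int, float)):
        return {"number_value": float(v)}  # Struct stores every number as a double
    if isinstance(v, str):
        return {"string_value": v}
    if isinstance(v, dict):
        return {"struct_value": to_struct(v)}
    if isinstance(v, (list, tuple)):
        return {"list_value": {"values": [to_value(x) for x in v]}}
    raise TypeError(f"unsupported type: {type(v).__name__}")


def to_struct(m):
    """Convert a str-keyed dict into a dict shaped like google.protobuf.Struct."""
    for k in m:
        if not isinstance(k, str):
            raise TypeError(f"Struct keys must be strings, got {type(k).__name__}")
    return {"fields": {k: to_value(v) for k, v in m.items()}}
```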
Nikolay-Lysenko / upsert_from_pandas_to_postgres.py
Last active January 5, 2022 06:08
Upsert (a hybrid of insert and update) from pandas.DataFrame to PostgreSQL database
from time import sleep
from io import StringIO

import psycopg2


def upsert_df_into_postgres(df, target_table, primary_keys, conn_string,
                            n_trials=5, quoting=None, null_repr=None):
    """
    Uploads data from `df` to `target_table`
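A common way to implement such an upsert in PostgreSQL 9.5 or later is to bulk-load the rows into a staging table (for example with psycopg2's `cursor.copy_expert` and a `COPY ... FROM STDIN` statement fed from a `StringIO` buffer) and then run a single `INSERT ... ON CONFLICT ... DO UPDATE`. The preview does not show whether the gist does exactly this, so treat the following as a sketch under that assumption; `build_upsert_sql`, `quote_ident` and the staging-table name are hypothetical, and the code only builds the SQL string.

```python
def quote_ident(name):
    # Minimal PostgreSQL identifier quoting: wrap in double quotes and double any
    # embedded quotes. Schema-qualified names ("schema.table") are NOT handled;
    # psycopg2.sql.Identifier is the safer choice in real code.
    return '"' + name.replace('"', '""') + '"'


def build_upsert_sql(target_table, staging_table, columns, primary_keys):
    """Build INSERT ... ON CONFLICT SQL that copies staging rows into the target.

    Assumes a unique index or constraint on `primary_keys` in the target table and
    no duplicate keys inside the staging table (PostgreSQL rejects a DO UPDATE
    that would affect the same row twice)."""
    cols = ", ".join(quote_ident(c) for c in columns)
    keys = ", ".join(quote_ident(k) for k in primary_keys)
    # On conflict, overwrite every non-key column with the incoming (EXCLUDED) value.
    updates = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}"
        for c in columns
        if c not in primary_keys
    )
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (
        f"INSERT INTO {quote_ident(target_table)} ({cols}) "
        f"SELECT {cols} FROM {quote_ident(staging_table)} "
        f"ON CONFLICT ({keys}) {action};"
    )
```

When every column is part of the key there is nothing to update, so the sketch falls back to `DO NOTHING`.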
MLnick / SQLTransformerWithJoin.scala
Created August 18, 2016 07:55
Using SQLTransformer to join DataFrames in ML Pipeline
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
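Spark ML's `SQLTransformer` runs a SQL statement in which `__THIS__` stands for the input DataFrame: at transform time it registers the input as a temporary view and substitutes that view's name for the placeholder. Any other table the statement names, such as the right-hand side of a join, only has to be visible in the same session, which is what makes a join possible inside a pipeline stage. The substitution idea can be sketched without Spark using Python's built-in `sqlite3`; `sql_transform` and the table names are hypothetical.

```python
import sqlite3

PLACEHOLDER = "__THIS__"


def sql_transform(conn, statement, input_table):
    # Mimic SQLTransformer: replace the __THIS__ placeholder with the name of the
    # table holding the input rows, then run the resulting query.
    query = statement.replace(PLACEHOLDER, input_table)
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, clicks INTEGER)")
conn.execute("CREATE TABLE users (user_id INTEGER, country TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 3), (2, 5)])
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "NZ"), (2, "ZA")])

# The joined table ("users") must already be known to the same connection, just as
# the other DataFrame must be registered as a temp view in Spark.
statement = (
    "SELECT t.user_id, t.clicks, u.country "
    "FROM __THIS__ t JOIN users u ON t.user_id = u.user_id "
    "ORDER BY t.user_id"
)
rows = sql_transform(conn, statement, "events")
print(rows)
```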
thoolihan / install_tensorflow_centos7.sh
Last active January 28, 2019 06:17
Install TensorFlow on CentOS7
sudo yum -y install epel-release
sudo yum -y install gcc gcc-c++ python-pip python-devel atlas atlas-devel gcc-gfortran openssl-devel libffi-devel
# use pip or pip3 as you prefer for python or python3
pip install --upgrade virtualenv
virtualenv --system-site-packages ~/venvs/tensorflow
source ~/venvs/tensorflow/bin/activate
pip install --upgrade numpy scipy wheel cryptography #optional
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.10.0rc0-cp35-cp35m-linux_x86_64.whl
# or use the line below instead if you want GPU support; CUDA and cuDNN are required, see the docs for more install instructions
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0rc0-cp35-cp35m-linux_x86_64.whl
next_xid = 1
active_xids = set()
records = []


def new_transaction():
    global next_xid
    next_xid += 1
    active_xids.add(next_xid)
    return Transaction(next_xid)
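The snippet above sets up global transaction IDs for a multi-version concurrency control (MVCC) demo but stops before defining `Transaction`. One minimal, hypothetical way to complete it is snapshot visibility: each row records the xid that created it (`xmin`) and the xid that deleted it (`xmax`), and a transaction only sees changes made by itself or by transactions that had already committed when it started. Aborts and write-write conflicts are deliberately left out of this sketch.

```python
next_xid = 1
active_xids = set()
records = []  # each row: {"value": ..., "xmin": creator xid, "xmax": deleter xid or None}


class Transaction:
    def __init__(self, xid):
        self.xid = xid
        # Snapshot: other transactions still in flight when this one started.
        self.snapshot = frozenset(active_xids - {xid})

    def _sees(self, xid):
        # Visible if made by this transaction, or by an older transaction that was
        # not in flight at our start (without aborts, that means it had committed).
        return xid == self.xid or (xid < self.xid and xid not in self.snapshot)

    def visible_records(self):
        return [
            r for r in records
            if self._sees(r["xmin"]) and (r["xmax"] is None or not self._sees(r["xmax"]))
        ]

    def insert(self, value):
        records.append({"value": value, "xmin": self.xid, "xmax": None})

    def delete(self, value):
        # Mark matching visible rows as deleted; the old version stays in `records`.
        for r in self.visible_records():
            if r["value"] == value:
                r["xmax"] = self.xid

    def commit(self):
        active_xids.discard(self.xid)


def new_transaction():
    global next_xid
    next_xid += 1
    active_xids.add(next_xid)
    return Transaction(next_xid)
```

A transaction that started before another one committed keeps its old view: its snapshot is frozen at construction time, which is what gives each transaction a stable view of the data.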
MLnick / HyperLogLogStoreUDAF.scala
Last active March 16, 2022 05:31
Experimenting with Spark SQL UDAFs: a HyperLogLog UDAF for distinct counts that stores the actual HLL for each row to allow further aggregation
class HyperLogLogStoreUDAF extends UserDefinedAggregateFunction {

  override def inputSchema = new StructType()
    .add("stringInput", BinaryType)

  override def update(buffer: MutableAggregationBuffer, input: Row) = {
    // This input Row only has a single column storing the input value in String (or other Binary data).
    // We only update the buffer when the input value is not null.
    if (!input.isNullAt(0)) {
      if (buffer.isNullAt(0)) {
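Storing the HLL itself rather than only its count works because HyperLogLog sketches are mergeable: the register-wise maximum of two sketches is exactly the sketch of the union of their inputs, so per-row HLLs can be aggregated again later. A small Python sketch of the data structure follows; it is not Spark or the gist's UDAF, and the precision `p=10`, SHA-1 hashing and small-range correction are assumptions of this example.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def _hash64(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        idx = x >> (64 - self.p)                  # top p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise max: equals the sketch of the union of both inputs.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)        # linear counting for small cardinalities
        return raw
```

With 1024 registers the relative standard error is roughly 1.04 / sqrt(1024), about 3%.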
debasishg / gist:8172796
Last active June 23, 2025 05:56
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining: A great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms: A presentation by Jérémie Lumbroso that revisits the historical perspective and how it all began with Flajolet.
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani: One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
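Manku and Motwani's paper (entry 4 above) introduced the Lossy Counting algorithm. In outline: the stream is split into buckets of width ceil(1/ε); each tracked item keeps a count f and a maximum possible undercount Δ; at every bucket boundary, entries with f + Δ ≤ current bucket id are dropped. Reported counts undercount the true counts by at most εN, and querying with support s returns every item whose true frequency is at least sN. The Python below is an illustrative implementation under those rules, not code from the linked material.

```python
import math


class LossyCounter:
    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0                           # stream length so far (N)
        self.entries = {}                    # item -> [f, delta]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # delta = bucket - 1 bounds how many earlier occurrences may have been pruned.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # At a bucket boundary, drop entries that cannot be frequent.
            self.entries = {k: v for k, v in self.entries.items() if v[0] + v[1] > bucket}

    def frequent(self, support):
        # Report items whose count clears (s - epsilon) * N; no item with true
        # frequency >= s * N is missed.
        threshold = (support - self.epsilon) * self.n
        return {k: f for k, (f, _delta) in self.entries.items() if f >= threshold}
```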
MLnick / StreamingCMS.scala
Created February 13, 2013 15:00
Spark Streaming with CountMinSketch from Twitter Algebird
import spark.streaming.{Seconds, StreamingContext}
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird._
import spark.streaming.StreamingContext._
import spark.SparkContext._
/**
 * Example of using CountMinSketch monoid from Twitter's Algebird together with Spark Streaming's
 * TwitterInputDStream
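Treating the Count-Min sketch as a monoid is what lets a streaming job combine per-partition and per-batch sketches by simply adding them. The data structure itself is small: `depth` rows of `width` counters with one hash per row, and an item's estimate is the minimum of its counters, which can overestimate but never underestimate. With width ceil(e/ε) and depth ceil(ln(1/δ)), the overestimate is at most εN with probability at least 1 − δ. This Python version is an illustration only, not Algebird's API; the SHA-256 row hashing and the default sizes are assumptions of the sketch.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number to get a different hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only add to counters, so the minimum is the tightest upper bound.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

    def merge(self, other):
        # Monoid "plus": element-wise sum of two sketches with identical shape and hashes.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have the same width and depth")
        out = CountMinSketch(self.width, self.depth)
        out.table = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.table, other.table)]
        return out
```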