- Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
- Models and Issues in Data Stream Systems
- Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
- Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
- [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env python3 | |
""" | |
License: MIT License | |
Copyright (c) 2023 Miel Donkers | |
Very simple HTTP server in python for logging requests | |
Usage:: | |
./server.py [<port>] | |
""" | |
from http.server import BaseHTTPRequestHandler, HTTPServer |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
set hivevar:start_date=0000-01-01; | |
set hivevar:days=1000000; | |
set hivevar:table_name=[INSERT YOUR TABLE NAME HERE]; | |
-- If you are running a version of HIVE prior to 1.2, comment out all uses of date_format() and uncomment the lines below for equivalent functionality | |
CREATE TABLE IF NOT EXISTS ${table_name} AS | |
WITH dates AS ( | |
SELECT date_add("${start_date}", a.pos) as date |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Ask for the user password | |
# Script only works if sudo caches the password for a few minutes | |
sudo true | |
# Install kernel extra's to enable docker aufs support | |
# sudo apt-get -y install linux-image-extra-$(uname -r) | |
# Add Docker PPA and install latest version | |
# sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9 | |
# sudo sh -c "echo deb https://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
function map() { | |
emit(1, { | |
sum: this.value, // the field you want stats for | |
min: this.value, | |
max: this.value, | |
count: 1, | |
diff: 0 | |
}); | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Set codec, dir and segmentName accordingly to the segment you are trying to restore | |
Codec codec = new Lucene42Codec(); | |
Directory dir = FSDirectory.open(new File("/tmp/test")); | |
String segmentName = "_0"; | |
IOContext ioContext = new IOContext(); | |
SegmentInfo segmentInfos = codec.segmentInfoFormat().getSegmentInfoReader().read(dir, segmentName, ioContext); | |
Directory segmentDir; | |
if (segmentInfos.getUseCompoundFile()) { | |
segmentDir = new CompoundFileDirectory(dir, IndexFileNames.segmentFileName(segmentName, "", IndexFileNames.COMPOUND_FILE_EXTENSION), ioContext, false); |
Since many deployments may start out with 3 nodes and so little is known about how to grow a cluster from 3 memebrs to 5 members without losing the existing Quorum, here is an example of how this might be achieved.
In this example, all 5 nodes will be running on the same Vagrant host for the purpose of illustration, running on distinct configurations (ports and data directories) without the actual load of clients.
YMMV. Caveat usufructuarius.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// | |
// See: https://github.com/foundationdb/sql-parser | |
// | |
import com.foundationdb.sql.parser.*; | |
import com.foundationdb.sql.parser.SQLParserContext.*; | |
public class CustomCase { | |
public static class ColumnNamePrinter implements Visitor { | |
@Override |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Script to be placed in elasticsearch/bin | |
# Launch it from elasticsearch dir | |
# bin/backup indexname | |
# We suppose that data are under elasticsearch/data | |
# It will create a backup file under elasticsearch/backup | |
if [ -z "$1" ]; then | |
INDEX_NAME="dummy" | |
else | |
INDEX_NAME=$1 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env bash | |
###############FUNCTIONS############ | |
function prepare { | |
#optimize the index | |
echo -n "Optimizing index $INDEX_NAME..." | |
curl -XPOST "$ADDRESS/$INDEX_NAME/_optimize" 2>/dev/null| grep 'failed":0' >/dev/null | |
if [ $? -eq 0 ]; then | |
echo "done" |
NewerOlder