Felan Carlo Garcia cadrev
@cadrev
cadrev / contours.py
Created April 9, 2016 19:32 — forked from jsundram/contours.py
Convert matplotlib contours into valid (compressed) topojson.
import logging
import matplotlib.pyplot as plt
import numpy as np
import os
import scipy.stats as stats
import sys

def read_data(filename):
    """Reads a data file assumed to have at least 2 columns: 1) lat, 2) lng."""
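
The preview above is cut off after the docstring, so the original body of read_data is not shown. As a rough illustration of what the docstring describes, here is a minimal sketch using only the standard library. The name read_lat_lng, the comma delimiter, and the choice to skip rows that do not parse as numbers are all assumptions, not the gist author's code.

```python
import csv


def read_lat_lng(filename):
    """Hypothetical stand-in for read_data: return (lats, lngs) as lists of floats."""
    lats, lngs = [], []
    with open(filename, newline="") as f:
        for row in csv.reader(f):  # assumes a comma-delimited file
            # Skip blank rows and rows with fewer than two columns.
            if len(row) < 2:
                continue
            try:
                lat, lng = float(row[0]), float(row[1])
            except ValueError:
                # Skip header rows or any row whose first two fields are not numeric.
                continue
            lats.append(lat)
            lngs.append(lng)
    return lats, lngs
```

Any columns after the first two are ignored, which matches the "at least 2 columns" wording of the docstring.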

Putting wings on the Elephant

[operating-hadoop]

HBase is used widely at Facebook, and one of its biggest use cases is Facebook Messages. With a billion users, there are many reliability and performance challenges for both HBase and HDFS. HDFS was originally designed for batch processing systems such as MapReduce and Hive. A real-time use case like Facebook Messages, where the p99 latency cannot exceed a couple of hundred milliseconds, poses a lot of challenges for HDFS. In this talk we will share the work the HDFS team at Facebook has done to support a real-time use case like Facebook Messages: (1) using system calls to tune performance; (2) inline checksums to reduce IOPS by 40%; (3) reducing the p99 read and write latencies by about 10x; (4) tools used to determine the root cause of outliers. We will discuss the details of each technique, the challenges we faced, the lessons learned, and results showing the impact of each improvement.

speaker: Pritam Damania
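
The abstract above leans on p99 latency as its main metric. For readers unfamiliar with the term, here is a minimal sketch of a nearest-rank percentile calculation. The percentile helper and the latency samples are invented for illustration and are not taken from the talk or from Facebook's measurements.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to keep the arithmetic exact for integer pct.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up data: 98% of requests are fast, 2% are slow outliers.
latencies_ms = [12] * 980 + [450] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

With these samples the mean and median stay low while the p99 lands on the slow outliers, which is why a tail percentile rather than an average is the usual target for a real-time workload.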

Real-Time Market Basket Analysis for Retail with

cadrev / 2.ipynb
Created December 16, 2015 14:24 — forked from karlnapf/2.ipynb
Machine learning assignment 2
cadrev / 123 datasets.csv
Created December 16, 2015 14:19 — forked from octaviomtz/123 datasets.csv
123 machine learning databases
Problem File Name Relation Name nRows TestMethod nTrain nTest nVars nTargets
abalone/abalone.arff abalone 4177 test-set cross-validation 4077 100 8 3
acute-inflammation/acute-inflammation.arff acute-inflammation 120 leave-one-out cross-validation 119 100 6 2
acute-nephritis/acute-nephritis.arff acute-nephritis 120 leave-one-out cross-validation 119 100 6 2
adult/adult_train.arff adult 32561 test-set cross-validation 32461 100 14 2
annealing/annealing_train.arff annealing 798 test-set cross-validation 698 100 31 5
arrhythmia/arrhythmia.arff arrhythmia 452 leave-one-out cross-validation 451 100 262 13
audiology-std/con_patrons_repetidos/audiology-std_train.arff audiology-std 194 leave-one-out cross-validation 193 100 59 18
audiology-std/audiology-std_train.arff audiology-std 171 leave-one-out cross-validation 170 100 59 18
balance-scale/balance-scale.arff balance-scale 625 test-set cross-validation 525 100 4 3
cadrev / songs.ipynb
Created December 16, 2015 14:10 — forked from carlward/songs.ipynb
Songs from API
cadrev / waveform.py
Created October 30, 2015 16:07 — forked from mixxorz/waveform.py
Generate waveform images from audio files
# Requires pydub (with ffmpeg) and Pillow
#
# Usage: python waveform.py <audio_file>
import sys
from pydub import AudioSegment
from PIL import Image, ImageDraw
cadrev / gist:fe766c8ec5ec40ec8a9c
Created September 24, 2015 14:44 — forked from entaroadun/gist:1653794
Recommendation and Ratings Public Data Sets For Machine Learning

Movies Recommendation:

Music Recommendation: