Duy Do duydo

duydo / simplefulltextsearch.py
Created November 13, 2013 08:11
Simple in-memory full-text search
# -*- coding: utf-8 -*-
import re
import shlex

def search(query):
    # Python 2 idiom: encode to bytes before shlex.split, then decode each piece.
    pieces = shlex.split(query.encode('utf-8'))
    include, or_include, exclude = [], [], []
    for piece in pieces:
        p = piece.decode('utf-8')
        if p.startswith('-'):
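
The preview above stops at the `-` prefix check, so the rest of the gist is not visible here. As a minimal sketch of the general idea only, assuming a leading `-` marks a term to exclude and every other term is required, the parsing and matching could look like this in Python 3. The names `parse_query`, `matches`, and `search_docs` are hypothetical, and the `or_include` list from the preview is left out because its syntax is not shown.

```python
import shlex


def parse_query(query):
    """Split a query into required and excluded terms.

    Assumption: a leading '-' marks a term to exclude, as the
    startswith('-') check in the preview suggests. Everything else is
    treated as required. Quoted phrases stay together because
    shlex.split honours shell-style quoting.
    """
    include, exclude = [], []
    for piece in shlex.split(query):
        if piece.startswith('-') and len(piece) > 1:
            exclude.append(piece[1:].lower())
        else:
            include.append(piece.lower())
    return include, exclude


def matches(text, include, exclude):
    """Return True if text has every required term and no excluded term."""
    haystack = text.lower()
    return (all(term in haystack for term in include)
            and not any(term in haystack for term in exclude))


def search_docs(query, docs):
    """Filter an in-memory list of strings (hypothetical helper)."""
    include, exclude = parse_query(query)
    return [doc for doc in docs if matches(doc, include, exclude)]
```

With this sketch, `parse_query('"full text" search -slow')` returns `(['full text', 'search'], ['slow'])`. Matching is plain case-insensitive substring search, which is the simplest choice and not necessarily what the gist does.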

Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns  = 250 µs
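
The figures above are easier to compare as ratios. The following is a small sketch that copies the listed values into a dictionary and expresses one operation's cost relative to another. The names `LATENCY_NS`, `relative_cost`, and `to_microseconds` are illustrative and not part of the gist.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "SSD random read": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
}


def relative_cost(operation, baseline="L1 cache reference"):
    """Return how many times slower `operation` is than `baseline`."""
    return LATENCY_NS[operation] / LATENCY_NS[baseline]


def to_microseconds(nanoseconds):
    """Convert nanoseconds to microseconds (1 µs = 1,000 ns)."""
    return nanoseconds / 1_000
```

For example, `relative_cost("Main memory reference")` returns `200.0`, and `relative_cost("SSD random read", "Main memory reference")` returns `1500.0`.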

duydo / s3delete.py
Created October 26, 2013 16:15 — forked from jerem/s3delete.py
#!/usr/bin/env python
import gevent.monkey
gevent.monkey.patch_all()
import sys
import optparse
import gevent
from boto.s3.connection import S3Connection
#!/usr/bin/env python
"""
Example of fetching followers of multiple Twitter accounts recursively, using twitterspawn.
"""
import atexit
import json
import twitterspawn
# ========================================
# Testing n-gram analysis in ElasticSearch
# ========================================
curl -X DELETE localhost:9200/ngram_test
curl -X PUT localhost:9200/ngram_test -d '
{
  "settings" : {
    "index" : {
      "analysis" : {
duydo / twitter_mapping.sh
Created October 17, 2013 09:52
Preserving special characters when tokenizing Twitter messages with Elasticsearch
curl -XPUT 'http://localhost:9200/twitter' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    },
    "analysis" : {
      "filter" : {
        "tweet_filter" : {
          "type" : "word_delimiter",
#!/bin/sh
# Variables
USER="admin"
PASS="password"
# Assert Root User
SCRIPTUSER=`whoami`
if [ "$SCRIPTUSER" != "root" ]
then
{
  "title": "Tweets Search",
  "rows": [
    {
      "title": "Options",
      "height": "50px",
      "editable": true,
      "collapse": false,
      "collapsable": true,
      "panels": [
duydo / ByteTokenizer.java
Last active June 16, 2023 22:21
The byte tokenizer class allows an application to break a byte array into tokens.
/**
 * @(#)ByteTokenizer.java Sep 23, 2008
 * Copyright (C) 2008 Duy Do. All Rights Reserved.
 */
package com.duydo.util;
import java.util.Enumeration;
import java.util.NoSuchElementException;
/**
duydo / elasticsearch.sh
Created September 15, 2012 15:25
elasticsearch script
#!/bin/bash
NAME=elasticsearch
PREFIX=/usr/local
ES_HOME=$PREFIX/$NAME
install() {
    v=$1;
    echo "Downloading $NAME $v...";
    file="$NAME-$v.tar.gz";