mvn install:install-file -DgroupId=com.example -DartifactId=example-app -Dversion=1.0 -Dpackaging=jar -Dfile=path/to/jar/file
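After install-file runs, the artifact sits in the local repository under its groupId/artifactId/version path, so other local builds can depend on com.example:example-app:1.0. A quick check; the jar file name simply follows Maven's artifactId-version convention:

ls ~/.m2/repository/com/example/example-app/1.0/
# expect example-app-1.0.jar plus the installed/generated POM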
export JAVA_HOME=$(/usr/libexec/java_home)
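On macOS, /usr/libexec/java_home returns the path of the default JDK; it can also pin a specific major version. A small sketch; the 1.8 version is only an example:

# Pin Java 8 instead of whatever the default JDK is:
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
echo "$JAVA_HOME" && java -version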
If you want, I can try and help with pointers as to how to improve the indexing speed you get. It's quite easy to really increase it by following some simple guidelines, for example (a sketch of applying a few of them through the API follows after this list):
- Use create in the index API (assuming you can).
- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things, as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
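A minimal sketch of the refresh interval and replica count tweaks above, using curl against a 0.x-era Elasticsearch node; the localhost:9200 address and the index name "myindex" are assumptions, not from the original answer:

# Bulk-load friendly settings for a hypothetical index "myindex":
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0
  }
}'

# Once the bulk load is done, bring replicas back with the same update settings API:
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'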
<VirtualHost *:80>
    VirtualDocumentRoot /Users/duydo/Sites/%1
    ServerName automated_domains
    ServerAlias *.localhost.com
</VirtualHost>
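VirtualDocumentRoot comes from mod_vhost_alias, and each wildcard subdomain still has to resolve to the local machine. A rough sketch of the supporting steps; the module path and the dev.localhost.com name are assumptions, not from the original config:

# Make sure mod_vhost_alias is loaded in httpd.conf (path varies per install):
#   LoadModule vhost_alias_module libexec/apache2/mod_vhost_alias.so

# Point a hypothetical subdomain at the local machine:
echo '127.0.0.1 dev.localhost.com' | sudo tee -a /etc/hosts

# Check the configuration and restart Apache:
sudo apachectl configtest && sudo apachectl restart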
find . -name '*.ext' -print | tar -cvzf archive.tar.gz --files-from -
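The pipeline above breaks on file names containing newlines (and can mangle ones with unusual whitespace). With GNU find and GNU tar, a null-delimited variant is safer; this is a sketch assuming GNU tools, keeping the same archive name:

find . -name '*.ext' -print0 | tar -cvzf archive.tar.gz --null --files-from -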
You have two options:
- Use ngrams through the ngram token filter: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenfilter.html. See Wikipedia for details: http://en.wikipedia.org/wiki/N-gram
- Use the compound word token filter: http://www.elasticsearch.org/guide/reference/index-modules/analysis/compound-word-tokenfilter.html (which I have contributed to Lucene, btw).
The ngrams are useful if you cannot provide a dictionary because you don't know what type of documents you have to index. The downside is that they use a lot of space in your indices, so you need to be careful here.
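For the ngram option, a minimal sketch of creating an index with an ngram token filter on a 0.x-era Elasticsearch node; the index name, analyzer name, and gram sizes are assumptions, not from the original answer:

curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "my_ngram": { "type": "nGram", "min_gram": 3, "max_gram": 5 }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_ngram"]
        }
      }
    }
  }
}'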
#!/bin/bash
#
# Simple script to download & install lily with its dependencies
#
LILY_VERSION='1.1.2'
HADOOP_VERSION='1.0.0'
HBASE_VERSION='0.92.1'
ZOOKEEPER_VERSION='3.4.3'
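The rest of the install script is not shown. Purely as an illustration of how version variables like these are usually consumed, a sketch that derives archive names from them; the $DOWNLOAD_BASE URL and the unpack location are hypothetical, not from the original script:

# Hypothetical mirror; the real script may fetch each project from its own site.
DOWNLOAD_BASE='http://example.org/mirror'
for pkg in "lily-$LILY_VERSION" "hadoop-$HADOOP_VERSION" \
           "hbase-$HBASE_VERSION" "zookeeper-$ZOOKEEPER_VERSION"; do
  wget "$DOWNLOAD_BASE/$pkg.tar.gz"        # fetch the tarball
  tar -xzf "$pkg.tar.gz" -C /usr/local     # unpack alongside the other services
done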
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index": {
      "query": { "default_field": "@message" },
      "store": { "compress": { "stored": true, "tv": true } }
    }
  },
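The template above is cut off after the settings block. A sketch of registering the complete JSON with the index template API, assuming it is saved as logstash-template.json and the node listens on localhost:9200:

curl -XPUT 'http://localhost:9200/_template/logstash' -d @logstash-template.json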
#!/bin/bash
NAME=elasticsearch
PREFIX=/usr/local
ES_HOME=$PREFIX/$NAME

install() {
  v=$1;
  echo "Downloading $NAME $v...";
  file="$NAME-$v.tar.gz";
/**
 * @(#)ByteTokenizer.java Sep 23, 2008
 * Copyright (C) 2008 Duy Do. All Rights Reserved.
 */
package com.duydo.util;

import java.util.Enumeration;
import java.util.NoSuchElementException;

/**