@drewr
Last active August 29, 2015 14:13
ES manual _optimize for great memory savings (pre-Lucene 4.10.3)

Usage

curl -s https://gist.githubusercontent.com/drewr/b920250c3211650bdd40/raw/optimize.py | python - INDEXNAME

Sample output

% curl -O download.elasticsearch.org/stream2es/stream2es; chmod +x stream2es
% ./stream2es generator --fields f1:str:10,f2:int:1000 --max-docs 1000000
2015-01-08T13:29:19.883-0600 INFO  00:50.232 19907.6d/s 2906.4K/s (142.6mb) indexed 1000000 streamed 1000000 errors 0
2015-01-08T13:29:19.890-0600 INFO  done
% curl -s https://gist.githubusercontent.com/drewr/b920250c3211650bdd40/raw/optimize.py | python - foo
12: foo 2 0 1000000 201246393 21 0s
11: foo 2 0 1000000 201192416 20 0s
10: foo 2 0 1000000 200875370 19 0s
9: foo 2 0 1000000 200490977 18 0s
8: foo 2 0 1000000 199500026 16 1s
7: foo 2 0 1000000 198077275 14 1s
6: foo 2 0 1000000 196012198 12 1s
5: foo 2 0 1000000 193746403 10 2s
4: foo 2 0 1000000 191470827 8 2s
3: foo 2 0 1000000 189211145 6 2s
2: foo 2 0 1000000 187086169 4 3s
1: foo 2 0 1000000 185021367 2 5s
%
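Each output row appears to be the segment target passed to `_optimize`, then the columns requested by the script's `status()` call (`index,pri,rep,dc,pri.store.size,pri.segments.count`), then the duration of that call. As a rough way to read the numbers, here is a minimal Python 3 sketch that parses the first and last rows above and reports how much the primary store size changed. The `Row` and `parse_row` names are made up for this illustration and are not part of the gist; note also that the first row is printed after the first `_optimize` call, so it is only an approximation of the starting state.

```python
from collections import namedtuple

# Hypothetical record type for one output row; field names are illustrative.
Row = namedtuple("Row", "target index pri rep docs store_bytes segments seconds")


def parse_row(line):
    """Split one output row into typed fields."""
    target, index, pri, rep, docs, store, segs, dur = line.split()
    return Row(
        int(target.rstrip(":")),  # "12:" -> 12
        index,
        int(pri),
        int(rep),
        int(docs),
        int(store),               # pri.store.size in bytes (bytes=b)
        int(segs),                # pri.segments.count
        int(dur.rstrip("s")),     # "5s" -> 5
    )


# First and last rows copied from the sample output above.
first = parse_row("12: foo 2 0 1000000 201246393 21 0s")
last = parse_row("1: foo 2 0 1000000 185021367 2 5s")

saved = first.store_bytes - last.store_bytes
pct = 100.0 * saved / first.store_bytes
print("store shrank by %d bytes (%.2f%%), segments %d -> %d"
      % (saved, pct, first.segments, last.segments))
```

The document count stays at 1000000 throughout, so the change comes from merging segments rather than from dropping data.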
#!/usr/bin/env python
#
# Multi-shard manual optimizer (Python 2.x)
#
# curl -s http://users.elasticsearch.org/drewr/p/manual-optimize.py | python - INDEX
#
import time, sys, urllib2, json

def get(url, d=None):
    r = urllib2.Request(url, d)
    return urllib2.urlopen(r).read()

def post(url, timeout=60):
    # No request body is attached, so urllib2 issues this as a GET.
    r = urllib2.Request(url)
    return urllib2.urlopen(r, timeout=timeout).read()

def optimize(index, n=1):
    # Merge each shard of the index down to at most n segments.
    u = "http://localhost:9200/%s/_optimize?max_num_segments=%s" % \
        (index, n)
    return post(u, timeout=3600)

def max_segments(index):
    # Largest per-shard segment count (the "sc" column of _cat/shards).
    u = "http://localhost:9200/_cat/shards/%s?h=sc" % index
    return max([int(i) for i in get(u).splitlines()])

def status(index):
    # One-line index summary, with the store size reported in bytes.
    u = "http://localhost:9200/_cat/indices/%s?bytes=b&h=index,pri,rep,dc,pri.store.size,pri.segments.count" % index
    return get(u).strip()

def main(index):
    # Step the target down one segment at a time until every shard has one.
    seg = max_segments(index)
    while seg > 0:
        print("%d:" % seg),  # Python 2: trailing comma keeps the cursor on this line
        start = time.time()
        o = optimize(index, seg)
        dur = int(round(time.time() - start))
        print("%s %ds" % (status(index), dur))
        seg = max_segments(index) - 1

if __name__ == "__main__":
    main(sys.argv[1])
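The script above targets Python 2.x (`urllib2`, the trailing-comma `print`). For readers on Python 3, the same step-down loop could be sketched roughly as follows. This is an untested-against-a-cluster adaptation, not the author's code: `http`, `parse_segment_counts`, and `step_down` are names invented here, the loop takes its helpers as parameters only so it can be exercised without a running node, it skips blank `sc` values (an assumption about what unassigned shards return), and it sends `_optimize` as an explicit POST where the original helper ends up issuing a GET. It also prints each row once the call finishes instead of printing the target first.

```python
import time
import urllib.request

BASE = "http://localhost:9200"  # same host and port the original hardcodes


def http(url, method="GET", timeout=60):
    """Issue a request and return the decoded response body."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_segment_counts(body):
    """Turn `_cat/shards?h=sc` output into ints, skipping blank lines."""
    return [int(line) for line in body.splitlines() if line.strip()]


def max_segments(index):
    url = "%s/_cat/shards/%s?h=sc" % (BASE, index)
    return max(parse_segment_counts(http(url)))


def optimize(index, n):
    url = "%s/%s/_optimize?max_num_segments=%d" % (BASE, index, n)
    return http(url, method="POST", timeout=3600)


def status(index):
    url = ("%s/_cat/indices/%s?bytes=b"
           "&h=index,pri,rep,dc,pri.store.size,pri.segments.count"
           % (BASE, index))
    return http(url).strip()


def step_down(index, get_max=max_segments, do_optimize=optimize,
              get_status=status, out=print):
    """Lower the segment target by one per pass; return the targets used."""
    targets = []
    seg = get_max(index)
    while seg > 0:
        start = time.time()
        do_optimize(index, seg)
        dur = int(round(time.time() - start))
        out("%d: %s %ds" % (seg, get_status(index), dur))
        targets.append(seg)
        seg = get_max(index) - 1
    return targets
```

Calling `step_down("foo")` against a local node would then play the role of `main("foo")` in the original.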