>>> import itertools
>>> import string
>>> from elasticsearch import Elasticsearch,helpers
>>> es = Elasticsearch()
>>> # k is a generator expression that produces
... # a series of dictionaries containing test data.
... # The test data are just letter permutations
... # created with itertools.permutations.
... #
... # We then reference k as the iterator that's
... # consumed by the elasticsearch.helpers.bulk method.
>>> k = ({'_type':'foo', '_index':'test','letters':''.join(letters)}
...      for letters in itertools.permutations(string.letters,2))
>>> # calling k.next() shows examples
... # (while consuming the generator, of course)
>>> # each dict contains a doc type, index, and data (at minimum)
>>> k.next()
{'_type': 'foo', 'letters': 'ab', '_index': 'test'}
>>> k.next()
{'_type': 'foo', 'letters': 'ac', '_index': 'test'}
>>> # create our test index
>>> es.indices.create('test')
{u'acknowledged': True}
>>> helpers.bulk(es,k)
(2650, [])
>>> # check to make sure we got what we expected...
>>> es.count(index='test')
{u'count': 2650, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
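The session above is Python 2: `string.letters` and the generator's `.next()` method do not exist in Python 3. A rough Python 3 sketch of the same generator might look like the following. The function name `letter_actions` is made up for illustration, and `_type` is left out on the assumption that you are targeting a recent Elasticsearch version, where mapping types are no longer used.

```python
import itertools
import string


def letter_actions(index="test"):
    """Yield one bulk action per ordered pair of distinct ASCII letters."""
    # string.ascii_letters is the Python 3 counterpart of string.letters.
    for letters in itertools.permutations(string.ascii_letters, 2):
        # Keys starting with "_" are bulk metadata; the rest is the document.
        yield {"_index": index, "letters": "".join(letters)}


k = letter_actions()
first = next(k)   # next(k) replaces the Python 2 call k.next()
second = next(k)
```

The remaining generator can then be passed to `helpers.bulk(es, k)` exactly as in the session above.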
Would this work for a Python list of JSON documents?
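One way this could work, as a minimal sketch: wrap each document in an action dict before handing the iterable to `helpers.bulk`. The helper name `to_actions` and the sample `docs` list are hypothetical; the sketch assumes each list item is either a JSON string or an already-parsed dict.

```python
import json


def to_actions(docs, index):
    """Turn an iterable of JSON documents into bulk actions for `index`."""
    for doc in docs:
        # Accept raw JSON strings as well as already-parsed dicts.
        if isinstance(doc, str):
            doc = json.loads(doc)
        # "_source" holds the document body; "_index" is bulk metadata.
        yield {"_index": index, "_source": doc}


# Hypothetical sample data mixing both input forms.
docs = ['{"letters": "ab"}', {"letters": "ac"}]
actions = list(to_actions(docs, "test"))
```

With a client in hand, `helpers.bulk(es, to_actions(docs, "test"))` would consume the generator lazily, so the full action list never has to sit in memory.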
@zyxwu But it's hellishly slow.
I have a use case where I update or add documents with the bulk API and, right after firing the bulk call, check the document count with the count API. The problem is that there is a delay after the bulk call before the count reflects the new documents. How do I make sure the count is correct right after the bulk call?
@tusharkale197 Elasticsearch is a near-real-time data store, not a strictly real-time one. Once you index a document, the change becomes visible within index.refresh_interval, which defaults to 1s. You can change that setting, or you can refresh the index manually with the refresh API.
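As a rough sketch of the manual-refresh option: refresh the index, then count. The helper name `count_after_refresh` is made up for illustration, and `es` is assumed to be an `elasticsearch.Elasticsearch` client like the one in the gist.

```python
def count_after_refresh(es, index):
    """Refresh `index` so recent writes are searchable, then return its count."""
    # Force a refresh instead of waiting for index.refresh_interval to elapse.
    es.indices.refresh(index=index)
    # The count response is a mapping with a "count" key.
    return es.count(index=index)["count"]
```

Calling `count_after_refresh(es, "test")` right after `helpers.bulk(es, k)` should then report the newly indexed documents. Forcing a refresh after every bulk call has a cost, so on a busy index it is probably better reserved for tests and one-off checks.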
But when I index documents one by one, everything's fine:
res = es.index(index=INDEX, doc_type=DOC_TYPE, id=ind, body=JS_message)