@thequbit
Created December 30, 2013 20:33
I have an Elasticsearch index with the following schema:
# create entry
body = {
    'targeturl': urldata['targeturl'],  # url of website being crawled
    'docurl': docurl,                   # the url of the document
    'docname': docname,                 # the name of the document
    'linktext': linktext,               # the text within the <a> tags
    'pdftext': pdftext,                 # the full text of the document
    'pdfhash': pdfhash,                 # the MD5 hash of the document
    'scrapedatetime': scrapedatetime,   # the datetime the document was found
    'textfilename': textfilename,       # the name of the text file in the file store
    'pdffilename': pdffilename,         # the name of the pdf file in the file store
    'misfit': misfit,                   # boolean for downstream use
    'orgname': org['name'],             # name of the organization the doc belongs to
    'orgid': org['orgid'],              # org id in the DB
    'bodyid': org['bodyid'],            # body id in the DB of the body the org belongs to
}

# send to indexer
es = elasticsearch.Elasticsearch()
es.index(
    index="monroeminutes",
    doc_type="pdfdoc",
    id=uuid.uuid4(),
    body=body,
)
The code above sends each document it finds to Elasticsearch for indexing. I now need to run a search that returns a document only if it matches the orgid I want. I believe I am just forming the query with the wrong syntax.
Line in question:
https://github.com/thequbit/monroeminutes/blob/master/src/search.py#L38
Error:
(mmenv)administrator@anna:~/dev/monroeminutes/src$ python search.py
No handlers could be found for logger "elasticsearch"
Traceback (most recent call last):
  File "search.py", line 72, in <module>
    response = search.search('scottsville',orgid=1)
  File "search.py", line 45, in search
    body=body
  File "/home/administrator/.virtualenvs/mmenv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 70, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/administrator/.virtualenvs/mmenv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 388, in search
    params=params, body=body)
  File "/home/administrator/.virtualenvs/mmenv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 223, in perform_request
    status, raw_data = connection.perform_request(method, url, params, body, ignore=ignore)
  File "/home/administrator/.virtualenvs/mmenv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 53, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/administrator/.virtualenvs/mmenv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 82, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(400, u'SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[JMs9LW9cR726AXz43g05-A][monroeminutes][1]: SearchParseException[[monroeminutes][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"match": {"pdftext": "scottsville", "orgid": 1}}, "from": 0, "size": 10}]]]; nested: QueryParsingException[[monroeminutes] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its \'options\' form, with \'query\' element?]; }{[JMs9LW9cR726AXz43g05-A][monroeminutes][4]: SearchParseException[[monroeminutes][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"match": {"pdftext": "scottsville", "orgid": 1}}, "from": 0, "size": 10}]]]; nested: QueryParsingException[[monroeminutes] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its \'options\' form, with \'query\' element?]; }{[JMs9LW9cR726AXz43g05-A][monroeminutes][2]: SearchParseException[[monroeminutes][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"match": {"pdftext": "scottsville", "orgid": 1}}, "from": 0, "size": 10}]]]; nested: QueryParsingException[[monroeminutes] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its \'options\' form, with \'query\' element?]; }{[JMs9LW9cR726AXz43g05-A][monroeminutes][3]: SearchParseException[[monroeminutes][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"match": {"pdftext": "scottsville", "orgid": 1}}, "from": 0, "size": 10}]]]; nested: QueryParsingException[[monroeminutes] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its \'options\' form, with \'query\' element?]; 
}{[JMs9LW9cR726AXz43g05-A][monroeminutes][0]: SearchParseException[[monroeminutes][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"match": {"pdftext": "scottsville", "orgid": 1}}, "from": 0, "size": 10}]]]; nested: QueryParsingException[[monroeminutes] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its \'options\' form, with \'query\' element?]; }]')
(mmenv)administrator@anna:~/dev/monroeminutes/src$
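The parse failure above quotes the body that was sent: a single match clause holding two fields, "pdftext" and "orgid". A match clause names exactly one field, which is what the QueryParsingException is complaining about. One way to combine the two conditions is a bool query with one clause per field. The sketch below is not the code from search.py; the helper name build_search_body and its parameters are hypothetical, and it assumes orgid is indexed as a number so that an exact term comparison applies.

```python
import json


def build_search_body(text, orgid=None, start=0, size=10):
    """Build a search body: full-text match on pdftext, optionally
    restricted to a single orgid. (Hypothetical helper, for illustration.)"""
    # Each match clause names exactly one field.
    clauses = [{"match": {"pdftext": text}}]
    if orgid is not None:
        # term compares against the indexed value exactly; this assumes
        # orgid is stored as a number rather than analyzed text.
        clauses.append({"term": {"orgid": orgid}})
    return {
        # bool/must requires every listed clause to match.
        "query": {"bool": {"must": clauses}},
        "from": start,
        "size": size,
    }


body = build_search_body("scottsville", orgid=1)
print(json.dumps(body, sort_keys=True))
```

The resulting dict would be passed as the body argument to es.search(index="monroeminutes", body=body), the same call that appears in the traceback. Because the orgid condition is a yes/no restriction rather than something to score, newer Elasticsearch releases may be better served by putting it in the bool query's filter clause instead of must.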