@thequbit
Created December 16, 2013 22:04
def sendtoelasticsearch(self,targeturl,docurl,pdftext,pdfhash,scrapedatetime):
Where:
targeturl = "http://henrietta.org/" # the root url that was scraped
docurl = "http://henrietta.org/mydoc.pdf" # the url of the pdf document
pdftext = <the converted text of the pdf>
pdfhash = md5(pdftext)
scrapedatetime = the datetime of when the pdf was scraped/downloaded
I would like to send the text to Elasticsearch to be indexed so I can run queries on it from a website.
Using this example:
http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/#python
It looks like that is exactly what I want to do, but the es.index() function takes an id that appears to be a number. Can this be the docurl?
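
A minimal sketch of one way this could look, assuming Elasticsearch accepts document ids as strings (as far as I know it does, so a number should not be required). Rather than passing the raw docurl, this derives the id from a SHA-1 hash of it, which keeps the id short and free of slashes. The index name "pdfdocs", the helper build_index_request, and the keyword names passed to es.index() (index, id, body) are assumptions for illustration; the accepted keywords differ between client versions, so check them against the client you have installed. The es argument is assumed to be an already-constructed client object.

```python
import hashlib
from datetime import datetime


def build_index_request(targeturl, docurl, pdftext, pdfhash, scrapedatetime,
                        index_name="pdfdocs"):
    """Build the keyword arguments for es.index() (hypothetical helper).

    index_name is an assumed index name, not something from the original gist.
    """
    # Derive a stable string id from the document URL, so that re-scraping
    # the same PDF overwrites the existing entry instead of duplicating it.
    doc_id = hashlib.sha1(docurl.encode("utf-8")).hexdigest()
    return {
        "index": index_name,
        "id": doc_id,
        "body": {
            "targeturl": targeturl,
            "docurl": docurl,
            "pdftext": pdftext,
            "pdfhash": pdfhash,
            # ISO 8601 strings are a common way to send dates as JSON.
            "scrapedatetime": scrapedatetime.isoformat(),
        },
    }


def sendtoelasticsearch(es, targeturl, docurl, pdftext, pdfhash, scrapedatetime):
    """Send one scraped PDF to Elasticsearch through the client object es."""
    request = build_index_request(targeturl, docurl, pdftext, pdfhash,
                                  scrapedatetime)
    # Assumption: the installed client accepts these keyword names.
    return es.index(**request)
```

Because the id depends only on docurl, calling sendtoelasticsearch twice for the same PDF should replace the earlier document rather than add a second copy. The original docurl is still stored in the body, so it stays searchable.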