Gist lukas-vlcek/1075067
```sh
#!/bin/sh
# Index a PDF through the "attachment" field type, then search it with highlighting.
# Assumes an old Elasticsearch release with the mapper-attachments plugin installed
# (the "attachment" type and per-index mapping types come from that era).
host=localhost:9200

# Recreate a single-shard, zero-replica "test" index and wait for it to be ready.
curl -X DELETE "${host}/test"
curl -X PUT "${host}/test" -d '{
  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
}'
curl -X GET "${host}/_cluster/health?wait_for_status=green&pretty=1&timeout=5s"

# Map "file" as an attachment; store the extracted title and the extracted text,
# keeping positions/offsets so the text can be highlighted.
curl -X PUT "${host}/test/attachment/_mapping" -d '{
  "attachment" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "title" : { "store" : "yes" },
          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }
        }
      }
    }
  }
}'

# Download the sample PDF. -L follows the redirect to the file's new location;
# without it curl saves the (possibly empty) redirect response instead of the PDF.
curl -L -O http://www.intersil.com/data/fn/fn6742.pdf

# Base64-encode the whole file (-0777 slurps it) as a single line (empty separator),
# so the result can be embedded directly in a JSON string.
coded=$(perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")' < fn6742.pdf)
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file
curl -X POST "${host}/test/attachment/" -d @json.file
echo

# Make the new document searchable, then query it and highlight matches in "file".
curl -X POST "${host}/_refresh"
curl "${host}/_search?pretty=true" -d '{
  "fields" : ["title"],
  "query" : {
    "query_string" : {
      "query" : "amplifier"
    }
  },
  "highlight" : {
    "fields" : {
      "file" : {}
    }
  }
}'

#
# Output of the last search query:
#
#{
#  "took" : 6,
#  "timed_out" : false,
#  "_shards" : {
#    "total" : 1,
#    "successful" : 1,
#    "failed" : 0
#  },
#  "hits" : {
#    "total" : 1,
#    "max_score" : 0.005872132,
#    "hits" : [ {
#      "_index" : "test",
#      "_type" : "attachment",
#      "_id" : "UUaHJ6CfTOC3T2I4Kj_pXg",
#      "_score" : 0.005872132,
#      "fields" : {
#        "file.title" : "ISL99201"
#      },
#      "highlight" : {
#        "file" : [ "\nMono <em>Amplifier</em> • Filterless Class D with Efficiency > 86% at 400mW\nThe ISL99201 is a fully integrat", "\nmono <em>amplifier</em>. It is designed to maximize performance for \nmobile phone applications. The applicat" ]
#      }
#    } ]
#  }
#}
```
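Why the encoding step matters: the original `perl -ne 'print encode_base64($_)'` encodes each newline-terminated chunk of the PDF separately, so `=` padding can land in the middle of the output, and `encode_base64` also inserts a line break every 76 characters by default. For example, the bytes `a\nb` encode per line to `YQo=` and `Yg==`, whereas the whole input encodes to `YQpi`. Below is a minimal sketch that swaps the Perl one-liner for the coreutils `base64` command (an assumption: a `base64` that reads standard input, as the GNU and macOS versions do). The helper names `b64_file` and `make_attachment_json` are hypothetical, not part of the gist.

```sh
#!/bin/sh
# Hypothetical helper (not part of the gist): print the Base64 encoding of a
# whole file as one line. `base64` may wrap its output (GNU wraps at 76
# columns), so every newline is deleted to keep the JSON string on one line.
b64_file() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: build the same {"file":"..."} document the gist posts.
make_attachment_json() {
  printf '{"file":"%s"}\n' "$(b64_file "$1")"
}

# Usage (assumes fn6742.pdf was downloaded as in the script above):
#   make_attachment_json fn6742.pdf > json.file
#   curl -X POST "localhost:9200/test/attachment/" -d @json.file
# Optional round-trip check, using GNU coreutils `base64 -d` to decode:
#   b64_file fn6742.pdf | base64 -d | cmp - fn6742.pdf && echo "encoding OK"
```

Encoding the whole file at once, rather than line by line, is what guarantees that the string decodes back to the exact PDF bytes.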
The link http://www.intersil.com/data/fn/fn6742.pdf no longer exists;
the file is now at
http://www.intersil.com/content/dam/Intersil/documents/fn67/fn6742.pdf
This helped me a lot when implementing custom attachment-type support for django-haystack!
Thank you very much, Lukas!
This helped me a ton, thanks! Inspired by this one, I made a similar gist in Python: https://gist.github.com/stevehanson/7461706
The script downloads an empty PDF file, because curl does not follow the redirection from http://www.intersil.com/data/fn/fn6742.pdf to http://www.intersil.com/content/dam/Intersil/documents/fn67/fn6742.pdf!
The script prints no error message or warning, but as a result the Elasticsearch query returns an empty result set, which might send you a long way wondering what happened.
So, to get a successful result, you'll have to edit line 27 of the script to change the file URL, or use wget
instead of curl
(wget follows the redirection).
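Because the failed download is silent, a cheap guard is to check that the file actually looks like a PDF before encoding and indexing it. A minimal sketch, assuming a `head` that supports `-c` (GNU and BSD do, though it is not strictly POSIX); `is_pdf` and `fetch_pdf` are hypothetical helper names. It relies on PDF files starting with the `%PDF-` signature and on curl's `-f` (fail on HTTP errors) and `-L` (follow redirects) options.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the file exists, is non-empty, and
# starts with the "%PDF-" signature that every PDF file begins with.
is_pdf() {
  [ -s "$1" ] && [ "$(head -c 5 "$1")" = "%PDF-" ]
}

# Hypothetical helper: download URL $1 to file $2, following redirects (-L)
# and failing on HTTP errors (-f), then refuse to continue if it is not a PDF.
fetch_pdf() {
  curl -fsSL -o "$2" "$1" || return 1
  if ! is_pdf "$2"; then
    echo "error: $2 does not look like a PDF" >&2
    return 1
  fi
}

# Usage:
#   fetch_pdf http://www.intersil.com/data/fn/fn6742.pdf fn6742.pdf || exit 1
```

Failing early here turns the "empty result set" symptom into an explicit error at the step that actually went wrong.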
I forked this gist with the correction described above:
https://gist.github.com/zipang/6fe4ee9b821b5e454962
Do you have a script for Win32?
I'm not sure how to execute this part of the script on Windows.
coded=`cat fn6742.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
In a Windows command window I run `type fn6742.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`, but I get the error:
Can't locate IME/Base64.pm in @inc (@inc contains: c:/Perl/site/lib c:/Perl/lib .).
BEGIN failed--compilation aborted.
The process tried to write to a nonexistent pipe.