Last active
June 21, 2016 10:19
PaulMougel/a89de604ddb2d2ecd8ea
Attachment upload & indexation in Elasticsearch
# https://github.com/elasticsearch/elasticsearch-mapper-attachments
plugin install elasticsearch/elasticsearch-mapper-attachments/2.4.2
curl -X PUT http://localhost:9200/test
# Note that here we declare that the attachment is stored in the field "my_attachment"
curl -X PUT http://localhost:9200/test/pdf/_mapping -d '{"pdf": {"properties": {"my_attachment": {"type": "attachment"}}}}'
# The file has to fit in a JSON doc: base64-encode it as one string (-0777 reads the whole file at once, "" suppresses line breaks).
# http://stackoverflow.com/a/20046414/2137601
coded=`perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")' my_file.pdf`
# As specified in the mapping, we upload the file's content in the "my_attachment" field
# (we can also add other fields)
json="{\"my_attachment\":\"${coded}\"}"
rm -f json.file && echo "$json" > json.file
curl -X POST "localhost:9200/test/pdf" -d @json.file
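The upload step can also be sketched as two small reusable functions. This is only a sketch, assuming a `base64` command-line tool is installed (GNU coreutils and macOS both ship one); `encode_file` and `build_payload` are hypothetical names, not part of the gist or of Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helper: print the base64 encoding of a file as one unbroken line.
# base64 tools wrap their output by default, so tr strips the line breaks.
encode_file() {
  base64 < "$1" | tr -d '\r\n'
}

# Hypothetical helper: print a one-field JSON document holding the encoded file.
# $1 = field name declared in the mapping, $2 = path to the file to upload.
build_payload() {
  printf '{"%s":"%s"}\n' "$1" "$(encode_file "$2")"
}

# Usage (assumes the "test" index and the "pdf" mapping already exist):
#   build_payload my_attachment my_file.pdf > json.file
#   curl -X POST "localhost:9200/test/pdf" -d @json.file
```

Building the document with `printf` keeps the encoded value on a single line, so the JSON string contains no whitespace.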
$ curl -X GET "http://localhost:9200/test/pdf/_search"
$ curl -X GET "http://localhost:9200/test/pdf/_search?q=word"
Hi,
I am trying to index a PDF into ES using the code given above.
But in 1-file-upload.sh, at line 8, I am getting the following error:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal white space character (code 0x20) as character #4 of 4-char base64 unit: can only used between units\n at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@30431766; line: 1, column: 23]"}},"status":400}
In the logs, the following error is given:
[2016-06-21 10:13:41,176][INFO ][rest.suppressed ] /test/pdf Params: {index=test, type=pdf}
MapperParsingException[failed to parse]; nested: JsonParseException[Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal white space character (code 0x20) as character #4 of 4-char base64 unit: can only used between units
 at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@30431766; line: 1, column: 23]];
    at org.elasticsearch.index.mapper.DocumentParser.innerParseDocument(DocumentParser.java:159)
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:79)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:304)
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:517)
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:508)
    at org.elasticsearch.action.support.replication.TransportReplicationAction.prepareIndexOperationOnPrimary(TransportReplicationAction.java:1053)
    at org.elasticsearch.action.support.replication.TransportReplicationAction.executeIndexRequestOnPrimary(TransportReplicationAction.java:1061)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:170)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:579)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.fasterxml.jackson.core.JsonParseException: Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal white space character (code 0x20) as character #4 of 4-char base64 unit: can only used between units
 at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@30431766; line: 1, column: 23]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getBinaryValue(UTF8StreamJsonParser.java:486)
    at com.fasterxml.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1225)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:190)
    at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:441)
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:314)
    at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:441)
    at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:267)
    at org.elasticsearch.index.mapper.DocumentParser.innerParseDocument(DocumentParser.java:127)
    ... 13 more
I am using ES 2.1.2 and Mapper-Attachment 3.1.2.
Please let me know what could be done to fix this error.
Thanks in advance.
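The message above reports a space character (code 0x20) inside the base64 value, so one thing worth checking is the payload file itself before it is posted. A minimal sketch, assuming the payload is a one-field document like the `json.file` written by the upload script; `check_payload` is a hypothetical helper, not part of Elasticsearch or the plugin, and it only tests the shape of the file, not whether the PDF decodes correctly.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is a single-line JSON document
# of the form {"<field>":"<value>"} whose value uses nothing but the base64
# alphabet (A-Z, a-z, 0-9, +, /) followed by at most two "=" padding characters.
# $1 = field name, $2 = path to the JSON file.
check_payload() {
  field="$1"
  file="$2"
  # Line breaks inside the value would split the document over several lines.
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt 1 ]; then
    echo "payload spans several lines" >&2
    return 1
  fi
  # Spaces or any other character outside the alphabet make the match fail.
  if ! grep -Eq "^\{\"$field\":\"[A-Za-z0-9+/]*={0,2}\"\}\$" "$file"; then
    echo "value of $field is not clean single-line base64" >&2
    return 1
  fi
}

# Usage:
#   check_payload my_attachment json.file && curl -X POST "localhost:9200/test/pdf" -d @json.file
```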