Last active
February 22, 2019 09:13
An extended exercise that covers the "Mapping and Text Analysis" objective of the Elastic exam.
# ** EXAM OBJECTIVES: MAPPINGS AND TEXT ANALYSIS **
# (remove, if present, any `hamlet*` index and index template)
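# A possible cleanup, assuming wildcard deletes are allowed on the cluster (i.e., `action.destructive_requires_name` is not enabled). The template request may answer with a 404 when no `hamlet*` template exists, which is harmless here
DELETE hamlet*
DELETE _template/hamlet*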
# Create the index `hamlet_1`, with one primary shard and no replicas
# Define the mapping for `hamlet_1`, satisfying the following criteria: (i) has a type "_doc" with three string fields named `speaker`, `line_number`, and `text_entry`; (ii) only `text_entry` is analysed; (iii) `text_entry` has a multi-field named `english`, associated with the built-in "english" analyzer; (iv) `line_number` does not support aggregations
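# One request that would satisfy the criteria above, sketched under the assumption of Elasticsearch 6.x (on 7.x and later, drop the "_doc" level under "mappings"). Mapping the non-analysed fields as `keyword` and disabling `doc_values` to rule out aggregations on `line_number` are choices of this sketch, not the only valid ones
PUT hamlet_1
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "speaker": { "type": "keyword" },
        "line_number": { "type": "keyword", "doc_values": false },
        "text_entry": {
          "type": "text",
          "fields": {
            "english": { "type": "text", "analyzer": "english" }
          }
        }
      }
    }
  }
}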
# Populate `hamlet_1` by running the _bulk command with the request-body below
{"index":{"_index":"hamlet_1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet_1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet_1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet_1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
{"index":{"_index":"hamlet_1","_id":4}}
{"line_number":"1.2.2","speaker":"KING CLAUDIUS","text_entry":"The memory be green, and that it us befitted"}
{"index":{"_index":"hamlet_1","_id":5}}
{"line_number":"1.3.1","speaker":"LAERTES","text_entry":"My necessaries are embarkd: farewell:"}
{"index":{"_index":"hamlet_1","_id":6}}
{"line_number":"1.3.4","speaker":"LAERTES","text_entry":"But let me hear from you."}
{"index":{"_index":"hamlet_1","_id":7}}
{"line_number":"1.3.5","speaker":"OPHELIA","text_entry":"Do you doubt that?"}
{"index":{"_index":"hamlet_1","_id":8}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
{"index":{"_index":"hamlet_1","_id":9}}
# Create the index `hamlet_2`, which updates the mapping of `hamlet_1` by defining a multi-field for `speaker`. The multi-field is named `token` and is mapped as a `text` field analysed with the default analyzer
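# A sketch of such a mapping, again assuming Elasticsearch 6.x typed mappings and the `keyword`/`doc_values` choices used for `hamlet_1` in this sketch. Leaving out "analyzer" on `token` means the default (standard) analyzer applies
PUT hamlet_2
{
  "mappings": {
    "_doc": {
      "properties": {
        "speaker": {
          "type": "keyword",
          "fields": {
            "token": { "type": "text" }
          }
        },
        "line_number": { "type": "keyword", "doc_values": false },
        "text_entry": {
          "type": "text",
          "fields": {
            "english": { "type": "text", "analyzer": "english" }
          }
        }
      }
    }
  }
}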
# Reindex `hamlet_1` into `hamlet_2`
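# A minimal reindex request for this step, assuming `hamlet_2` has already been created with its mapping. The same shape, with the index names swapped, would serve the later reindex steps
POST _reindex
{
  "source": { "index": "hamlet_1" },
  "dest": { "index": "hamlet_2" }
}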
# Verify that full-text queries on "speaker.token" are enabled on `hamlet_2`
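# One way to check, assuming `speaker` itself is mapped as a keyword: a lowercase, single-term match on `speaker.token` should find the KING CLAUDIUS lines, whereas the same term queried against `speaker` should find nothing
GET hamlet_2/_search
{
  "query": {
    "match": { "speaker.token": "king" }
  }
}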
# Index more documents in `hamlet_2` by running the _bulk command with the request-body below
{"index":{"_index":"hamlet_2","_id":"p1"}}
{"name":"HAMLET","relationship":[{"name":"HORATIO","type":"friend"},{"name":"GERTRUDE","type":"mother"}]}
{"index":{"_index":"hamlet_2","_id":"p2"}}
{"name":"KING CLAUDIUS","relationship":[{"name":"HAMLET","type":"nephew"}]}
# The items of the `relationship` array cannot be searched independently. For example, the query below returns 1 hit, even though no single item has both the name "gertrude" and the type "friend"
GET hamlet_2/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "relationship.name": "gertrude" } },
        { "match": { "relationship.type": "friend" } }
      ]
    }
  }
}
# Create the index `hamlet_3`, which updates the mapping of `hamlet_2` by satisfying the following criteria: (i) the inner objects of `relationship` can be searched independently; (ii) the fields of the inner objects of `relationship` are all of type `keyword`
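# A sketch that maps `relationship` as a `nested` field so each array item is indexed as its own hidden document. It assumes Elasticsearch 6.x typed mappings and the dialogue fields as mapped earlier in this sketch; the top-level `name` field is left to dynamic mapping
PUT hamlet_3
{
  "mappings": {
    "_doc": {
      "properties": {
        "speaker": {
          "type": "keyword",
          "fields": {
            "token": { "type": "text" }
          }
        },
        "line_number": { "type": "keyword", "doc_values": false },
        "text_entry": {
          "type": "text",
          "fields": {
            "english": { "type": "text", "analyzer": "english" }
          }
        },
        "relationship": {
          "type": "nested",
          "properties": {
            "name": { "type": "keyword" },
            "type": { "type": "keyword" }
          }
        }
      }
    }
  }
}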
# Reindex `hamlet_2` into `hamlet_3`
# Verify that the items in `relationship` can be searched independently of each other. For example, the query below should return 0 hits
GET hamlet_3/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.name": "GERTRUDE" } },
            { "match": { "relationship.type": "friend" } }
          ]
        }
      }
    }
  }
}
# Change the value of `relationship.type` in the query above to get 1 hit
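# For instance, "mother" is the type stored alongside GERTRUDE in document "p1", so a query like the following should return that single hit (assuming both nested fields are keywords, which makes the match case-sensitive)
GET hamlet_3/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.name": "GERTRUDE" } },
            { "match": { "relationship.type": "mother" } }
          ]
        }
      }
    }
  }
}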
# So far, we have indexed two kinds of documents, related either to a character or to the dialogue. Notice that a profile-related document can be linked to many dialogue-related documents. We will model this one-to-many relation in the next step
# Create the index `hamlet_4`, which updates the mapping of `hamlet_3` by satisfying the following criteria: (i) has a join field named `profile_or_dialogue`; (ii) the join field defines a parent/child relation between `profile` (the parent) and `dialogue` (the child)
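# A sketch of the join mapping, assuming Elasticsearch 6.x typed mappings and the other fields as mapped earlier in this sketch. In "relations", the key is the parent name and the value is the child name
PUT hamlet_4
{
  "mappings": {
    "_doc": {
      "properties": {
        "speaker": {
          "type": "keyword",
          "fields": {
            "token": { "type": "text" }
          }
        },
        "line_number": { "type": "keyword", "doc_values": false },
        "text_entry": {
          "type": "text",
          "fields": {
            "english": { "type": "text", "analyzer": "english" }
          }
        },
        "relationship": {
          "type": "nested",
          "properties": {
            "name": { "type": "keyword" },
            "type": { "type": "keyword" }
          }
        },
        "profile_or_dialogue": {
          "type": "join",
          "relations": { "profile": "dialogue" }
        }
      }
    }
  }
}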
# Reindex `hamlet_3` into `hamlet_4`
# Update the document with id "p2" (i.e., the profile document of King Claudius) by adding the field `profile_or_dialogue` and setting its property `name` to "profile"
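# A possible partial update, assuming the Elasticsearch 6.x endpoint layout (on 7.x and later the equivalent path would be `POST hamlet_4/_update/p2`)
POST hamlet_4/_doc/p2/_update
{
  "doc": {
    "profile_or_dialogue": { "name": "profile" }
  }
}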
# To be continued