The idea is to use an Elasticsearch data stream: https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html
A data stream handles append-only time-series data (such as webhook logs) and rolls over to new backing indices over time.
A data stream is backed by:
- a single name used for writing/searching, much like an alias (e.g. "webhooks_logs")
- a set of hidden backing indexes that store the data
- an index template that defines the mappings and settings used in each backing index
- a rollover configuration that can create new indexes, delete old ones and change which index is the active write index
Mappings specify the schema for indexed documents.
Important: make sure the _source metadata field is not disabled: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
Important: mappings no longer have types (deprecated in ES 7.x, removed in 8.0): https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html
Prerequisites:
- Elasticsearch data streams are intended for time series data only. Each document indexed to a data stream must contain the @timestamp field. This field must be mapped as a date or date_nanos field data type.
- Data streams are best suited for time-based, append-only use cases. If you frequently need to update or delete existing documents, we recommend using an index alias and an index template instead.
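The @timestamp prerequisite can also be guarded on the client side before a document is sent. The following is a minimal Python sketch, not part of Elasticsearch: the helper name ensure_timestamp and its now parameter are hypothetical, and it assumes an ISO-8601 UTC string is acceptable for a date or date_nanos mapping.

```python
from datetime import datetime, timezone


def ensure_timestamp(doc, now=None):
    """Return a copy of doc that is guaranteed to carry an @timestamp field.

    Hypothetical helper: an existing @timestamp is left untouched; otherwise
    the current (or supplied) time is added as an ISO-8601 UTC string.
    A naive datetime passed as `now` is interpreted as local time.
    """
    stamped = dict(doc)
    if "@timestamp" in stamped:
        return stamped
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # isoformat() emits "+00:00" for UTC; use the "Z" suffix seen in the examples.
    stamped["@timestamp"] = moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return stamped


if __name__ == "__main__":
    print(ensure_timestamp({"message": "Login successful"}))
```

The function copies the document rather than mutating it, so the caller's original dict is never changed.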
Index lifecycle management (ILM) can be used to automatically manage a data stream’s backing indices, for example by rolling over indexes based on size or age.
https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-put-lifecycle.html https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-index-lifecycle.html
PUT /_ilm/policy/my-data-stream-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_size": "100GB"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
response:
{"acknowledged": true}
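If the rollover and retention values need to vary per environment, the policy body above could be generated rather than hand-written. This is only a sketch: build_ilm_policy and its parameter names are hypothetical, and the function just builds the JSON body. Sending it to PUT /_ilm/policy/<name> is left to whatever HTTP client is in use.

```python
import json


def build_ilm_policy(rollover_max_age="1d", rollover_max_size="100GB", delete_min_age="30d"):
    """Build the body for PUT /_ilm/policy/<name> (hypothetical helper).

    The defaults mirror the example policy: roll over daily or at 100GB,
    and delete backing indices 30 days after rollover.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_size": rollover_max_size,
                        }
                    }
                },
                "delete": {
                    "min_age": delete_min_age,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(delete_min_age="7d"), indent=2))
```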
A data stream uses an index template to configure its backing indices. A template for a data stream must specify:
- One or more index patterns that match the name of the stream.
- The mappings and settings for the stream’s backing indices.
- That the template is used exclusively for data streams.
- A priority for the template.
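The four requirements listed above can be checked mechanically before a template is submitted. The sketch below is an assumption-laden illustration in Python: template_problems is a hypothetical name, the checks follow this list rather than Elasticsearch's own validation, and the @timestamp mapping is only checked when the template declares one.

```python
def template_problems(body):
    """Return a list of human-readable problems found in an index template body.

    Hypothetical client-side check; an empty list means the body satisfies
    the requirements listed in these notes.
    """
    problems = []
    patterns = body.get("index_patterns")
    if not isinstance(patterns, list) or not patterns:
        problems.append("index_patterns must contain at least one pattern")
    if "data_stream" not in body:
        problems.append("data_stream object is missing")
    if not isinstance(body.get("priority"), int):
        problems.append("priority must be an integer")
    template = body.get("template")
    if not isinstance(template, dict):
        problems.append("template (mappings and settings) is missing")
        return problems
    ts_mapping = template.get("mappings", {}).get("properties", {}).get("@timestamp")
    if ts_mapping is not None and ts_mapping.get("type") not in ("date", "date_nanos"):
        problems.append("@timestamp must be mapped as date or date_nanos")
    return problems


if __name__ == "__main__":
    candidate = {"index_patterns": ["my-data-stream*"], "priority": 200, "template": {}}
    print(template_problems(candidate))
```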
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date_nanos" }
      }
    },
    "settings": {
      "index.lifecycle.name": "my-data-stream-policy"
    }
  },
  "version": 1,
  "_meta": { "whatever": "you-want" }
}
Once the template exists, create the data stream:
PUT /_data_stream/my-data-stream
After it is created, you can query the data stream's parameters:
GET /_data_stream/my-data-stream
response:
{
  "data_streams": [
    {
      "name": "my-data-stream",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-my-data-stream-000001",
          "index_uuid": "krR78LfvTOe6gr5dj2_1xQ"
        },
        {
          "index_name": ".ds-my-data-stream-000002",
          "index_uuid": "C6LWyNJHQWmA08aQGvqRkA"
        }
      ],
      "generation": 2,
      "status": "GREEN",
      "template": "my-data-stream-template",
      "ilm_policy": "my-data-stream-policy"
    }
  ]
}
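A response like the one above can be used to find the current write index. The sketch below assumes the response has already been parsed into a Python dict and relies on the last entry of "indices" being the stream's write index. The function name write_index is hypothetical.

```python
def write_index(response, stream_name):
    """Return the name of the current write index for stream_name, or None.

    Hypothetical helper operating on a parsed GET /_data_stream/<name>
    response; assumes the last entry in "indices" is the write index.
    """
    for stream in response.get("data_streams", []):
        if stream.get("name") == stream_name:
            indices = stream.get("indices", [])
            return indices[-1]["index_name"] if indices else None
    return None


if __name__ == "__main__":
    sample = {
        "data_streams": [
            {
                "name": "my-data-stream",
                "indices": [
                    {"index_name": ".ds-my-data-stream-000001"},
                    {"index_name": ".ds-my-data-stream-000002"},
                ],
                "generation": 2,
            }
        ]
    }
    print(write_index(sample, "my-data-stream"))
```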
You can add documents to a data stream using two types of indexing requests:
- Individual indexing requests
- Bulk indexing requests
PUT /my-data-stream/_create/{id}
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
    "id": "8a4f500d"
  },
  "message": "Login successful"
}
PUT /my-data-stream/_bulk?refresh
{"create":{ }}
{ "@timestamp": "2020-12-08T11:04:05.000Z", "user": { "id": "vlb44hny" }, "message": "Login attempt failed" }
{"create":{"_id": "3"}}
{ "@timestamp": "2020-12-08T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
{"create":{ }}
{ "@timestamp": "2020-12-09T11:07:08.000Z", "user": { "id": "l7gk7f82" }, "message": "Logout successful" }
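The bulk body above is newline-delimited JSON: one "create" action line followed by one document line, with a trailing newline at the end. A small Python sketch of how such a body might be assembled follows. build_bulk_body is a hypothetical helper that only builds the string, and its @timestamp check is a client-side convenience rather than anything Elasticsearch requires of the client.

```python
import json


def build_bulk_body(docs):
    """Build an NDJSON body for a _bulk request against a data stream.

    docs is an iterable of (doc_id, source) pairs; doc_id may be None to let
    Elasticsearch generate the id. Hypothetical helper: it does not send anything.
    """
    lines = []
    for doc_id, source in docs:
        if "@timestamp" not in source:
            raise ValueError("every data stream document needs an @timestamp field")
        # Only "create" actions are emitted, matching the append-only examples.
        action = {"create": {} if doc_id is None else {"_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects the final line to end with a newline as well.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    body = build_bulk_body([
        (None, {"@timestamp": "2020-12-08T11:04:05.000Z", "message": "Login attempt failed"}),
        ("3", {"@timestamp": "2020-12-08T11:06:07.000Z", "message": "Login successful"}),
    ])
    print(body, end="")
```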