- How do I troubleshoot search latency spikes in my Amazon OpenSearch Service cluster?
- How can I improve the indexing performance on my Amazon OpenSearch Service cluster?
- Resolution
- Be sure that the shards are distributed evenly across the data nodes for the index that you're ingesting into
- Increase the refresh_interval to 60 seconds or more
- Change the replica count to zero
- Experiment to find the optimal bulk request size
- Use an instance type that has SSD instance store volumes (such as I3)
- Reduce response size
- Related information
- How do I resolve search or write rejections in OpenSearch Service?
https://repost.aws/knowledge-center/opensearch-latency-spikes
For search requests, OpenSearch Service calculates the round trip time as follows:
Round trip = Time the query spends in the query phase + Time in the fetch phase + Time spent in the queue + Network latency
The SearchLatency metric on Amazon CloudWatch gives you the time that the query spent in the query phase.
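For example, you can pull SearchLatency statistics with the AWS CLI. The following is a minimal sketch that assumes a hypothetical domain name (my-domain), account ID, and time window; OpenSearch Service publishes its metrics under the AWS/ES namespace with the DomainName and ClientId dimensions:
aws cloudwatch get-metric-statistics \
  --namespace "AWS/ES" \
  --metric-name SearchLatency \
  --dimensions Name=DomainName,Value=my-domain Name=ClientId,Value=123456789012 \
  --statistics Average Maximum \
  --period 60 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z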
To troubleshoot search latency spikes in your OpenSearch Service cluster, there are multiple steps that you can take:
- Check for insufficient resources provisioned on the cluster
- Check for search rejections using the ThreadpoolSearchRejected metric in CloudWatch
- Use the search slow logs API and the profile API
- Resolve any 504 gateway timeout errors
If you have insufficient resources provisioned on your cluster, then you might experience search latency spikes. Use the following best practices to make sure that you have sufficient resources provisioned.
1. Review the CPUUtilization and JVMMemoryPressure metrics of the cluster in CloudWatch. Confirm that they're within the recommended thresholds. For more information, see Recommended CloudWatch alarms for Amazon OpenSearch Service.
2. Use the node stats API to get node level statistics on your cluster:
GET /_nodes/stats
In the output, check the following sections: caches, fielddata, and jvm. To compare the outputs, run this API multiple times with some delay between each output.
3. OpenSearch Service uses multiple caches to improve its performance and the response time of requests:
- The file-system cache, or page cache, that exists on the operating system level
- The shard level request cache and query cache that both exist on the OpenSearch Service level
Review the node stats API output for cache evictions. A high number of cache evictions in the output means that the cache size is inadequate to serve the request. To reduce your evictions, use bigger nodes with more memory.
To view specific cache information with the node stats API, use the following requests. The second request is for a shard-level request cache:
GET /_nodes/stats/indices/request_cache?human
GET /_nodes/stats/indices/query_cache?human
For more information on OpenSearch caches, see Elasticsearch caching deep dive: Boosting query speed one cache at a time on the Elastic website.
For steps to clear the various caches, see Clear index or data stream cache on the OpenSearch website.
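As a sketch, the following request clears the query, request, and fielddata caches for a single index; the index name is a placeholder:
POST /index-name/_cache/clear?query=true&request=true&fielddata=true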
4. Performing aggregations on fields that contain highly unique values might cause an increase in heap usage. When aggregation queries are in use, search operations use fielddata. Fielddata is also used to sort and access field values in scripts. Fielddata evictions depend on the indices.fielddata.cache.size setting, which accounts for 20% of the JVM heap space. When the cache is exceeded, evictions start.
To view the fielddata cache, use this request:
GET /_nodes/stats/indices/fielddata?human
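To see which fields consume the most fielddata memory on each node, you can also use the cat fielddata API (the column selection shown here is optional):
GET /_cat/fielddata?v&h=node,field,size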
For more information on troubleshooting high JVM memory, see How do I troubleshoot high JVM memory pressure on my Amazon OpenSearch Service cluster?
To troubleshoot high CPU utilization, see How do I troubleshoot high CPU utilization on my Amazon OpenSearch Service cluster?
To check for search rejections using CloudWatch, follow the steps in How do I resolve search or write rejections in Amazon OpenSearch Service?
To identify both long-running queries and the time that a query spent on a particular shard, use slow logs. You can set thresholds for the query phase and the fetch phase for each index. For more information on setting up slow logs, see Viewing Amazon OpenSearch Service slow logs. For a detailed breakdown of the time that your query spends in the query phase, set "profile": true for your search query.
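The following is a minimal sketch of both techniques, assuming a hypothetical index name, field, and thresholds (5s and 1s) that you should adapt to your workload. The first request sets query-phase and fetch-phase slow log thresholds, and the second runs a profiled search:
PUT /index-name/_settings
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
GET /index-name/_search
{
  "profile": true,
  "query": { "match": { "user": "testuser" } }
}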
Note: If you set the threshold for logging to a very low value, your JVM memory pressure might increase. This might lead to more frequent garbage collection that then increases CPU utilization and adds to cluster latency. Logging more queries might also increase your costs. A long output of the profile API also adds significant overhead to any search queries.
From the application logs of your OpenSearch Service cluster, you can see specific HTTP error codes for individual requests. For more information on resolving HTTP 504 gateway timeout errors, see How can I prevent HTTP 504 gateway timeout errors in Amazon OpenSearch Service?
Note: You must activate error logs to identify specific HTTP error codes. For more information about HTTP error codes, see Viewing Amazon OpenSearch Service error logs.
There are a number of other factors that can cause high search latency. Use the following tips to further troubleshoot high search latency:
- Frequent or long-running garbage collection activity might cause search performance issues. Garbage collection activity might pause threads and increase search latency. For more information, see A Heap of Trouble: Managing Elasticsearch's Managed Heap on the Elastic website.
- Provisioned IOPS (or I3 instances) might help you remove any Amazon Elastic Block Store (Amazon EBS) bottleneck. In most cases, you don't need them. Before you move instances to I3, it's a best practice to test the performance between I3 nodes and R5 nodes.
- A cluster with too many shards might increase resource utilization, even when the cluster is inactive. Too many shards slow down query performance. Although increasing the replica shard count can help you achieve faster searches, don't go beyond 1,000 shards on a given node. Also, make sure that the shard sizes are between 10 GiB and 50 GiB. Ideally, keep no more than 20 shards per GiB of JVM heap on a node.
- Too many segments or too many deleted documents might affect search performance. To improve performance, use force merge on read-only indices. Also, increase the refresh interval on the active indices, if possible. For more information, see Lucene's handling of deleted documents on the Elastic website.
- If your cluster is in a Virtual Private Cloud (VPC), then it's a best practice to run your applications within the same VPC.
- Use UltraWarm nodes or hot data nodes for read-only data. Hot storage provides the fastest possible performance for indexing and searching new data. However, UltraWarm nodes are a cost-effective way to store large amounts of read-only data on your cluster. For indices that you don't need to write to and don't require high performance, UltraWarm offers significantly lower costs per GiB of data.
- Determine if your workload benefits from having the data that you're searching for available on all nodes. Some applications benefit from this approach, especially if there are few indices on your cluster. To do this, increase the number of replica shards.
Note: This might add to indexing latency. Also, you might need additional Amazon EBS storage to accommodate the replicas that you add. This increases your EBS volume costs.
- Search as few fields as possible, and avoid scripts and wildcard queries. For more information, see Tune for search speed on the Elastic website.
- For indices with many shards, use custom routing to help speed up searches. Custom routing makes sure that you query only the shards that hold your data, rather than broadcast the request to all shards. For more information, see Customizing your document routing on the Elastic website.
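As a hedged illustration of custom routing, the following requests index and then search a document with the same routing value, so the query goes only to the shard that holds that routing value. The index name, document, and routing value (user1) are placeholders:
PUT /index-name/_doc/1?routing=user1
{ "user" : "user1", "message" : "example" }
GET /index-name/_search?routing=user1
{ "query": { "match": { "user": "user1" } } }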
Recommended CloudWatch alarms for Amazon OpenSearch Service
https://repost.aws/knowledge-center/opensearch-indexing-performance
Be sure that the shards are distributed evenly across the data nodes for the index that you're ingesting into
Use the following formula to confirm that the shards are evenly distributed:
Number of shards for index = k * (Number of data nodes), where k is the number of shards per node
For example, if there are 24 shards in the index, and there are eight data nodes, then OpenSearch Service assigns three shards to each node. For more information, see Get started with Amazon OpenSearch Service: How many shards do I need?
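The shard count is set when you create the index. As a sketch, the following request (with a placeholder index name) creates an index with 24 primary shards to match the example above:
PUT /index-name
{
  "settings": {
    "index.number_of_shards": 24,
    "index.number_of_replicas": 1
  }
}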
Refresh your OpenSearch Service index so that your documents are available for search. Note that refreshing your index requires the same resources that are used by indexing threads.
The default refresh interval is one second. When you increase the refresh interval, the data node makes fewer API calls, and the refresh operation consumes fewer resources that would otherwise go to indexing. To prevent 429 errors, it's a best practice to increase the refresh interval.
Note: The default refresh interval is one second for indices that receive one or more search requests in the last 30 seconds. For more information about the updated default interval, see _refresh API version 7.x on the Elasticsearch website.
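For example, the following request (with a placeholder index name) raises the refresh interval to 60 seconds:
PUT /index-name/_settings
{
  "index": { "refresh_interval": "60s" }
}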
If you anticipate heavy indexing, consider setting the index.number_of_replicas value to 0. Each replica duplicates the indexing process. As a result, disabling the replicas improves your cluster performance. After the heavy indexing is complete, restore the replica count.
Important: If a node fails while replicas are disabled, you might lose data. Disable the replicas only if you can tolerate data loss for a short duration.
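The following is a minimal sketch with a placeholder index name: the first request disables replicas before heavy indexing, and the second restores a single replica afterward. Adjust the restored count to your durability requirements:
PUT /index-name/_settings
{ "index": { "number_of_replicas": 0 } }
PUT /index-name/_settings
{ "index": { "number_of_replicas": 1 } }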
Start with the bulk request size of 5 MiB to 15 MiB. Then, slowly increase the request size until the indexing performance stops improving. For more information, see Using and sizing bulk requests on the Elasticsearch website.
Note: Some instance types limit bulk requests to 10 MiB. For more information, see Network limits.
I3 instances provide fast, local Non-Volatile Memory Express (NVMe) storage. I3 instances deliver better ingestion performance than instances that use General Purpose SSD (gp2) Amazon Elastic Block Store (Amazon EBS) volumes. For more information, see Petabyte scale for Amazon OpenSearch Service.
To reduce the size of OpenSearch Service's response, use the filter_path parameter to exclude unnecessary fields. Be sure that you don't filter out any fields that are required to identify or retry failed requests. Those fields can vary by client.
In the following example, the _index, _type, and took fields are excluded from the response:
curl -XPOST "es-endpoint/index-name/type-name/_bulk?pretty&filter_path=-took,-items.index._index,-items.index._type" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test2", "_id" : "1" } }
{ "user" : "testuser" }
{ "update" : {"_id" : "1", "_index" : "test2"} }
{ "doc" : {"user" : "example"} }
For more information, see Reducing response size.
Increase the value of index.translog.flush_threshold_size
By default, index.translog.flush_threshold_size is set to 512 MB. This means that the translog is flushed when it reaches 512 MB. The indexing load determines the frequency of translog flushes. When you increase index.translog.flush_threshold_size, the node performs the flush operation less frequently. Because OpenSearch Service flushes are resource-intensive operations, reducing the flush frequency improves indexing performance. By increasing the flush threshold size, the OpenSearch Service cluster also creates fewer, larger segments instead of many small segments. Large segments merge less often, and more threads are used for indexing instead of merging.
Note: An increase in index.translog.flush_threshold_size can also increase the time that it takes for a translog to complete. If a shard fails, then recovery takes more time because the translog is larger.
Before increasing index.translog.flush_threshold_size, call the following API operation to get current flush operation statistics:
curl -XPOST "os-endpoint/index-name/_stats/flush?pretty"
Replace the os-endpoint and index-name with your respective variables.
In the output, note the number of flushes and the total time. The following example output shows that there are 124 flushes, which took 17,690 milliseconds:
{
"flush": {
"total": 124,
"total_time_in_millis": 17690
}
}
To increase the flush threshold size, call the following API operation:
$ curl -XPUT "os-endpoint/index-name/_settings?pretty" -d "{"index":{"translog.flush_threshold_size" : "1024MB"}}"
In this example, the flush threshold size is set to 1024 MB, which is ideal for instances that have more than 32 GB of memory.
Note: Choose the appropriate threshold size for your OpenSearch Service domain.
Run the _stats API operation again to see whether the flush activity changed:
$ curl -XGET "os-endpoint/index-name/_stats/flush?pretty"
Note: It's a best practice to increase the index.translog.flush_threshold_size only for the current index. After you confirm the outcome, apply the changes to the index template.
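After you confirm the outcome on the current index, you can carry the setting into a composable index template so that new indices inherit it. The following sketch uses a hypothetical template name and index pattern:
PUT /_index_template/flush-threshold-template
{
  "index_patterns": ["index-name-*"],
  "template": {
    "settings": {
      "index.translog.flush_threshold_size": "1024MB"
    }
  }
}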
Segment replication strategy
Segment replication can be applied in a variety of scenarios, including:
- High write loads without high search requirements and with longer refresh times.
- When experiencing very high loads, you want to add new nodes but don’t want to index all data immediately.
- OpenSearch cluster deployments with low replica counts, such as those used for log analytics.
To use segment replication for an index, set the replication type in the index settings when you create the index:
PUT /my-index1
{
"settings": {
"index": {
"replication.type": "SEGMENT"
}
}
}
For the best performance, it is recommended that you enable the following settings:
- Segment replication backpressure
- Balanced primary shard allocation, using the following command:
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.balance.prefer_primary": true,
"segrep.pressure.enabled": true
}
}
You can set the default replication type for newly created cluster indexes in the opensearch.yml file as follows:
cluster.indices.replication.strategy: 'SEGMENT'
Try Segment replication approach
Best practices for Amazon OpenSearch Service
https://repost.aws/knowledge-center/opensearch-resolve-429-error
When you write or search for data in your OpenSearch Service cluster, you might receive the following HTTP 429 error or es_rejected_execution_exception:
error":"elastic: Error 429 (Too Many Requests): rejected execution of org.elasticsearch.transport.TransportService$7@b25fff4 on
EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@768d4a66[Running,
pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 820898]] [type=es_rejected_execution_exception]"
Reason={"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TcpTransport$RequestHandler@3ad6b683 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@bef81a5[Running, pool size = 25, active threads = 23, queued tasks = 1000, completed tasks = 440066695]]"
The following variables can contribute to an HTTP 429 error or es_rejected_execution_exception:
- Data node instance types and search or write limits
- High values for instance metrics
- Active and Queue threads
- High CPU utilization and JVM memory pressure
HTTP 429 errors can occur because of search and write requests to your cluster. Rejections can also come from a single node or multiple nodes of your cluster.
Note: Different versions of Elasticsearch use different thread pools to process calls to the _index API. Elasticsearch versions 1.5 and 2.3 use the index thread pool. Elasticsearch versions 5.x, 6.0, and 6.2 use the bulk thread pool. Elasticsearch versions 6.3 and later use the write thread pool. For more information, see Thread pool on the Elasticsearch website.
A data node instance type has a fixed number of virtual CPUs (vCPUs). Plug the vCPU count into the following formulas to determine how many concurrent search or write operations your node can perform before requests enter a queue. If all active threads are busy, then requests spill over to the queue and are eventually rejected. For more information about the relationship between vCPUs and node types, see OpenSearch Service pricing.
Additionally, there is a limit to how many searches per node or writes per node that you can perform. This limit is based on the thread pool definition and Elasticsearch version number. For more information, see Thread pool on the Elasticsearch website.
For example, if you choose the R5.2xlarge node type for five nodes in your Elasticsearch cluster (version 7.4), then each node has 8 vCPUs.
Use the following formula to calculate maximum active threads for search requests:
int ((# of available_processors * 3) / 2) + 1
Use the following formula to calculate maximum active threads for write requests:
int (# of available_processors)
For an R5.2xlarge node, you can perform a maximum of 13 search operations:
(8 VCPUs * 3) / 2 + 1 = 13 operations
For an R5.2xlarge node, you can perform a maximum of 8 write operations:
8 VCPUs = 8 operations
For an OpenSearch Service cluster with five nodes, you can perform a maximum of 65 search operations:
5 nodes * 13 = 65 operations
For an OpenSearch Service cluster with five nodes, you can perform a maximum of 40 write operations:
5 nodes * 8 = 40 operations
To troubleshoot your 429 exception, check the following Amazon CloudWatch metrics for your cluster:
- IndexingRate: The number of indexing operations per minute. A single call to the _bulk API that adds two documents and updates two documents counts as four operations that might be spread across nodes. If that index has one or more replicas, then other nodes in the cluster also record a total of four indexing operations. Document deletions don't count towards the IndexingRate metric.
- SearchRate: The total number of search requests per minute for all shards on a data node. A single call to the _search API might return results from many different shards. If five different shards are on one node, then the node reports "5" for this metric, even if the client only made one request.
- CoordinatingWriteRejected: The total number of rejections that occurred on the coordinating node. These rejections are caused by the indexing pressure that accumulated since the last OpenSearch Service startup.
- PrimaryWriteRejected: The total number of rejections that occurred on the primary shards. These rejections are caused by indexing pressure that accumulated since the last OpenSearch Service startup.
- ReplicaWriteRejected: The total number of rejections that occurred on the replica shards because of indexing pressure. These rejections are caused by indexing pressure that accumulated since the last OpenSearch Service startup.
- ThreadpoolWriteQueue: The number of queued tasks in the write thread pool. This metric tells you whether a request is being rejected because of high CPU usage or high indexing concurrency.
- ThreadpoolWriteRejected: The number of rejected tasks in the write thread pool.
Note: The default write queue size was increased from 200 to 10,000 in OpenSearch Service version 7.9. As a result, this metric is no longer the only indicator of rejections from OpenSearch Service. Use the CoordinatingWriteRejected, PrimaryWriteRejected, and ReplicaWriteRejected metrics to monitor rejections in versions 7.9 and later.
- ThreadpoolSearchQueue: The number of queued tasks in the search thread pool. If the queue size is consistently high, then consider scaling your cluster. The maximum search queue size is 1,000.
- ThreadpoolSearchRejected: The number of rejected tasks in the search thread pool. If this number continually grows, then consider scaling your cluster.
- JVMMemoryPressure: The JVM memory pressure specifies the percentage of the Java heap that's used in a cluster node. If JVM memory pressure reaches 75%, then OpenSearch Service initiates the Concurrent Mark Sweep (CMS) garbage collector. Garbage collection is a CPU-intensive process. If JVM memory pressure stays at this percentage for a few minutes, then you might encounter cluster performance issues. For more information, see How do I troubleshoot high JVM memory pressure on my Amazon OpenSearch Service cluster?
Note: The thread pool metrics that are listed help inform you about the IndexingRate and SearchRate.
For more information about monitoring your OpenSearch Service cluster with CloudWatch, see Instance metrics.
If there is a lack of CPUs or high request concurrency, then a queue can fill up quickly, resulting in an HTTP 429 error. To monitor your queue threads, check the ThreadpoolSearchQueue and ThreadpoolWriteQueue metrics in CloudWatch.
To check your Active and Queue threads for any search rejections, use the following command:
GET /_cat/thread_pool/search?v&h=id,name,active,queue,rejected,completed
To check Active and Queue threads for write rejections, replace "search" with "write". The rejected and completed values in the output are cumulative node counters, which are reset when new nodes are launched. For more information, see the Example with explicit columns section of cat thread pool API on the Elasticsearch website.
Note: The bulk queue on each node can hold between 50 and 200 requests, depending on which Elasticsearch version you are using. When the queue is full, new requests are rejected.
Search rejections
A search rejection error indicates that active threads are busy and that queues are filled up to the maximum number of tasks. As a result, your search request can be rejected. You can configure OpenSearch Service logs so that these error messages appear in your search slow logs.
Note: To avoid extra overhead, set your slow log threshold to a generous amount. For example, if most of your queries take 11 seconds and your threshold is "10", then OpenSearch Service takes more time to write a log. You can avoid this overhead by setting your slow log threshold to 20 seconds. Then, only a small percentage of the slower queries (that take longer than 11 seconds) is logged.
After your cluster is configured to push search slow logs to CloudWatch, set a specific threshold for slow log generation with the following HTTP PUT call:
curl -XPUT "http://<your domain's endpoint>/index/_settings" -H 'Content-Type: application/json' -d '{"index.search.slowlog.threshold.query.<level>":"10s"}'
Write rejections
A 429 error message as a write rejection indicates a bulk queue error. The es_rejected_execution_exception[bulk] indicates that your queue is full and that any new requests are rejected. This bulk queue error occurs when the number of requests to the cluster exceeds the bulk queue size (threadpool.bulk.queue_size). A bulk queue on each node can hold between 50 and 200 requests, depending on which Elasticsearch version you are using.
You can configure OpenSearch Service logs so that these error messages appear in your index slow logs.
Note: To avoid extra overhead, set your slow log threshold to a generous amount. For example, if most of your queries take 11 seconds and your threshold is "10", then OpenSearch Service will take additional time to write a log. You can avoid this overhead by setting your slow log threshold to 20 seconds. Then, only a small percentage of the slower queries (that take longer than 11 seconds) is logged.
After your cluster is configured to push index slow logs to CloudWatch, set a specific threshold for slow log generation with the following HTTP PUT call:
curl -XPUT "http://<your domain's endpoint>/index/_settings" -H 'Content-Type: application/json' -d '{"index.indexing.slowlog.threshold.index.<level>":"10s"}'
Here are some best practices that mitigate write rejections:
- When documents are indexed faster, the write queue is less likely to reach capacity.
- Tune bulk size according to your workload and desired performance. For more information, see Tune for indexing speed on the Elasticsearch website.
- Add exponential retry logic in your application logic (see the sketch after this list). The exponential retry logic makes sure that failed requests are automatically retried.
Note: If your cluster continuously experiences high concurrent requests, then the exponential retry logic won't help resolve the 429 error. Use this best practice when there is a sudden or occasional spike of traffic.
- If you ingest data from Logstash, then tune the worker count and bulk size. It's a best practice to set your bulk size between 3 MB and 5 MB.
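The following shell loop is a minimal sketch of exponential retry logic: it retries a bulk request only when OpenSearch Service returns HTTP 429 and doubles the wait between attempts. The os-endpoint value, the index name, and bulk-payload.json are placeholders:
delay=1
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o /dev/null -w "%{http_code}" -XPOST "os-endpoint/index-name/_bulk" \
    -H 'Content-Type: application/json' --data-binary @bulk-payload.json)
  if [ "$status" != "429" ]; then
    break    # success or a non-retryable error; stop retrying
  fi
  sleep "$delay"
  delay=$((delay * 2))    # exponential backoff: 1s, 2s, 4s, 8s
done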
For more information about indexing performance tuning, see How can I improve the indexing performance on my OpenSearch Service cluster?
Here are some best practices that mitigate search rejections:
- Switch to a larger instance type. OpenSearch Service relies heavily on the filesystem cache for faster search results. The number of threads in the thread pool on each node for search requests is equal to the following: int((# of available_processors * 3) / 2) + 1. Switch to an instance with more vCPUs to get more threads to process search requests.
- Turn on search slow logs for a given index or for all indices with a reasonable threshold value. Check to see which queries are taking longer to run and implement search performance strategies for your queries. For more information, see Troubleshooting Elasticsearch searches, for beginners or Advanced tuning: Finding and fixing slow Elasticsearch queries on the Elasticsearch website.