@gyli
Last active April 20, 2019 19:22
Find the best number for parameter number_of_routing_shards in Elasticsearch
# https://www.elastic.co/guide/en/elasticsearch/reference/7.0/indices-split-index.html
# The number_of_routing_shards setting determines how an index can be split in Elasticsearch.
# Since ES 7.0 it has a default value, which is designed to allow splitting by factors of 2 up to a maximum of 1024 shards.
# However, depending on the original number of primary shards, the default value might not be the best choice,
# since it might not offer the largest number of shard counts the index could be split to.
# For example, if the current number of primary shards is 5, ES uses 640 as the default number_of_routing_shards,
# which allows the index to be split to 10, 20, 40, 80, 160, 320 or 640 shards.
# However, assuming the maximum shard number is still 1024, setting number_of_routing_shards to 900 would give the split
# more options (listed here together with the current 5): 5, 10, 15, 20, 25, 30, 45, 50, 60, 75, 90, 100, 150, 180, 225, 300, 450, 900
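def find_best_number_of_routing_shards(current_shard: int, upper_limit: int = 1024) -> int:
    max_count = 0
    result = current_shard
    # Try every candidate value from the current shard count up to the limit.
    for num in range(current_shard, upper_limit + 1):
        # Count the divisors of num that are multiples of current_shard,
        # i.e. the shard counts the index could be split to.
        count = 0
        for i in range(1, num + 1):
            if num % i == 0:
                if i % current_shard == 0:
                    count += 1
        # The strict comparison keeps the smallest candidate on ties.
        if count > max_count:
            max_count = count
            result = num
    return result


find_best_number_of_routing_shards(current_shard=5)  # returns 900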
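
The split options quoted in the comments above can be checked with a small helper. This is only a sketch: `split_targets` is a hypothetical name, not part of Elasticsearch or of the function above. It assumes the same rule the function relies on, namely that a valid target shard count is a multiple of the current shard count and a divisor of number_of_routing_shards.

```python
from typing import List


def split_targets(current_shard: int, routing_shards: int) -> List[int]:
    """List the shard counts (including the current one) allowed by the assumed rule."""
    # Step through the multiples of current_shard and keep those that divide routing_shards.
    return [
        n
        for n in range(current_shard, routing_shards + 1, current_shard)
        if routing_shards % n == 0
    ]


print(split_targets(5, 640))  # options with the default value for 5 primary shards
print(split_targets(5, 900))  # options with the value found by the search above
```

For 5 primary shards this gives 8 entries for 640 and 18 entries for 900, each list starting with the current shard count of 5.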