@portante
Last active February 12, 2019 20:34
A simple script to display the "health" of an ES cluster
#!/bin/bash
# Display the "health" of an Elasticsearch cluster secured with the
# OpenShift logging admin certificates.
ES_URL='https://localhost:9200'
curl_get='curl -s -X GET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key'
date
# Overall cluster status (green/yellow/red), shard counts, pending tasks.
$curl_get "$ES_URL/_cat/health?v"
# Per-node shard counts and disk usage.
$curl_get "$ES_URL/_cat/allocation?v&h=node,host,ip,shards,disk.indices,disk.used,disk.avail,disk.total,disk.percent"
# See https://www.elastic.co/guide/en/elasticsearch/reference/2.4/cat-nodes.html for header meanings.
# Heap (hc/hp/hm), RAM (rc/rp/rm), file descriptors (fdc/fdp/fdm), load, uptime.
$curl_get "$ES_URL/_cat/nodes?v&h=name,host,r,m,hc,hp,hm,rc,rp,rm,fdc,fdp,fdm,load,uptime"
# Caches (fielddata, query, request), flushes, refreshes, segment memory.
$curl_get "$ES_URL/_cat/nodes?v&h=name,host,r,m,fm,fe,qcm,qce,rcm,rce,rchc,rcmc,ft,ftt,rto,rti,sc,sm,siwm,siwmx,svmm"
# Get, indexing, merge, and search (fetch/query) activity.
$curl_get "$ES_URL/_cat/nodes?v&h=name,host,r,m,gc,gto,gti,iic,iito,iiti,mc,mt,mtt,sfc,sfto,sfti,sqc,sqto,sqti"
# Thread pool statistics: rejections here mean the cluster cannot keep up.
$curl_get "$ES_URL/_cat/thread_pool?v&h=host,bulk.rejected,bulk.completed,bulk.queue,bulk.queueSize,bulk.size,bulk.active,bulk.largest"
$curl_get "$ES_URL/_cat/thread_pool?v&h=host,refresh.rejected,refresh.completed,refresh.queue,refresh.size,refresh.active,refresh.largest"
$curl_get "$ES_URL/_cat/thread_pool?v&h=host,management.rejected,management.completed,management.queue,management.size,management.active,management.largest"
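The `_cat` APIs emit whitespace-aligned text, so their output is easy to post-process. As a sketch, a hypothetical `health_status` helper that pulls the status column (column 4 with the default `_cat/health?v` headers) out of a saved dump:

```shell
#!/bin/bash
# Hypothetical helper: extract the cluster status from saved `_cat/health?v`
# output. With the default headers, status is the 4th field of the data row.
health_status() {
    awk 'NR == 2 { print $4 }' "$1"
}

# Demo against the header layout shown in the example output below.
cat > /tmp/health.txt <<'EOF'
epoch      timestamp cluster    status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1522814461 04:01:01  logging-es red             5         5  16037 8019    0    2        1           306               4.9m                100.0%
EOF
health_status /tmp/health.txt   # prints "red"
```

The function name and the `/tmp/health.txt` path are illustrative; in practice you would pipe `$curl_get "$ES_URL/_cat/health?v"` straight into the `awk` expression.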

portante commented Nov 6, 2017

Example output:

Wed Apr  4 04:01:01 UTC 2018
epoch      timestamp cluster    status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1522814461 04:01:01  logging-es red             5         5  16037 8019    0    2        1           306               4.9m                100.0%
name                            host          r m  disk    hc hp     hm     rc rp     rm   fdc fdp     fdm  load uptime
logging-es-data-master-d0wsv6fm 10.130.26.15  d * 1.8tb   7gb 59 11.8gb 44.9gb 72 62.7gb 23149   2 1048576 34.50   5.4d
logging-es-data-master-qzegpjxm 10.128.38.104 d - 1.8tb 4.2gb 35 11.8gb 49.4gb 79 62.7gb 23430   2 1048576 11.31   5.4d
logging-es-data-master-gfhuq3b3 10.128.10.22  d - 1.8tb 6.3gb 53 11.8gb 52.2gb 83 62.7gb 23676   2 1048576 37.52   5.1d
logging-es-data-master-2ohdnb9t 10.131.46.9   d m 1.8tb 6.1gb 30 19.8gb 55.4gb 88 62.7gb 23493   2 1048576 18.14   5.4d
logging-es-data-master-8txl73t7 10.131.6.13   d m 1.8tb 5.7gb 48 11.8gb 48.6gb 78 62.7gb 23477   2 1048576 19.23   4.5d
name                            host          r m fm fe qcm qce rcm rce rchc rcmc     ft  ftt      rto   rti    sc      sm    siwm siwmx    svmm
logging-es-data-master-d0wsv6fm 10.130.26.15  d * 0b  0  0b   0  0b   0    0    0 135245 5.8h 19991795   23d 11871 157.9mb 140.1mb 4.7gb     1mb
logging-es-data-master-qzegpjxm 10.128.38.104 d - 0b  0  0b   0  0b   0    0    0 128489   4h 22772818  9.6d 12046   163mb 209.4mb 4.8gb 876.4kb
logging-es-data-master-gfhuq3b3 10.128.10.22  d - 0b  0  0b   0  0b   0    0    0 126097 4.1h 22399131 12.1d 12369 201.8mb 204.5mb 4.7gb   1.2mb
logging-es-data-master-2ohdnb9t 10.131.46.9   d m 0b  0  0b   0  0b   0    0    0 131095 2.8h 21580707    7d 12192 172.6mb 146.6mb 5.2gb     1mb
logging-es-data-master-8txl73t7 10.131.6.13   d m 0b  0  0b   0  0b   0    0    0 129940 4.8h 22524579 12.8d 12209 192.6mb 165.1mb 4.7gb 449.6kb
name                            host          r m gc gto   gti iic      iito  iiti mc      mt   mtt sfc   sfto  sfti sqc   sqto  sqti
logging-es-data-master-d0wsv6fm 10.130.26.15  d *  0 606 186ms  14 580087811 39.3d  7 3061740 34.9d   0 223268  3.8s   0 223268    4s
logging-es-data-master-qzegpjxm 10.128.38.104 d -  0 242 129ms   0 605650405 12.7d  0 2585360 16.1d   0  46892 617ms   0  46892 537ms
logging-es-data-master-gfhuq3b3 10.128.10.22  d -  0 236 142ms  65 596138241   18d  6 2712997 19.2d   0 120399    2s   0 120399  2.9s
logging-es-data-master-2ohdnb9t 10.131.46.9   d m  0  18   8ms  23 596781033  8.8d  7 2421689 12.9d   0      9   5ms   0      9  18ms
logging-es-data-master-8txl73t7 10.131.6.13   d m  0  31  28ms   2 584991298 18.4d  0 2716871 20.1d   0  39511 566ms   0  39511 391ms
host          bulk.rejected bulk.completed bulk.queue bulk.queueSize bulk.size bulk.active bulk.largest
10.130.26.15        2213698       40650519        379            400        16          16           16
10.128.38.104         43895       42455894          0            400        16          13           16
10.128.10.22         366721       41824620        331            400        16          16           16
10.131.46.9            8320       40880862          7            400        16          16           16
10.131.6.13          316570       42233000          1            400        16           7           16
host          refresh.rejected refresh.completed refresh.queue refresh.size refresh.active refresh.largest
10.130.26.15                 0          19994769             0            8              7               8
10.128.38.104                0          22776077             0            8              1               8
10.128.10.22                 0          22403753             8            8              8               8
10.131.46.9                  0          21582952             0            8              1               8
10.131.6.13                  0          22528702             0            8              1               8
host          management.rejected management.completed management.queue management.size management.active management.largest
10.130.26.15                    0               319600                0               5                 1                  5
10.128.38.104                   0               304931                0               4                 1                  4
10.128.10.22                    0               294648                0               4                 1                  4
10.131.46.9                     0               304932                0               4                 1                  4
10.131.6.13                     0               270428                0               4                 1                  4
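In this red-cluster output the `bulk.rejected` counts are large and the bulk queue is near its 400-entry limit on two nodes, i.e. the data nodes are shedding indexing load. A quick way to total the rejections across nodes from the saved thread_pool table (the awk one-liner and `/tmp/bulk.txt` path are just an illustration):

```shell
# Sum the bulk.rejected column (field 2), skipping the header row.
# The table is the bulk thread_pool output shown above, verbatim.
cat > /tmp/bulk.txt <<'EOF'
host          bulk.rejected bulk.completed bulk.queue bulk.queueSize bulk.size bulk.active bulk.largest
10.130.26.15        2213698       40650519        379            400        16          16           16
10.128.38.104         43895       42455894          0            400        16          13           16
10.128.10.22         366721       41824620        331            400        16          16           16
10.131.46.9            8320       40880862          7            400        16          16           16
10.131.6.13          316570       42233000          1            400        16           7           16
EOF
awk 'NR > 1 { total += $2 } END { print total }' /tmp/bulk.txt   # prints 2949204
```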


portante commented May 24, 2018

Here is example output from a healthy cluster:

Thu May 24 20:54:33 UTC 2018
epoch      timestamp cluster    status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1527195273 20:54:33  logging-es green           5         5   6478 3239    0    0        0             0                  -                100.0%
node                            host         ip           shards disk.indices disk.used disk.avail disk.total disk.percent
logging-es-data-master-2esm8hxv 10.129.28.65 10.129.28.65   1295         33gb   151.1gb      1.8tb      1.9tb            7
logging-es-data-master-hge0qoxy 10.131.22.69 10.131.22.69   1296         23gb   137.4gb      1.8tb      1.9tb            6
logging-es-data-master-bal6ofun 10.131.22.68 10.131.22.68   1296       22.4gb   136.2gb      1.8tb      1.9tb            6
logging-es-data-master-hpg82wh6 10.128.2.66  10.128.2.66    1296       22.5gb   136.6gb      1.8tb      1.9tb            6
logging-es-data-master-emt7f0y0 10.129.28.64 10.129.28.64   1295       32.3gb   150.3gb      1.8tb      1.9tb            7
name                            host         r m    hc hp     hm      rc rp      rm   fdc fdp     fdm  load uptime
logging-es-data-master-hpg82wh6 10.128.2.66  d * 7.7gb 38 19.8gb  61.8gb 99  62.7gb 11817   1 1048576 15.17   1.4d
logging-es-data-master-bal6ofun 10.131.22.68 d - 9.5gb 48 19.7gb 153.3gb 98 157.2gb 12058   1 1048576 42.79   1.4d
logging-es-data-master-emt7f0y0 10.129.28.64 d - 7.5gb 38 19.7gb 155.1gb 99 157.2gb 13361   1 1048576 79.74   1.4d
logging-es-data-master-hge0qoxy 10.131.22.69 d m  10gb 50 19.7gb 153.3gb 98 157.2gb 12270   1 1048576 42.79   1.4d
logging-es-data-master-2esm8hxv 10.129.28.65 d m 6.7gb 34 19.7gb 155.1gb 99 157.2gb 12301   1 1048576 81.72   1.4d
name                            host         r m fm fe  qcm qce rcm rce rchc rcmc    ft  ftt      rto  rti   sc      sm    siwm siwmx    svmm
logging-es-data-master-hpg82wh6 10.128.2.66  d * 0b  0   0b   0  0b   0    0    0 32493 1.8h  9537651 4.8d 6121 115.1mb 385.4mb 3.2gb 324.1kb
logging-es-data-master-bal6ofun 10.131.22.68 d - 0b  0 328b   0  0b   0    0    0 33041 1.9h 10013174 4.9d 6178 114.3mb 463.4mb 3.1gb 440.8kb
logging-es-data-master-emt7f0y0 10.129.28.64 d - 0b  0   0b   0  0b   0    0    0 23926 1.1h  7375397 3.4d 6538 133.3mb 808.3mb 3.7gb 537.6kb
logging-es-data-master-hge0qoxy 10.131.22.69 d m 0b  0 328b   0  0b   0    0    0 31700 1.6h 10575243   5d 6038 115.1mb   388mb 3.1gb 453.9kb
logging-es-data-master-2esm8hxv 10.129.28.65 d m 0b  0   0b   0  0b   0    0    0 28048   1h  8368253 3.8d 6182 131.9mb 137.5mb 3.8gb  66.2kb
name                            host         r m gc gto   gti     iic     iito iiti mc      mt  mtt sfc  sfto  sfti sqc  sqto  sqti
logging-es-data-master-hpg82wh6 10.128.2.66  d *  0 586  89ms 1641378 77655281 3.6d  7 1179457 9.5d   0     0    0s   0     0    0s
logging-es-data-master-bal6ofun 10.131.22.68 d -  0 126  23ms 3003419 61520394 5.6d  4 1235672 8.6d   0 17137   19s   0 17137 44.9s
logging-es-data-master-emt7f0y0 10.129.28.64 d -  0 245 318ms 7089820 46963347 3.5d 25  908099   6d   0    11  10ms   0    11  16ms
logging-es-data-master-hge0qoxy 10.131.22.69 d m  0 121 189ms 2842407 56871770 4.2d  0 1252566   9d   0 17152 16.2s   0 17152 43.6s
logging-es-data-master-2esm8hxv 10.129.28.65 d m  0 102 120ms 4718014 47393183 3.7d  1 1014222 6.8d   0     6 276ms   0     6  62ms
host         bulk.rejected bulk.completed bulk.queue bulk.queueSize bulk.size bulk.active bulk.largest
10.128.2.66          18559       22716322          0           1296        16           3           16
10.131.22.68             9       19595847          0           1296        32           0           32
10.129.28.64             0       16328617          0           1296        32           4           32
10.131.22.69             0       18980485          0           1296        32           0           32
10.129.28.65             0       16016642        690           1296        32          32           32
host         refresh.rejected refresh.completed refresh.queue refresh.size refresh.active refresh.largest
10.128.2.66                 0           9537758             0            8              1               8
10.131.22.68                0          10013490             1           10             10              10
10.129.28.64                0           7375554            11           10             10              10
10.131.22.69                0          10575571             0           10              7              10
10.129.28.65                0           8368308             0           10              2              10
host         management.rejected management.completed management.queue management.size management.active management.largest
10.128.2.66                    0                44043                0               4                 1                  4
10.131.22.68                   0                40816                0               3                 1                  3
10.129.28.64                   0                40791                0               3                 1                  3
10.131.22.69                   0                40815                0               3                 1                  3
10.129.28.65                   0                40794                0               3                 1                  3


Here is an example from a cluster with unassigned shards:

Thu May 24 20:56:43 UTC 2018
epoch      timestamp cluster    status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1527195403 20:56:43  logging-es red             3         3   5659 2828    0    0       12             0                  -                 99.8%
node                            host        ip          shards disk.indices disk.used disk.avail disk.total disk.percent
logging-es-data-master-gkvb8lj8 10.129.0.29 10.129.0.29   1576       27.4gb   135.2gb      1.8tb      1.9tb            6
logging-es-data-master-ml8zmsl7 10.128.0.77 10.128.0.77   2111       61.9gb   173.6gb      1.7tb      1.9tb            8
logging-es-data-master-zolfd11v 10.129.0.30 10.129.0.30   1972       49.3gb   160.4gb      1.8tb      1.9tb            7
UNASSIGNED                                                  12
name                            host        r m     hc hp     hm      rc rp      rm   fdc fdp     fdm  load uptime
logging-es-data-master-gkvb8lj8 10.129.0.29 d * 16.3gb 53 30.7gb 153.4gb 98 157.2gb 14818   1 1048576 31.14  21.3h
logging-es-data-master-ml8zmsl7 10.128.0.77 d m 18.1gb 58 30.7gb 155.8gb 99 157.2gb 19916   1 1048576 23.57  21.1h
logging-es-data-master-zolfd11v 10.129.0.30 d m 21.2gb 69 30.7gb 153.4gb 98 157.2gb 17992   1 1048576 31.14  21.3h
name                            host        r m fm fe qcm qce rcm rce rchc rcmc    ft   ftt     rto  rti    sc      sm    siwm siwmx    svmm
logging-es-data-master-gkvb8lj8 10.129.0.29 d * 0b  0  0b   0  0b   0    0    0 29950 44.7m 5557638 1.1d  7928 140.3mb 308.8mb 3.4gb 148.2kb
logging-es-data-master-ml8zmsl7 10.128.0.77 d m 0b  0  0b   0  0b   0    0    0 38451  1.2h 8177223 2.9d 10510 249.7mb   454mb   5gb 601.6kb
logging-es-data-master-zolfd11v 10.129.0.30 d m 0b  0  0b   0  0b   0    0    0 33444    1h 7602006 2.9d  9331 203.5mb 458.7mb 4.5gb 492.6kb
name                            host        r m gc gto  gti iic     iito iiti mc     mt  mtt sfc  sfto sfti sqc  sqto sqti
logging-es-data-master-gkvb8lj8 10.129.0.29 d *  0 147 36ms   1 48861959  21h  7 599867   2d   0 22386 1.5s   0 22386 2.5s
logging-es-data-master-ml8zmsl7 10.128.0.77 d m  0 104 70ms   0 95514486 2.4d  9 944469 4.9d   0 22148 2.6s   0 22148 5.7s
logging-es-data-master-zolfd11v 10.129.0.30 d m  0 195 47ms   5 79627575 2.9d  2 903139 4.5d   0 10225 1.7s   0 10225 1.8s
host        bulk.rejected bulk.completed bulk.queue bulk.queueSize bulk.size bulk.active bulk.largest
10.129.0.29        133901       10775966          0             50        32           0           32
10.128.0.77       2548220       21183026         43             50        32          32           32
10.129.0.30        480337       17562786          0             50        32           1           32
host        refresh.rejected refresh.completed refresh.queue refresh.size refresh.active refresh.largest
10.129.0.29                0           5558012             4           10             10              10
10.128.0.77                0           8177729             0           10              0              10
10.129.0.30                0           7602586             0           10              3              10
host        management.rejected management.completed management.queue management.size management.active management.largest
10.129.0.29                   0                30634                0               5                 1                  5
10.128.0.77                   0                28499                0               3                 1                  3
10.129.0.30                   0                28564                0               3                 1                  3
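When `_cat/health` reports unassigned shards, as in the red cluster above, `_cat/shards` can show which shards they are and (via its optional `unassigned.reason` column) why they were left unassigned. A sketch, filtering a saved dump; the index name and two-row table are made up for illustration:

```shell
#!/bin/bash
# With a live cluster this dump would come from the same credentials the
# script above uses, e.g.:
#   $curl_get "$ES_URL/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" > /tmp/shards.txt

# Demo against a made-up two-row dump in the same format:
cat > /tmp/shards.txt <<'EOF'
index              shard prirep state      unassigned.reason
project.app.1      2     p      STARTED
project.app.1      3     r      UNASSIGNED NODE_LEFT
EOF
grep -c UNASSIGNED /tmp/shards.txt   # prints 1
```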
