ES cluster bug
[pod/elasticsearch-master-1/elasticsearch] {"type": "server", "timestamp": "2021-03-15T19:32:02,579Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elasticsearch", "nod
e.name": "elasticsearch-master-1", "message": "master not discovered or elected yet, an election requires 2 nodes with ids [5m__iT3jSQW1vwEV_xFwMQ, BkCL1vzsR7mUQRzWdrBLXg], have discovered [{elasticsearch-master
-1}{BkCL1vzsR7mUQRzWdrBLXg}{hd8g7UblT_CjTUTkEfIPsg}{10.233.84.50}{10.233.84.50:9300}{cdhilmrstw}{ml.machine_memory=2147483648, xpack.installed=true, transform.node=true, ml.max_open_jobs=20, ml.max_jvm_size=1073
741824}, {elasticsearch-master-2}{m9dArdVXSq-6GBEDR2wRIA}{TRIr99EFSjyYlvQVJXGasA}{10.233.118.54}{10.233.118.54:9300}{cdhilmrstw}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.max_jv
m_size=1073741824, transform.node=true}, {elasticsearch-master-0}{5m__iT3jSQW1vwEV_xFwMQ}{z2kJjCDeSBSuUVX6vB34cQ}{10.233.88.52}{10.233.88.52:9300}{cdhilmrstw}{ml.machine_memory=2147483648, ml.max_open_jobs=20, x
pack.installed=true, ml.max_jvm_size=1073741824, transform.node=true}] which is a quorum; discovery will continue using [10.233.88.52:9300, 10.233.118.54:9300] from hosts providers and [{elasticsearch-master-1}{
BkCL1vzsR7mUQRzWdrBLXg}{hd8g7UblT_CjTUTkEfIPsg}{10.233.84.50}{10.233.84.50:9300}{cdhilmrstw}{ml.machine_memory=2147483648, xpack.installed=true, transform.node=true, ml.max_open_jobs=20, ml.max_jvm_size=10737418
24}] from last-known cluster state; node term 1, last-accepted version 0 in term 0" }
[pod/elasticsearch-master-1/elasticsearch] {"type": "server", "timestamp": "2021-03-15T19:32:08,387Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "elasticsearch", "node.name": "elasticsearch-m
aster-1", "message": "path: /_cluster/health, params: {wait_for_status=green, timeout=1s}",
[pod/elasticsearch-master-1/elasticsearch] "stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:219) [elasticsearch-7.11.2.jar:7.11.2
]",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:324) [elasticsearch-7.11.2.jar:7.11.2]",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:241) [elasticsearch-7.11.2.jar:7.11.2]",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:590) [elasticsearch-7.11.2.jar:7.11.2]",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) [elasticsearch-7.11.2.jar:7.11.2]",
[pod/elasticsearch-master-1/elasticsearch] "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
[pod/elasticsearch-master-1/elasticsearch] "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
[pod/elasticsearch-master-1/elasticsearch] "at java.lang.Thread.run(Thread.java:832) [?:?]"] }
[pod/elasticsearch-master-1/elasticsearch] {"type": "server", "timestamp": "2021-03-15T19:32:12,582Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elasticsearch", "nod
e.name": "elasticsearch-master-1", "message": "master not discovered or elected yet, an election requires 2 nodes with ids [5m__iT3jSQW1vwEV_xFwMQ, BkCL1vzsR7mUQRzWdrBLXg], have discovered [{elasticsearch-master
-1}{BkCL1vzsR7mUQRzWdrBLXg}{hd8g7UblT_CjTUTkEfIPsg}{10.233.84.50}{10.233.84.50:9300}{cdhilmrstw}{ml.machine_memory=2147483648, xpack.installed=true, transform.node=true, ml.max_open_jobs=20, ml.max_jvm_size=1073
741824}, {elasticsearch-master-2}{m9dArdVXSq-6GBEDR2wRIA}{TRIr99EFSjyYlvQVJXGasA}{10.233.118.54}{10.233.118.54:9300}{cdhilmrstw}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.max_jv
m_size=1073741824, transform.node=true}, {elasticsearch-master-0}{5m__iT3jSQW1vwEV_xFwMQ}{z2kJjCDeSBSuUVX6vB34cQ}{10.233.88.52}{10.233.88.52:9300}{cdhilmrstw}{ml.machine_memory=2147483648, ml.max_open_jobs=20, x
pack.installed=true, ml.max_jvm_size=1073741824, transform.node=true}] which is a quorum; discovery will continue using [10.233.88.52:9300, 10.233.118.54:9300] from hosts providers and [{elasticsearch-master-1}{
BkCL1vzsR7mUQRzWdrBLXg}{hd8g7UblT_CjTUTkEfIPsg}{10.233.84.50}{10.233.84.50:9300}{cdhilmrstw}{ml.machine_memory=2147483648, xpack.installed=true, transform.node=true, ml.max_open_jobs=20, ml.max_jvm_size=10737418
24}] from last-known cluster state; node term 1, last-accepted version 0 in term 0" }
[pod/elasticsearch-master-1/elasticsearch] {"type": "server", "timestamp": "2021-03-15T19:32:18,361Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "elasticsearch", "node.name": "elasticsearch-m
aster-1", "message": "path: /_cluster/health, params: {wait_for_status=green, timeout=1s}",
[pod/elasticsearch-master-1/elasticsearch] "stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:219) [elasticsearch-7.11.2.jar:7.11.2
]",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:324) [elasticsearch-7.11.2.jar:7.11.2]",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:241) [elasticsearch-7.11.2.jar:7.11.2]",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:590) [elasticsearch-7.11.2.jar:7.11.2]",
[pod/elasticsearch-master-1/elasticsearch] "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) [elasticsearch-7.11.2.jar:7.11.2]",
[pod/elasticsearch-master-1/elasticsearch] "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
[pod/elasticsearch-master-1/elasticsearch] "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
[pod/elasticsearch-master-1/elasticsearch] "at java.lang.Thread.run(Thread.java:832) [?:?]"] }
[pod/elasticsearch-master-1/elasticsearch] {"type": "server", "timestamp": "2021-03-15T19:32:22,585Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elasticsearch", "nod
e.name": "elasticsearch-master-1", "message": "master not discovered or elected yet, an election requires 2 nodes with ids [5m__iT3jSQW1vwEV_xFwMQ, BkCL1vzsR7mUQRzWdrBLXg], have discovered [{elasticsearch-master
-1}{BkCL1vzsR7mUQRzWdrBLXg}{hd8g7UblT_CjTUTkEfIPsg}{10.233.84.50}{10.233.84.50:9300}{cdhilmrstw}{ml.machine_memory=2147483648, xpack.installed=true, transform.node=true, ml.max_open_jobs=20, ml.max_jvm_size=1073
741824}, {elasticsearch-master-2}{m9dArdVXSq-6GBEDR2wRIA}{TRIr99EFSjyYlvQVJXGasA}{10.233.118.54}{10.233.118.54:9300}{cdhilmrstw}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.max_jv
m_size=1073741824, transform.node=true}, {elasticsearch-master-0}{5m__iT3jSQW1vwEV_xFwMQ}{z2kJjCDeSBSuUVX6vB34cQ}{10.233.88.52}{10.233.88.52:9300}{cdhilmrstw}{ml.machine_memory=2147483648, ml.max_open_jobs=20, x
pack.installed=true, ml.max_jvm_size=1073741824, transform.node=true}] which is a quorum; discovery will continue using [10.233.88.52:9300, 10.233.118.54:9300] from hosts providers and [{elasticsearch-master-1}{
BkCL1vzsR7mUQRzWdrBLXg}{hd8g7UblT_CjTUTkEfIPsg}{10.233.84.50}{10.233.84.50:9300}{cdhilmrstw}{ml.machine_memory=2147483648, xpack.installed=true, transform.node=true, ml.max_open_jobs=20, ml.max_jvm_size=10737418
24}] from last-known cluster state; node term 1, last-accepted version 0 in term 0" }
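The warning above repeats every ~10 seconds: elasticsearch-master-1 has discovered all three masters (a quorum) yet still cannot win an election, and its own state is empty ("last-accepted version 0 in term 0"). A quick way to narrow this down is to compare what each master pod has persisted locally. A minimal diagnostic sketch, assuming kubectl access to the logging namespace (curl ships in the official image; these calls may return 503 while no master is elected):

for p in elasticsearch-master-0 elasticsearch-master-1 elasticsearch-master-2; do
  echo "== ${p} =="
  # Cluster UUID as this node sees it; differing UUIDs across pods usually point at stale data on the persistent volumes
  kubectl exec -n logging "${p}" -c elasticsearch -- curl -s 'http://127.0.0.1:9200/?filter_path=cluster_uuid'
  # Local (non-master) view of the coordination metadata / voting configuration this node has persisted
  kubectl exec -n logging "${p}" -c elasticsearch -- curl -s 'http://127.0.0.1:9200/_cluster/state?local=true&filter_path=metadata.cluster_coordination'
done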
NAME: loges
LAST DEPLOYED: Mon Mar 15 22:26:31 2021
NAMESPACE: logging
STATUS: pending-install
REVISION: 1
USER-SUPPLIED VALUES:
esConfig:
  elasticsearch.yml: |
    http.max_content_length: 100mb
nodeSelector:
  node-role.kubernetes.io/worker: loges
COMPUTED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=1s
clusterName: elasticsearch
enableServiceLinks: true
envFrom: []
esConfig:
  elasticsearch.yml: |
    http.max_content_length: 100mb
esJavaOpts: -Xmx1g -Xms1g
esMajorVersion: ""
extraContainers: []
extraEnvs: []
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
hostAliases: []
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 7.11.2
ingress:
  annotations: {}
  enabled: false
  hosts:
  - host: chart-example.local
    paths:
    - path: /
  tls: []
initResources: {}
keystore: []
labels: {}
lifecycle: {}
masterService: ""
masterTerminationFix: false
maxUnavailable: 1
minimumMasterNodes: 2
nameOverride: ""
networkHost: 0.0.0.0
networkPolicy:
  http:
    enabled: false
  transport:
    enabled: false
nodeAffinity: {}
nodeGroup: master
nodeSelector:
  node-role.kubernetes.io/worker: loges
persistence:
  annotations: {}
  enabled: true
  labels:
    enabled: false
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000
podSecurityPolicy:
  create: false
  name: ""
  spec:
    fsGroup:
      rule: RunAsAny
    privileged: true
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
    - secret
    - configMap
    - persistentVolumeClaim
    - emptyDir
priorityClassName: ""
protocol: http
rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 3
  timeoutSeconds: 5
replicas: 3
resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 1000m
    memory: 2Gi
roles:
  data: "true"
  ingest: "true"
  master: "true"
  ml: "true"
  remote_cluster_client: "true"
schedulerName: ""
secretMounts: []
securityContext:
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 1000
service:
  annotations: {}
  externalTrafficPolicy: ""
  httpPortName: http
  labels: {}
  labelsHeadless: {}
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  nodePort: ""
  transportPortName: transport
  type: ClusterIP
sidecarResources: {}
sysctlInitContainer:
  enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
HOOKS:
---
# Source: elasticsearch/templates/test/test-elasticsearch-health.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "loges-crokf-test"
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
  containers:
  - name: "loges-jsnja-test"
    image: "docker.elastic.co/elasticsearch/elasticsearch:7.11.2"
    imagePullPolicy: "IfNotPresent"
    command:
    - "sh"
    - "-c"
    - |
      #!/usr/bin/env bash -e
      curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s'
  restartPolicy: Never
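The helm test hook above is just a one-shot curl against the cluster health endpoint. The same check can be run by hand while debugging; a small sketch, assuming local port 9200 is free for the port-forward:

kubectl port-forward -n logging svc/elasticsearch-master 9200:9200 &
curl -s 'http://localhost:9200/_cluster/health?pretty'
# While no master is elected this fails with the same MasterNotDiscoveredException seen in the logs above.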
MANIFEST:
---
# Source: elasticsearch/templates/poddisruptionbudget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: "elasticsearch-master-pdb"
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: "elasticsearch-master"
---
# Source: elasticsearch/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-master-config
  labels:
    heritage: "Helm"
    release: "loges"
    chart: "elasticsearch"
    app: "elasticsearch-master"
data:
  elasticsearch.yml: |
    http.max_content_length: 100mb
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Helm"
    release: "loges"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    {}
spec:
  type: ClusterIP
  selector:
    release: "loges"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  ports:
  - name: http
    protocol: TCP
    port: 9200
  - name: transport
    protocol: TCP
    port: 9300
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master-headless
  labels:
    heritage: "Helm"
    release: "loges"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve
  # Create endpoints also if the related pod isn't ready
  publishNotReadyAddresses: true
  selector:
    app: "elasticsearch-master"
  ports:
  - name: http
    port: 9200
  - name: transport
    port: 9300
---
# Source: elasticsearch/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Helm"
    release: "loges"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    esMajorVersion: "7"
spec:
  serviceName: elasticsearch-master-headless
  selector:
    matchLabels:
      app: "elasticsearch-master"
  replicas: 3
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-master
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 30Gi
  template:
    metadata:
      name: "elasticsearch-master"
      labels:
        release: "loges"
        chart: "elasticsearch"
        app: "elasticsearch-master"
      annotations:
        configchecksum: bdba5f114d54e34bd12497f2b56ea101d6bc722173fd0b30c4cac356cdf3fe8
    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
      nodeSelector:
        node-role.kubernetes.io/worker: loges
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - "elasticsearch-master"
            topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 120
      volumes:
      - name: esconfig
        configMap:
          name: elasticsearch-master-config
      enableServiceLinks: true
      initContainers:
      - name: configure-sysctl
        securityContext:
          runAsUser: 0
          privileged: true
        image: "docker.elastic.co/elasticsearch/elasticsearch:7.11.2"
        imagePullPolicy: "IfNotPresent"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        resources:
          {}
      containers:
      - name: "elasticsearch"
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          runAsUser: 1000
        image: "docker.elastic.co/elasticsearch/elasticsearch:7.11.2"
        imagePullPolicy: "IfNotPresent"
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - |
              #!/usr/bin/env bash -e
              # If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )
              # Once it has started only check that the node itself is responding
              START_FILE=/tmp/.es_start_file
              # Disable nss cache to avoid filling dentry cache when calling curl
              # This is required with Elasticsearch Docker using nss < 3.52
              export NSS_SDB_USE_CACHE=no
              http () {
                local path="${1}"
                local args="${2}"
                set -- -XGET -s
                if [ "$args" != "" ]; then
                  set -- "$@" $args
                fi
                if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
                  set -- "$@" -u "${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
                fi
                curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
              }
              if [ -f "${START_FILE}" ]; then
                echo 'Elasticsearch is already running, lets check the node is healthy'
                HTTP_CODE=$(http "/" "-w %{http_code}")
                RC=$?
                if [[ ${RC} -ne 0 ]]; then
                  echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
                  exit ${RC}
                fi
                # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
                if [[ ${HTTP_CODE} == "200" ]]; then
                  exit 0
                elif [[ ${HTTP_CODE} == "503" && "7" == "6" ]]; then
                  exit 0
                else
                  echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
                  exit 1
                fi
              else
                echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
                if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
                  touch ${START_FILE}
                  exit 0
                else
                  echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
                  exit 1
                fi
              fi
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 3
          timeoutSeconds: 5
        ports:
        - name: http
          containerPort: 9200
        - name: transport
          containerPort: 9300
        resources:
          limits:
            cpu: 1000m
            memory: 2Gi
          requests:
            cpu: 1000m
            memory: 2Gi
        env:
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: cluster.initial_master_nodes
          value: "elasticsearch-master-0,elasticsearch-master-1,elasticsearch-master-2,"
        - name: discovery.seed_hosts
          value: "elasticsearch-master-headless"
        - name: cluster.name
          value: "elasticsearch"
        - name: network.host
          value: "0.0.0.0"
        - name: ES_JAVA_OPTS
          value: "-Xmx1g -Xms1g"
        - name: node.data
          value: "true"
        - name: node.ingest
          value: "true"
        - name: node.master
          value: "true"
        - name: node.ml
          value: "true"
        - name: node.remote_cluster_client
          value: "true"
        volumeMounts:
        - name: "elasticsearch-master"
          mountPath: /usr/share/elasticsearch/data
        - name: esconfig
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: elasticsearch.yml
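Discovery in this chart hangs off discovery.seed_hosts pointing at the headless service, which publishes addresses even for unready pods (publishNotReadyAddresses above), so it is worth confirming that all three pod IPs actually resolve from inside a pod. A minimal check, assuming getent is available in the image (the official image is glibc-based):

kubectl exec -n logging elasticsearch-master-1 -c elasticsearch -- getent hosts elasticsearch-master-headless
# Expect all three pod IPs (10.233.84.50, 10.233.88.52, 10.233.118.54 in the logs above).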
NOTES:
1. Watch all cluster members come up.
  $ kubectl get pods --namespace=logging -l app=elasticsearch-master -w
2. Test cluster health using Helm test.
  $ helm test loges
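The log excerpt at the top of this gist looks like kubectl logs output with pod prefixes; to tail the masters again while they retry the election, something like this should work (assuming a kubectl new enough to support --prefix):

kubectl logs -n logging -l app=elasticsearch-master -c elasticsearch --prefix -f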