@apple-corps
apple-corps / gist:fe2723948171886310bf
Created January 20, 2015 02:14
hadoop standby namenode log snippet
2015-01-19 18:55:46,845 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file http://us3sm2zk012r09.comp.prod.local:8480/getJournal?jid=whprod&segmentTxId=50626862&storageInfo=-55%3A977585766%3A0%3ACID-1ccb02a5-bfd7-4808-a925-8e9804d40ec4, http://us3sm2zk010r07.comp.prod.local:8480/getJournal?jid=whprod&segmentTxId=50626862&storageInfo=-55%3A977585766%3A0%3ACID-1ccb02a5-bfd7-4808-a925-8e9804d40ec4 of size 113416 edits # 711 loaded in 0 seconds
2015-01-19 18:55:47,323 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 711 edits starting from txid 50626861
2015-01-19 18:57:17,662 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2015-01-19 18:57:19,705 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2015-01-19 18:57:21,275 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
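The repeated "Operation category READ is not supported in state standby" warnings usually just mean something (fsck, the web UI, or a monitoring probe) is querying the standby NameNode directly instead of the active one. A quick sanity check, assuming nn1 and nn2 are the NameNode IDs configured for this nameservice (placeholder names here):

# report the HA state of each configured NameNode (nn1/nn2 are placeholders)
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2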
@apple-corps
apple-corps / gist:002f6629d5db24d28592
Created January 20, 2015 19:30
hadoop standby namenode stuck getting block information
2015-01-20 12:20:43,444 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file http://us3sm2zk011r08.comp.prod.local:8480/getJournal?jid=whprod&segmentTxId=50815793&storageInfo=-55%3A977585766%3A0%3ACID-1ccb02a5-bfd7-4808-a925-8e9804d40ec4, http://us3sm2zk010r07.comp.prod.local:8480/getJournal?jid=whprod&segmentTxId=50815793&storageInfo=-55%3A977585766%3A0%3ACID-1ccb02a5-bfd7-4808-a925-8e9804d40ec4 of size 100746 edits # 569 loaded in 0 seconds
2015-01-20 12:20:43,964 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 569 edits starting from txid 50815792
2015-01-20 12:22:44,496 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode us3sm2nn011r08.comp.prod.local/10.51.28.141:8020
2015-01-20 12:22:44,619 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4eba7f05 expecting start txid #50816362
2015-01-20 12:22:44,619 INFO org.apache.hadoop.hdfs.server.namenode.
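When the standby appears stuck while reading an edit stream, one generic way to see where it is blocked (not specific to this incident) is a thread dump of the NameNode JVM:

# capture a thread dump; assumes a single NameNode process on this host
sudo -u hdfs jstack $(pgrep -f NameNode | head -1) > /tmp/nn-$(date +%s).jstack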
@apple-corps
apple-corps / gist:61b6df698e48386f12f8
Created February 12, 2015 01:37
Hadoop 2.3.0-cdh5.0.1 balancer stack trace
sudo -u hdfs hdfs balancer
15/02/11 18:33:03 INFO balancer.Balancer: namenodes = [hdfs://whqa/, hdfs://whqa]
15/02/11 18:33:03 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/02/11 18:33:04 INFO net.NetworkTopology: Adding a new node: /9/10.51.28.200:50010
15/02/11 18:33:04 INFO net.NetworkTopology: Adding a new node: /7/10.51.28.201:50010
15/02/11 18:33:04 INFO net.NetworkTopology: Adding a new node: /9/10.51.28.203:50010
15/02/11 18:33:04 INFO net.NetworkTopology: Adding a new node: /8/10.51.28.202:50010
15/02/11 18:33:04 INFO balancer.Balancer: 0 over-utilized: []
15/02/11 18:33:04 INFO balancer.Balancer: 0 underutilized: []
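Note that the namenodes list above contains the same nameservice twice (hdfs://whqa/ and hdfs://whqa, differing only by a trailing slash), which suggests the balancer is resolving one nameservice into two entries from mismatched config values. A hedged way to compare them:

# print the configured values the balancer resolves namenodes from
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.nameservices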
@apple-corps
apple-corps / gist:ee5fac1cd4bb610ca991
Last active August 29, 2015 14:18
elasticsearch 1.4.1 cluster timeouts
tail -f es_cluster.log
[2015-04-10 18:29:52,784][WARN ][transport ] [us3sm2zk012r09] Received response for a request that has timed out, sent [260248ms] ago, timed out [245248ms] ago, action [cluster:monitor/nodes/stats[n]], node [[densm2es002][OM4RguhkTb2Ai9BtHYI7TA][densm2es002.prod.local][inet[/10.51.29.2:9300]]{rack=0, group=2, master=false}], id [277821587]
[2015-04-10 18:29:53,001][WARN ][transport ] [us3sm2zk012r09] Received response for a request that has timed out, sent [230465ms] ago, timed out [215465ms] ago, action [cluster:monitor/nodes/stats[n]], node [[densm2es002][OM4RguhkTb2Ai9BtHYI7TA][densm2es002.prod.local][inet[/10.51.29.2:9300]]{rack=0, group=2, master=false}], id [277831180]
[2015-04-10 18:29:53,219][WARN ][transport ] [us3sm2zk012r09] Received response for a request that has timed out, sent [200683ms] ago, timed out [185683ms] ago, action [cluster:monitor/nodes/stats[n]], node [[densm2es002][OM4RguhkTb2Ai9BtHYI7TA][densm2es002.prod.local][in
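Node densm2es002 is taking several minutes to answer nodes-stats requests. A generic check, assuming an ES 1.4 node listening on the default port 9200 (host names here are taken from the log, the port is an assumption), is to pull hot threads from the slow node and look at overall cluster health:

# what the suspect node is busy doing
curl -s 'http://densm2es002.prod.local:9200/_nodes/densm2es002/hot_threads'
# overall cluster view
curl -s 'http://localhost:9200/_cluster/health?pretty'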
@apple-corps
apple-corps / master.log
Created May 21, 2015 16:09
Entire HBase cluster goes down due to ZooKeeper session expiry
2015-05-20 06:43:54,808 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 38 catalog row(s) and gc'd 0 unreferenced parent region(s)
2015-05-20 06:48:54,775 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Creating scanner over .META. starting at key ''
2015-05-20 06:48:54,775 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing internal scanner to startKey at ''
2015-05-20 06:48:54,816 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
2015-05-20 06:48:54,817 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 38 catalog row(s) and gc'd 0 unreferenced parent region(s)
2015-05-20 06:53:54,776 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Creating scanner over .META. starting at key ''
2015-05-20 06:53:54,776 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing internal scanner to startKey at ''
2015-05-20 06:53:54,809 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
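ZooKeeper session expiry in HBase is most often a symptom of long JVM pauses rather than a ZooKeeper problem. Two hedged checks, assuming a CDH-style config and log layout (the paths are assumptions):

# the configured session timeout, if overridden from the default
grep -A1 'zookeeper.session.timeout' /etc/hbase/conf/hbase-site.xml
# JvmPauseMonitor messages hint at GC pauses long enough to expire the session
grep 'Detected pause' /var/log/hbase/*.log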
@apple-corps
apple-corps / gist:3e4e8d00b68cd51011dd
Created July 24, 2015 20:08
eCryptfs encrypted home broken on Ubuntu 14.04
sudo adduser --home /home/colin --ingroup adm --encrypt-home colin
Adding user `colin' ...
Adding new user `colin' (1001) with group `adm' ...
Creating home directory `/home/colin' ...
Setting up encryption ...
************************************************************************
YOU SHOULD RECORD YOUR MOUNT PASSPHRASE AND STORE IT IN A SAFE LOCATION.
ecryptfs-unwrap-passphrase ~/.ecryptfs/wrapped-passphrase
THIS WILL BE REQUIRED IF YOU NEED TO RECOVER YOUR DATA AT A LATER TIME.
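The banner above is cut off, but the step it asks for can be run at any time while the encrypted home still mounts. A hedged recovery sketch for a standard Ubuntu 14.04 eCryptfs setup:

# record the mount passphrase while the home is still mountable
ecryptfs-unwrap-passphrase ~/.ecryptfs/wrapped-passphrase
# later, recover data from another session; run with no arguments it
# searches for .Private directories and prompts for the passphrase
sudo ecryptfs-recover-private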
@apple-corps
apple-corps / gist:1438eead63651112dcdc
Created August 21, 2015 18:16
network interfaces
coreos-test ~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether d4:ae:52:67:58:4b brd ff:ff:ff:ff:ff:ff
inet 10.51.31.240/22 brd 10.51.31.255 scope global dynamic eno1
{
  "persistent": {
    "action": {
      "destructive_requires_name": "true"
    },
    "indices": {
      "store": {
        "throttle": {
          "max_bytes_per_sec": "60mb"
        }
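These look like Elasticsearch persistent cluster settings; the snippet is cut off before the closing braces. A hedged sketch of applying them over the REST API, assuming an ES 1.x node on localhost:9200 and supplying the missing braces:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "action": { "destructive_requires_name": "true" },
    "indices": { "store": { "throttle": { "max_bytes_per_sec": "60mb" } } }
  }
}'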
@apple-corps
apple-corps / gist:9741e847ad7dd0c7b16d
Created October 15, 2015 22:26
etcd2 keeps old state; reports a hostname that was not defined as an option.
core@coreos003 ~ $ sudo rm -rf /var/lib/etcd/*
core@coreos003 ~ $ sudo rm -rf /var/lib/etcd2/*
core@coreos003 ~ $ sudo systemctl stop etcd2
core@coreos003 ~ $ sudo systemctl disable etcd2
core@coreos003 ~ $ sudo systemctl stop etcd
core@coreos003 ~ $ sudo systemctl disable etcd
etcd2 -name coreos002 -initial-advertise-peer-urls http://10.5.29.211:2380 -listen-peer-urls http://10.5.29.211:2380 -listen-client-urls http://10.5.29.211:2379,http://127.0.0.1:2379 -advertise-client-urls http://10.5.29.211:2379 -initial-cluster-token etcd-core-42 -initial-cluster coreos002=http://10.5.29.211:2380,coreos003=http://10.5.29.218:2380,coreos004=http://10.5.29.220:2380 -initial-cluster-state new
etcd2 -name coreos003 -initial-advertise-peer-urls http://10.5.29.218:2380 -listen-peer-urls http://10.5.29.218:2380 -listen-client-urls http://10.5.29.218:2379,http://127.0.0.1:2379 -advertise-client-urls http://10.5.29.218:2379 -initial-cluster-token etcd-core-42 -initial-cluster coreos002=http://10.5
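After wiping the data directories and stopping both units, a hedged way to verify what the cluster actually believes, assuming etcdctl (v2 API) is on the PATH:

# membership as recorded in the freshly re-bootstrapped cluster
etcdctl member list
# per-member reachability
etcdctl cluster-health
# check whether a unit file or drop-in still injects an old name or data dir
systemctl cat etcd2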
@apple-corps
apple-corps / gist:b0da92eb313b1bf71912
Last active January 17, 2016 20:48
Running out of memory locally when launching multiple Spark jobs on YARN via spark-submit from a shell.
I launch around 30-60 of these jobs, each defined like start-job.sh, in the background from a wrapper script. I wait about 30 seconds between launches, and the wrapper monitors YARN to determine when to launch more. There is a limit defined at around 60 jobs, but even if I set it to 30, I run out of memory on the host submitting the jobs. Why does my approach to using spark-submit cause me to run out of memory? The host has about 6G free, and I don't feel like merely submitting jobs should exhaust that.
start-job.sh
export HADOOP_CONF_DIR=/etc/hadoop/conf
spark-submit \
--class sap.whcounter.WarehouseCounter \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 1024m \
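The submit command is cut off above, but the out-of-memory behaviour on the submitting host is consistent with each spark-submit spawning its own client JVM: 30-60 concurrent launcher JVMs at the default heap can exhaust 6G even in yarn-cluster mode, where the driver itself (and its 1024m) runs on the cluster, not locally. A hedged mitigation, assuming the launcher JVMs are the culprit, is to cap their heap in the wrapper before each launch:

# cap the spark-submit client JVM heap (the value is a guess to tune)
export SPARK_SUBMIT_OPTS="-Xmx256m"

SPARK_SUBMIT_OPTS is read by the spark-submit shell wrapper and applied to the launcher JVM, so it bounds per-launch memory without touching executor or driver sizing.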