@apple-corps
apple-corps / gist:fe2723948171886310bf
Created January 20, 2015 02:14
hadoop standby namenode log snippet
2015-01-19 18:55:46,845 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file http://us3sm2zk012r09.comp.prod.local:8480/getJournal?jid=whprod&segmentTxId=50626862&storageInfo=-55%3A977585766%3A0%3ACID-1ccb02a5-bfd7-4808-a925-8e9804d40ec4, http://us3sm2zk010r07.comp.prod.local:8480/getJournal?jid=whprod&segmentTxId=50626862&storageInfo=-55%3A977585766%3A0%3ACID-1ccb02a5-bfd7-4808-a925-8e9804d40ec4 of size 113416 edits # 711 loaded in 0 seconds
2015-01-19 18:55:47,323 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 711 edits starting from txid 50626861
2015-01-19 18:57:17,662 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2015-01-19 18:57:19,705 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2015-01-19 18:57:21,275 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
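The repeated "Operation category READ is not supported in state standby" warnings usually just mean something (fsck, the web UI, or a monitoring probe) is querying the standby NameNode directly instead of the active one. A quick sanity check, assuming nn1 and nn2 are the NameNode IDs configured for this nameservice (placeholder names here):

# report the HA state of each configured NameNode (nn1/nn2 are placeholders)
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2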
@apple-corps
apple-corps / gist:002f6629d5db24d28592
Created January 20, 2015 19:30
hadoop standby namenode stuck getting block information
2015-01-20 12:20:43,444 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file http://us3sm2zk011r08.comp.prod.local:8480/getJournal?jid=whprod&segmentTxId=50815793&storageInfo=-55%3A977585766%3A0%3ACID-1ccb02a5-bfd7-4808-a925-8e9804d40ec4, http://us3sm2zk010r07.comp.prod.local:8480/getJournal?jid=whprod&segmentTxId=50815793&storageInfo=-55%3A977585766%3A0%3ACID-1ccb02a5-bfd7-4808-a925-8e9804d40ec4 of size 100746 edits # 569 loaded in 0 seconds
2015-01-20 12:20:43,964 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 569 edits starting from txid 50815792
2015-01-20 12:22:44,496 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode us3sm2nn011r08.comp.prod.local/10.51.28.141:8020
2015-01-20 12:22:44,619 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4eba7f05 expecting start txid #50816362
2015-01-20 12:22:44,619 INFO org.apache.hadoop.hdfs.server.namenode.
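When the standby appears stuck while reading an edit stream, one generic way to see where it is blocked (not specific to this incident) is a thread dump of the NameNode JVM:

# capture a thread dump; assumes a single NameNode process on this host
sudo -u hdfs jstack $(pgrep -f NameNode | head -1) > /tmp/nn-$(date +%s).jstack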
@apple-corps
apple-corps / gist:61b6df698e48386f12f8
Created February 12, 2015 01:37
Hadoop 2.3.0-cdh5.0.1 balancer stack trace
sudo -u hdfs hdfs balancer
15/02/11 18:33:03 INFO balancer.Balancer: namenodes = [hdfs://whqa/, hdfs://whqa]
15/02/11 18:33:03 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/02/11 18:33:04 INFO net.NetworkTopology: Adding a new node: /9/10.51.28.200:50010
15/02/11 18:33:04 INFO net.NetworkTopology: Adding a new node: /7/10.51.28.201:50010
15/02/11 18:33:04 INFO net.NetworkTopology: Adding a new node: /9/10.51.28.203:50010
15/02/11 18:33:04 INFO net.NetworkTopology: Adding a new node: /8/10.51.28.202:50010
15/02/11 18:33:04 INFO balancer.Balancer: 0 over-utilized: []
15/02/11 18:33:04 INFO balancer.Balancer: 0 underutilized: []
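Note that the namenodes list above contains the same nameservice twice (hdfs://whqa/ and hdfs://whqa, differing only by a trailing slash), which suggests the balancer is resolving one nameservice into two entries from mismatched config values. A hedged way to compare them:

# print the configured values the balancer resolves namenodes from
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.nameservices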
@apple-corps
apple-corps / gist:ee5fac1cd4bb610ca991
Last active August 29, 2015 14:18
elasticsearch 1.4.1 cluster timeouts
tail -f es_cluster.log
[2015-04-10 18:29:52,784][WARN ][transport ] [us3sm2zk012r09] Received response for a request that has timed out, sent [260248ms] ago, timed out [245248ms] ago, action [cluster:monitor/nodes/stats[n]], node [[densm2es002][OM4RguhkTb2Ai9BtHYI7TA][densm2es002.prod.local][inet[/10.51.29.2:9300]]{rack=0, group=2, master=false}], id [277821587]
[2015-04-10 18:29:53,001][WARN ][transport ] [us3sm2zk012r09] Received response for a request that has timed out, sent [230465ms] ago, timed out [215465ms] ago, action [cluster:monitor/nodes/stats[n]], node [[densm2es002][OM4RguhkTb2Ai9BtHYI7TA][densm2es002.prod.local][inet[/10.51.29.2:9300]]{rack=0, group=2, master=false}], id [277831180]
[2015-04-10 18:29:53,219][WARN ][transport ] [us3sm2zk012r09] Received response for a request that has timed out, sent [200683ms] ago, timed out [185683ms] ago, action [cluster:monitor/nodes/stats[n]], node [[densm2es002][OM4RguhkTb2Ai9BtHYI7TA][densm2es002.prod.local][in
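Node densm2es002 is taking several minutes to answer nodes-stats requests. A generic check, assuming an ES 1.4 node listening on the default port 9200 (host names here are taken from the log, the port is an assumption), is to pull hot threads from the slow node and look at overall cluster health:

# what the suspect node is busy doing
curl -s 'http://densm2es002.prod.local:9200/_nodes/densm2es002/hot_threads'
# overall cluster view
curl -s 'http://localhost:9200/_cluster/health?pretty'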
@apple-corps
apple-corps / master.log
Created May 21, 2015 16:09
Entire HBase cluster goes down due to ZooKeeper session expiry
2015-05-20 06:43:54,808 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 38 catalog row(s) and gc'd 0 unreferenced parent region(s)
2015-05-20 06:48:54,775 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Creating scanner over .META. starting at key ''
2015-05-20 06:48:54,775 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing internal scanner to startKey at ''
2015-05-20 06:48:54,816 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
2015-05-20 06:48:54,817 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 38 catalog row(s) and gc'd 0 unreferenced parent region(s)
2015-05-20 06:53:54,776 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Creating scanner over .META. starting at key ''
2015-05-20 06:53:54,776 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing internal scanner to startKey at ''
2015-05-20 06:53:54,809 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
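ZooKeeper session expiry in HBase is most often a symptom of long JVM pauses rather than a ZooKeeper problem. Two hedged checks, assuming a CDH-style config and log layout (the paths are assumptions):

# the configured session timeout, if overridden from the default
grep -A1 'zookeeper.session.timeout' /etc/hbase/conf/hbase-site.xml
# JvmPauseMonitor messages hint at GC pauses long enough to expire the session
grep 'Detected pause' /var/log/hbase/*.log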
@apple-corps
apple-corps / gist:3e4e8d00b68cd51011dd
Created July 24, 2015 20:08
eCryptfs encrypted home broken on Ubuntu 14.04
sudo adduser --home /home/colin --ingroup adm --encrypt-home colin
Adding user `colin' ...
Adding new user `colin' (1001) with group `adm' ...
Creating home directory `/home/colin' ...
Setting up encryption ...
************************************************************************
YOU SHOULD RECORD YOUR MOUNT PASSPHRASE AND STORE IT IN A SAFE LOCATION.
ecryptfs-unwrap-passphrase ~/.ecryptfs/wrapped-passphrase
THIS WILL BE REQUIRED IF YOU NEED TO RECOVER YOUR DATA AT A LATER TIME.
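The banner above is cut off, but the step it asks for can be run at any time while the encrypted home still mounts. A hedged recovery sketch for a standard Ubuntu 14.04 eCryptfs setup:

# record the mount passphrase while the home is still mountable
ecryptfs-unwrap-passphrase ~/.ecryptfs/wrapped-passphrase
# later, recover data from another session; run with no arguments it
# searches for .Private directories and prompts for the passphrase
sudo ecryptfs-recover-private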
@apple-corps
apple-corps / gist:1438eead63651112dcdc
Created August 21, 2015 18:16
network interfaces
coreos-test ~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether d4:ae:52:67:58:4b brd ff:ff:ff:ff:ff:ff
inet 10.51.31.240/22 brd 10.51.31.255 scope global dynamic eno1
{
  "persistent": {
    "action": {
      "destructive_requires_name": "true"
    },
    "indices": {
      "store": {
        "throttle": {
          "max_bytes_per_sec": "60mb"
        }
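These look like Elasticsearch persistent cluster settings; the snippet is cut off before the closing braces. A hedged sketch of applying them over the REST API, assuming an ES 1.x node on localhost:9200 and supplying the missing braces:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "action": { "destructive_requires_name": "true" },
    "indices": { "store": { "throttle": { "max_bytes_per_sec": "60mb" } } }
  }
}'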
@apple-corps
apple-corps / gist:9741e847ad7dd0c7b16d
Created October 15, 2015 22:26
etcd2 keeps old state; reports a hostname that was not defined as an option.
core@coreos003 ~ $ sudo rm -rf /var/lib/etcd/*
core@coreos003 ~ $ sudo rm -rf /var/lib/etcd2/*
core@coreos003 ~ $ sudo systemctl stop etcd2
core@coreos003 ~ $ sudo systemctl disable etcd2
core@coreos003 ~ $ sudo systemctl stop etcd
core@coreos003 ~ $ sudo systemctl disable etcd
etcd2 -name coreos002 -initial-advertise-peer-urls http://10.5.29.211:2380 -listen-peer-urls http://10.5.29.211:2380 -listen-client-urls http://10.5.29.211:2379,http://127.0.0.1:2379 -advertise-client-urls http://10.5.29.211:2379 -initial-cluster-token etcd-core-42 -initial-cluster coreos002=http://10.5.29.211:2380,coreos003=http://10.5.29.218:2380,coreos004=http://10.5.29.220:2380 -initial-cluster-state new
etcd2 -name coreos003 -initial-advertise-peer-urls http://10.5.29.218:2380 -listen-peer-urls http://10.5.29.218:2380 -listen-client-urls http://10.5.29.218:2379,http://127.0.0.1:2379 -advertise-client-urls http://10.5.29.218:2379 -initial-cluster-token etcd-core-42 -initial-cluster coreos002=http://10.5
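After wiping the data directories and stopping both units, a hedged way to verify what the cluster actually believes, assuming etcdctl (v2 API) is on the PATH:

# membership as recorded in the freshly re-bootstrapped cluster
etcdctl member list
# per-member reachability
etcdctl cluster-health
# check whether a unit file or drop-in still injects an old name or data dir
systemctl cat etcd2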
@apple-corps
apple-corps / gist:b0da92eb313b1bf71912
Last active January 17, 2016 20:48
Running out of memory locally when launching multiple Spark jobs on YARN via spark-submit from a shell.
I launch around 30-60 of these jobs, each defined like start-job.sh, in the background from a wrapper script. I wait about 30 seconds between launches, and the wrapper monitors YARN to determine when to launch more. There is a limit defined at around 60 jobs, but even if I set it to 30, I run out of memory on the host submitting the jobs. Why does my approach to using spark-submit cause me to run out of memory? The host has about 6G free, and I don't feel like merely submitting jobs should exhaust that.
start-job.sh
export HADOOP_CONF_DIR=/etc/hadoop/conf
spark-submit \
--class sap.whcounter.WarehouseCounter \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 1024m \
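The submit command is cut off above, but the out-of-memory behaviour on the submitting host is consistent with each spark-submit spawning its own client JVM: 30-60 concurrent launcher JVMs at the default heap can exhaust 6G even in yarn-cluster mode, where the driver itself (and its 1024m) runs on the cluster, not locally. A hedged mitigation, assuming the launcher JVMs are the culprit, is to cap their heap in the wrapper before each launch:

# cap the spark-submit client JVM heap (the value is a guess to tune)
export SPARK_SUBMIT_OPTS="-Xmx256m"

SPARK_SUBMIT_OPTS is read by the spark-submit shell wrapper and applied to the launcher JVM, so it bounds per-launch memory without touching executor or driver sizing.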