jeromatron’s gists

jeromatron / gist:d73403094cf63621edf5

Last active August 29, 2015 14:12

	#!/bin/sh
	# A shell script you can give to customers to check if all their repair sessions are complete.
	# Change SYSTEM_LOG_PATH to the directory containing system.log
	if [ -z "$SYSTEM_LOG_PATH" ]; then
	    SYSTEM_LOG_PATH=.
	fi

	LOG_FILE_PATTERN=.log
	NEW_SESSION_PATH=/tmp/new-session
	COMPLETED_SESSION_PATH=/tmp/completed-successfully
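The preview above stops before the comparison itself. A minimal sketch of how such a check could work follows. It assumes that Cassandra writes a `[repair #<id>] new session: ...` line when a repair session starts and a `[repair #<id>] session completed successfully` line when it finishes; verify both against your own system.log. The function name `incomplete_repairs` is hypothetical and is not taken from the gist.

```shell
#!/bin/sh
# Sketch: print the IDs of repair sessions that started but never logged completion.
# ASSUMPTION: log lines contain "[repair #<id>] new session" and
# "[repair #<id>] session completed successfully".
# incomplete_repairs is a hypothetical name, not part of the original script.

incomplete_repairs() {
    log_dir=${1:-.}
    new_ids=$(mktemp) || return 1
    done_ids=$(mktemp) || { rm -f "$new_ids"; return 1; }

    # Collect the ID from every "new session" line in system.log and rotated logs.
    cat "$log_dir"/*.log* 2>/dev/null \
        | sed -n 's/.*\[repair #\([^]]*\)] new session.*/\1/p' \
        | sort -u > "$new_ids"

    # Collect the ID from every "session completed successfully" line.
    cat "$log_dir"/*.log* 2>/dev/null \
        | sed -n 's/.*\[repair #\([^]]*\)] session completed successfully.*/\1/p' \
        | sort -u > "$done_ids"

    # comm -23 keeps lines that appear only in the first (sorted) file:
    # sessions that were started but have no completion line.
    comm -23 "$new_ids" "$done_ids"

    rm -f "$new_ids" "$done_ids"
}

# Usage: incomplete_repairs "${SYSTEM_LOG_PATH:-.}"
```

Empty output means every started session has a matching completion line. Anything printed is a session ID worth investigating.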

jeromatron / gist:6436422

Last active December 22, 2015 07:09

	= Using Cassandra for large data sets (lots of data per node) =

	This page aims to give some advice on the issues one may need to consider when using Cassandra for large data sets (meaning hundreds of gigabytes or terabytes per node). The intent is not to make original claims, but to collect in one place some issues that are operationally relevant. Other parts of the wiki are highly recommended in order to fully understand the issues involved.

	This is a work in progress. If you find information out of date (e.g., a JIRA ticket referenced has been resolved but this document has not been updated), please help by editing or e-mailing cassandra-user.

	Note that not all of these issues are specific to Cassandra. For example, any storage system is subject to the trade-offs of cache sizes relative to active set size, and IOPS will always be strongly correlated with the percentage of requests that penetrate caching layers. Also of note, the more data stored per node, the more data will have to be streamed in

jeromatron / notes on oozie setup

Created August 12, 2013 10:59

	wget http://mirror.ox.ac.uk/sites/rsync.apache.org/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
	sudo tar xzf apache-maven-3.0.5-bin.tar.gz -C /usr/local
	cd /usr/local
	sudo ln -s apache-maven-3.0.5 maven

	mkdir libext
	cd libext
	wget http://extjs.com/deploy/ext-2.2.zip

	export DSE_LIB=/usr/share/dse

jeromatron / Data modelling

Created August 7, 2013 14:19

	Part 1: The Data Model is Dead, Long Live the Data Model:
	- http://www.youtube.com/watch?v=px6U2n74q3g
	- http://www.slideshare.net/patrickmcfadin/the-data-model-is-dead-long-live-the-data-model

	Part 2: Become a Super Modeler:
	- http://www.youtube.com/watch?v=qphhxujn5Es
	- http://www.slideshare.net/patrickmcfadin/become-a-super-modeler

	Part 3: The World's Next Top Data Model
	- http://www.youtube.com/watch?v=HdJlsOZVGwM

jeromatron / SimpleClient.java

Created May 30, 2013 17:26

Basic encode/decode of composite values with DataStax Java driver

	package org.mostlyharmless;

	import com.datastax.driver.core.*;
	import org.apache.cassandra.db.marshal.*;

	import java.util.ArrayList;
	import java.util.List;

	public class SimpleClient {

jeromatron / gist:5372552

Created April 12, 2013 14:47

	Class Name | Shallow Heap | Retained Heap | Percentage
	-----------------------------------------------------------------------------------------
	java.lang.Object[340186] @ 0x639a368b0 | 1,360,760 | 7,754,588,120 | 98.24%
	|- org.apache.cassandra.db.Row @ 0x5fae884a8 | 24 | 34,704 | 0.00%
	|- org.apache.cassandra.db.Row @ 0x625f9eb88 | 24 | 34,704 | 0.00%
	|- org.apache.cassandra.db.Row @ 0x605d82328 | 24 | 34,704 | 0.00%
	|- org.apache.cassandra.db.Row @ 0x6888426f0 | 24 | 34,704 | 0.00%
	|- org.apache.cassandra.db.Row @ 0x689d73a00 | 24 | 34,704 | 0.00%
	|- org.apache.cassandra.db.Row @ 0x68c49c360 | 24 | 34,704 | 0.00%
	|- org.apache.cassandra.db.Row @ 0x68d5ff600 | 24 | 34,704 | 0.00%

jeromatron / cassandra-for-cdh4.java

Created February 16, 2013 17:23

	//
	// Old Hadoop API
	//
	public org.apache.hadoop.mapred.InputSplit[] getSplits(JobConf jobConf, int numSplits) throws IOException
	{
	    //TaskAttemptContext tac = new TaskAttemptContext(jobConf, new TaskAttemptID());
	    TaskAttemptContext tac = new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(jobConf, new TaskAttemptID());
	    List<org.apache.hadoop.mapreduce.InputSplit> newInputSplits = this.getSplits(tac);

	    org.apache.hadoop.mapred.InputSplit[] oldInputSplits = new org.apache.hadoop.mapred.InputSplit[newInputSplits.size()];

jeromatron / gist:1128625

Created August 5, 2011 21:57

jmx/jconsole shell function

	# jmx/jconsole tunneling shell function - add to .profile
	#
	# usage: jmx [keypair] [remote host] [remote port]
	# example: jmx /keys/keypair.pem [email protected] 8080

	function jmx() {
	    keypair=$1
	    remote_host=$2
	    remote_port=$3
	    proxy_port=10999
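The preview cuts off before the tunnel is opened. One common way to reach a remote JMX port is to open an SSH SOCKS proxy and point jconsole at it, which can be sketched as below. This is an assumption about the general technique and not a reconstruction of the rest of the gist. The function name `jmx_commands` and the host `admin@db1.example` are hypothetical, and the function only prints the commands (a dry run) so they can be inspected before being run.

```shell
#!/bin/sh
# Sketch: print the two commands for JMX over an SSH SOCKS proxy.
# ASSUMPTION: jmx_commands is a hypothetical helper, not the gist's jmx function.
# It prints the commands instead of running them.

jmx_commands() {
    keypair=$1
    remote_host=$2            # may be given as user@host
    remote_port=$3
    proxy_port=${4:-10999}    # same default proxy port as the preview above

    # Strip an optional "user@" prefix; the JMX service URL needs only the host.
    jmx_host=${remote_host#*@}

    # ssh -D opens a local SOCKS proxy; -f -N backgrounds it without a remote command.
    printf '%s\n' "ssh -f -N -D $proxy_port -i $keypair $remote_host"

    # -J passes a flag to the JVM running jconsole; the socksProxy* system
    # properties route its connections through the tunnel.
    printf '%s\n' "jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=$proxy_port service:jmx:rmi:///jndi/rmi://$jmx_host:$remote_port/jmxrmi"
}

# Usage: jmx_commands /keys/keypair.pem admin@db1.example 8080
```

Printing the commands keeps the sketch safe to try. To actually open the tunnel, run the two printed lines in order and stop the background ssh process when finished.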

jeromatron / gist:1008398

Created June 4, 2011 21:51

	<property>
	  <name>hadoop.proxyuser.oozie.hosts</name>
	  <value>*</value>
	</property>
	<property>
	  <name>hadoop.proxyuser.oozie.groups</name>
	  <value>*</value>
	</property>

jeromatron / gist:893065

Created March 29, 2011 19:36

	2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_1300 for user-log deletion with retainTimeStamp:1301508459323
	2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_1277 for user-log deletion with retainTimeStamp:1301508459323
	2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_1144 for user-log deletion with retainTimeStamp:1301508459323
	2011-03-29 18:07:39,968 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_2777 for user-log deletion with retainTimeStamp:1301508459323
	2011-03-29 18:07:39,969 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_1298 for user-log deletion with retainTimeStamp:1301508459323
	2011-03-29 18:07:39,969 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201103080244_2771 for user-log deletion with retainTimeStamp:1301508459323
	2011-03-29 18:07:39,969 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_2011030

Jeremy Hanna jeromatron