Ashwanth Kumar ashwanthkumar
@ashwanthkumar
ashwanthkumar / matsya.conf
Last active January 21, 2016 12:53
Sample Matsya Configuration
matsya {
  # Local FS path where the cluster states and time-series data we create are written:
  # ${working-dir}/state and ${working-dir}/history respectively
  working-dir = "local_run"
  clusters = [{
    # Unique identifier - please don't change once assigned
    name = "Staging Hadoop Cluster"
    # (Mandatory) AutoScaling group backing the spot machines
ashwanthkumar / pipeline.xml
Created December 22, 2015 17:11
Sample pipeline configuration for running GoCD Janitor (https://github.com/ashwanthkumar/gocd-janitor)
<pipeline name="gocd-janitor" labeltemplate="cleanup-${COUNT}" isLocked="false">
  <timer>0 30 5 ? * MON</timer>
  <stage name="cleanup">
    <approval type="manual" />
    <jobs>
      <job name="cleanup">
        <tasks>
          <exec command="wget">
            <arg>https://github.com/ashwanthkumar/gocd-janitor/releases/download/v0.0.1/gocd-janitor-0.0.1-jar-with-dependencies.jar</arg>
          </exec>
ashwanthkumar / hadoop_hosts_diff.sh
Last active September 25, 2015 02:32
Useful for finding the missing hosts when you have hundreds of machines on AWS running TaskTracker and DataNode processes and their DNS names are auto-generated
# This assumes that jq is installed on your machine
JT_MACHINE="jt-host"
NN_MACHINE="nn-host"
# Get all the TTs
curl "http://${JT_MACHINE}:50030/jmx?qry=hadoop:service=JobTracker,name=JobTrackerInfo" | jq -r .beans[].AliveNodesInfoJson | jq -r .[].hostname | sort > tasktrackers
# Get all the DNs
curl "http://${NN_MACHINE}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo" | jq -r .beans[].LiveNodes | jq -r 'keys | .[]' | sort > datanodes
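The gist builds the two sorted host lists but stops before the actual comparison. The final diff step can be done with `comm -3 tasktrackers datanodes`, or sketched in Python (hostnames below are made up for illustration):

```python
def missing_hosts(tasktrackers, datanodes):
    """Return (hosts running a DataNode but no TaskTracker,
    hosts running a TaskTracker but no DataNode)."""
    tts, dns = set(tasktrackers), set(datanodes)
    return sorted(dns - tts), sorted(tts - dns)

# Hypothetical hostnames: ip-10-0-0-3 has a DataNode but its TaskTracker is missing
no_tt, no_dn = missing_hosts(
    ["ip-10-0-0-1", "ip-10-0-0-2"],
    ["ip-10-0-0-1", "ip-10-0-0-2", "ip-10-0-0-3"],
)
```

Using sets rather than a line-by-line diff means the inputs don't even need to be sorted first.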

The Yin-Yang of CEOs and Engineers

Most successful startups begin with two founders: one is the Engineer, the other is the business dude. Over years of working with various people, I've learned what makes a good engineer and what makes a good business dude, and the two are complete opposites of each other.

CEO Engineer
ashwanthkumar / FixedPointJob.scala
Created June 12, 2014 16:53
User identifier normalization from the Big Data book by Nathan Marz, implemented in Scalding.
import com.twitter.scalding.{Tsv, Job, Args}
import scala.collection.immutable.TreeSet

/*
 Let's assume we are reading a file of records (a,b), where a record means node a and node b
 are connected in a graph. For simplicity we will assume that a and b are ints. We want to
 find the canonical mapping for all the nodes by iterating to a fixed point.
*/
class FixedPointJob(args: Args) extends Job(args) {
  val input = args("input")
  val outputBaseDir = args("output-base-dir")
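The Scalding job iterates until the node-to-identifier mapping stops changing. The same fixed-point idea can be sketched as a toy single-machine version in Python (this is an illustration of the technique, not the Scalding implementation):

```python
def normalize(edges):
    """Label every node with the smallest id in its connected component
    by repeatedly propagating the minimum along edges until nothing
    changes (the fixed point)."""
    labels = {n: n for edge in edges for n in edge}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            m = min(labels[a], labels[b])
            for n in (a, b):
                if labels[n] > m:
                    labels[n] = m
                    changed = True
    return labels
```

At the fixed point both endpoints of every edge carry the same label, so the label is constant on each connected component, which is exactly the normalization property the MapReduce version converges to.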
ashwanthkumar / InstallCert.java
Last active August 29, 2015 14:00
Adding the mv command to move the generated 'jssecacerts' file to the right location
/*
* Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
struct Vehicle {
  1: string name
  2: string model
}

struct Person {
  1: string name
  2: i32 age
  // Empty collections can be specified using []. The same works for map and set as well.
  3: list<Vehicle> cars = []
}
ashwanthkumar / MetaInfo.rb
Created March 14, 2013 05:34
HRegionInfo was null or empty in .META.
include Java
...
admin = HBaseAdmin.new(HBaseConfiguration.create)
table_regions = admin.getTableRegions(Bytes.toBytes("test_table"))
table_regions.each do |table_region|
  puts table_region.getRegionNameAsString
end
ashwanthkumar / HFileInputFormat.java
Created March 11, 2013 11:55
A MapReduce InputFormat for HBase's HFile. Tested on HBase 0.94.2 and Hadoop 1.0.1
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;
import org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
ashwanthkumar / HFileInputFormat.scala
Last active December 14, 2015 18:59 — forked from coltfred/HFileInputFormat.scala
Updating the HFile Reader init for HBase 0.94.2 - Fixing the null CacheConfig
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.io.hfile.{ HFile, HFileScanner, CacheConfig }
import org.apache.hadoop.hbase.io.hfile.HFile.Reader
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.mapreduce.{ JobContext, InputSplit, TaskAttemptContext, RecordReader }
import org.apache.hadoop.mapreduce.lib.input.{ FileInputFormat, FileSplit }
import org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics
/**