Ashwanth Kumar ashwanthkumar
@ashwanthkumar
ashwanthkumar / matsya.conf
Last active January 21, 2016 12:53
Sample Matsya Configuration
matsya {
  # Local FS path where the cluster states and time-series data we create are written:
  # ${working-dir}/state and ${working-dir}/history respectively
  working-dir = "local_run"
  clusters = [{
    # Unique identifier - please don't change once assigned
    name = "Staging Hadoop Cluster"
    # (Mandatory) AutoScaling group backing the spot machines
ashwanthkumar / pipeline.xml
Created December 22, 2015 17:11
Sample pipeline configuration for running GoCD Janitor (https://github.com/ashwanthkumar/gocd-janitor)
<pipeline name="gocd-janitor" labeltemplate="cleanup-${COUNT}" isLocked="false">
  <timer>0 30 5 ? * MON</timer>
  <stage name="cleanup">
    <approval type="manual" />
    <jobs>
      <job name="cleanup">
        <tasks>
          <exec command="wget">
            <arg>https://github.com/ashwanthkumar/gocd-janitor/releases/download/v0.0.1/gocd-janitor-0.0.1-jar-with-dependencies.jar</arg>
          </exec>
ashwanthkumar / hadoop_hosts_diff.sh
Last active September 25, 2015 02:32
Useful for finding the missing hosts when you have hundreds of machines on AWS running TaskTracker and DataNode processes and their DNS names are auto-generated
# This assumes that jq is installed on your machine
JT_MACHINE="jt-host"
NN_MACHINE="nn-host"
# Get all the TTs
curl "http://${JT_MACHINE}:50030/jmx?qry=hadoop:service=JobTracker,name=JobTrackerInfo" | jq -r .beans[].AliveNodesInfoJson | jq -r .[].hostname | sort > tasktrackers
# Get all the DNs
curl "http://${NN_MACHINE}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo" | jq -r .beans[].LiveNodes | jq -r 'keys | .[]' | sort > datanodes
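The gist builds the two sorted host lists but stops before the actual comparison. The final diff step can be done with `comm -3 tasktrackers datanodes`, or sketched in Python (hostnames below are made up for illustration):

```python
def missing_hosts(tasktrackers, datanodes):
    """Return (hosts running a DataNode but no TaskTracker,
    hosts running a TaskTracker but no DataNode)."""
    tts, dns = set(tasktrackers), set(datanodes)
    return sorted(dns - tts), sorted(tts - dns)

# Hypothetical hostnames: ip-10-0-0-3 has a DataNode but its TaskTracker is missing
no_tt, no_dn = missing_hosts(
    ["ip-10-0-0-1", "ip-10-0-0-2"],
    ["ip-10-0-0-1", "ip-10-0-0-2", "ip-10-0-0-3"],
)
```

Using sets rather than a line-by-line diff means the inputs don't even need to be sorted first.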

The Yin-Yang of CEOs and Engineers

Most successful startups begin with two founders: one is the Engineer, the other is the business dude. Over years of working with various people, I've learned what makes a good engineer and what makes a good business dude, and the two are complete opposites of each other.

CEO Engineer
ashwanthkumar / FixedPointJob.scala
Created June 12, 2014 16:53
User identifier normalization from the Big Data book by Nathan Marz, implemented in Scalding.
import com.twitter.scalding.{Tsv, Job, Args}
import scala.collection.immutable.TreeSet

/*
 Let's assume we are reading a file of records (a,b), where a record means node a and node b
 are connected in a graph. For simplicity we will assume that a and b are ints. We want to
 find the canonical mapping for all the nodes by iterating to a fixed point.
*/
class FixedPointJob(args: Args) extends Job(args) {
  val input = args("input")
  val outputBaseDir = args("output-base-dir")
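The Scalding job iterates until the node-to-identifier mapping stops changing. The same fixed-point idea can be sketched as a toy single-machine version in Python (this is an illustration of the technique, not the Scalding implementation):

```python
def normalize(edges):
    """Label every node with the smallest id in its connected component
    by repeatedly propagating the minimum along edges until nothing
    changes (the fixed point)."""
    labels = {n: n for edge in edges for n in edge}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            m = min(labels[a], labels[b])
            for n in (a, b):
                if labels[n] > m:
                    labels[n] = m
                    changed = True
    return labels
```

At the fixed point both endpoints of every edge carry the same label, so the label is constant on each connected component, which is exactly the normalization property the MapReduce version converges to.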
ashwanthkumar / InstallCert.java
Last active August 29, 2015 14:00
Adding the mv command to move the generated 'jssecacerts' file to the right location
/*
* Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* - Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
struct Vehicle {
  1: string name
  2: string model
}

struct Person {
  1: string name
  2: i32 age
  // Empty collections can be specified using []. The same works for map and set as well.
  3: list<Vehicle> cars = []
}
ashwanthkumar / MetaInfo.rb
Created March 14, 2013 05:34
HRegionInfo was null or empty in .META.
include Java
...
admin = HBaseAdmin.new(HBaseConfiguration.create)
table_regions = admin.getTableRegions(Bytes.toBytes("test_table"))
table_regions.each do |table_region|
  puts table_region.getRegionNameAsString
end
ashwanthkumar / HFileInputFormat.java
Created March 11, 2013 11:55
A MapReduce InputFormat for HBase's HFile. Tested on HBase 0.94.2 and Hadoop 1.0.1
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;
import org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
ashwanthkumar / HFileInputFormat.scala
Last active December 14, 2015 18:59 — forked from coltfred/HFileInputFormat.scala
Updating the HFile Reader init for HBase 0.94.2 - Fixing the null CacheConfig
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.io.hfile.{ HFile, HFileScanner, CacheConfig }
import org.apache.hadoop.hbase.io.hfile.HFile.Reader
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.mapreduce.{ JobContext, InputSplit, TaskAttemptContext, RecordReader }
import org.apache.hadoop.mapreduce.lib.input.{ FileInputFormat, FileSplit }
import org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics
/**