Alexy Khrabrov alexy

function parse_git_branch {
git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}
# zsh colors -- http://spiralofhope.wordpress.com/2009/04/23/zsh-ansi-prompt/
function precmd {
# let's get the current git branch that we are under
# ripped from /etc/bash_completion.d/git from the git devs
git_ps1 () {
alexy / scoobi-run-mac-fail
Created November 8, 2011 23:28
Running Scoobi fails on Mac OS X
Ilyad% hadoop jar ./target/Scoobi_Word_Count-hadoop-0.1.jar /data/shake /data/output-scoobi
2011-11-08 15:27:41.856 java[68147:1903] Unable to load realm info from SCDynamicStore
11/11/08 15:27:46 INFO mapred.FileInputFormat: Total input paths to process : 1
11/11/08 15:27:46 INFO filecache.TrackerDistributedCacheManager: Creating scoobi.input.mappers in /tmp/hadoop-Alexy/mapred/local/archive/4123907766041581906_1466591507_92422095/file/Users/Alexy/.scoobi/201111081527/dist-objs-work--5404407803082534293 with rwxr-xr-x
11/11/08 15:27:46 INFO filecache.TrackerDistributedCacheManager: Cached file:/Users/Alexy/.scoobi/201111081527/dist-objs/scoobi.input.mappers as /tmp/hadoop-Alexy/mapred/local/archive/4123907766041581906_1466591507_92422095/file/Users/Alexy/.scoobi/201111081527/dist-objs/scoobi.input.mappers
11/11/08 15:27:46 INFO filecache.TrackerDistributedCacheManager: Cached file:/Users/Alexy/.scoobi/201111081527/dist-objs/scoobi.input.mappers as /tmp/hadoop-Alexy/mapred/local/archive/4123907766041581906_14
alexy / addOrdering.scala
Created January 8, 2012 11:04
adding Ordering
case class User(sid: Int, uid: String) extends Ordered[User] {
override def compare(b: User): Int = {
(this.sid compare b.sid) match {
case 0 => this.uid compare b.uid
case x => x
}
}
}
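The gist above mixes `Ordered` into the case class. A minimal sketch of the alternative its title hints at, supplying an `Ordering[User]` instance instead, might look like the following. The wrapper object `OrderingSketch` and the name `bySidThenUid` are made up for illustration and are not part of the gist.

```scala
object OrderingSketch {
  // Same fields as the gist's User, but without extending Ordered[User].
  case class User(sid: Int, uid: String)

  object User {
    // Hypothetical name. Compares by sid first, then by uid, using the
    // standard library's tuple Ordering. Living in the companion object,
    // it is found implicitly wherever an Ordering[User] is required.
    implicit val bySidThenUid: Ordering[User] =
      Ordering.by((u: User) => (u.sid, u.uid))
  }

  // sorted picks up the implicit Ordering[User] from the companion.
  val sortedUsers: List[User] =
    List(User(2, "a"), User(1, "b"), User(1, "a")).sorted
}
```

Keeping the ordering outside the class leaves room for other orderings (for example by `uid` only) to be passed explicitly where needed.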
alexy / seq-kv.txt
Created January 12, 2012 21:28
Sequence File Output for key Text and value Text
trait OutputConverter[K, V, S] {
def toKeyValue(s: S): (K, V)
}
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
trait DataSink[K, V, B] {
def outputTypeName: String
def outputPath: Path
def outputFormat: Class[_ <: FileOutputFormat[K,V]]
alexy / jobconf.txt
Created January 12, 2012 22:09
Initialize a Hadoop job from Scala
package com.klout.labs.braver.util
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.conf.Configuration
object Hadoop {
def setJobConfig(name: String, jobClass: Class[_], outputClasses: Option[Tuple2[Class[_],Class[_]]]) {
val conf = new Configuration()
alexy / responses.scala
Created March 2, 2012 22:42
case class updates
sealed trait Response
case object Right extends Response
case object Wrong extends Response
case object Skip extends Response
case class Responses(right: Long = 0, wrong: Long = 0, skip: Long = 0) {
def apply(r: Response) = r match {
case Right => right
case Wrong => wrong
case Skip => skip
alexy / evenly.scala
Created March 24, 2012 22:32
Why is a manifest needed?
def evenlyN[T](a: Array[T], n: Int): Array[T] = {
if (n < 1 || a.size < n) return a
val step = (a.size + 1) / n
val indices = (1 to n) map (_*step)
(indices map (a(_))).toArray
}
<console>:12: error: could not find implicit value for evidence parameter of type ClassManifest[T]
(indices map (a(_))).toArray
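The error arises because `toArray` has to allocate a new `Array[T]`, and on the JVM that requires knowing the element class at runtime, which an erased type parameter `T` does not carry. A sketch of one way to supply it is a context bound on `T`. This assumes Scala 2.10 or later, where `ClassTag` replaced `ClassManifest`; on the 2.9 compiler that produced the message above the bound would be `T: ClassManifest`. The wrapper object `EvenlySketch` is hypothetical, and the index arithmetic is copied unchanged from the gist, so it can still run past the end of the array for some inputs (for example 4 elements with `n = 2`).

```scala
import scala.reflect.ClassTag

object EvenlySketch {
  // The context bound T: ClassTag adds an implicit ClassTag[T] parameter,
  // which toArray uses to create an array of the right runtime class.
  def evenlyN[T: ClassTag](a: Array[T], n: Int): Array[T] = {
    if (n < 1 || a.length < n) a
    else {
      val step = (a.length + 1) / n
      val indices = (1 to n).map(_ * step)
      // Index arithmetic kept as in the gist; not bounds-checked here.
      indices.map(a(_)).toArray
    }
  }
}
```

For instance, calling `EvenlySketch.evenlyN(Array(0, 1, 2, 3, 4, 5, 6), 3)` uses a step of 2 and picks the elements at indices 2, 4 and 6.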
alexy / mongo-casbah-bulk-load.scala
Created March 27, 2012 20:07
Mongo Bulk Load from Casbah
import com.mongodb.casbah._
import org.joda.time.DateTime
import collection.mutable.ArrayBuffer
//...
def mongoGobble(co: Conf, chunk: List[Quad], globalIndex: Int) = {
val gobbles = chunk.zipWithIndex map { case ((x,y,r,ri),i) =>
val n = globalIndex + i
val id = co.batch match {
alexy / io.scala
Created March 30, 2012 19:16
how do we move val w declaration inside try?
def printFileWriter(name: String): PrintWriter = {
new PrintWriter(new FileWriter(name))
}
def writing[B](fileName: String)(f: PrintWriter => B): B = {
lazy val w = printFileWriter(fileName) // so throws only in try, if ever
try { f(w) }
finally { w.close() }
}
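One possible answer to the question in the title, sketched under the assumption that a nullable `var` is acceptable: declare the variable before `try`, assign it inside, and guard the `close()`. A `val` declared inside `try` would not be visible in `finally`, which is why the declaration itself stays outside. Note also that a `lazy val` whose initializer threw is evaluated again on the next access, so in the version above a failed open would be retried from `finally`. The wrapper object `WritingSketch` is hypothetical.

```scala
import java.io.{FileWriter, PrintWriter}

object WritingSketch {
  def printFileWriter(name: String): PrintWriter =
    new PrintWriter(new FileWriter(name))

  def writing[B](fileName: String)(f: PrintWriter => B): B = {
    // Declared outside so that finally can see it; assigned inside try
    // so that a failure to open the file is raised within the try block.
    var w: PrintWriter = null
    try {
      w = printFileWriter(fileName)
      f(w)
    } finally {
      // Close only if the writer was actually opened.
      if (w != null) w.close()
    }
  }
}
```

If there is no need to catch the open failure in the same block, a plain `val w = printFileWriter(fileName)` placed before `try` is simpler: when the constructor throws, nothing has been opened and there is nothing to close.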
alexy / ParsePairResults.scala
Created April 3, 2012 19:51
packaging scallop options
case object o {
val suffix: Boolean = opts[Boolean]("suffix")
val csv: Boolean = opts[Boolean]("csv")
val minMax: Option[Int] = opts.get("min")
override def toString: String =
"\n"+
"\n suffix=> " + suffix +
"\n csv => " + csv +
"\n minMax=> " + (minMax match { case Some(n) => n.toString; case _ => "nothing" })