Uri Laserson laserson

@laserson
laserson / gist:1d1185b412b41057810b
Last active August 29, 2015 14:02
Running custom Spark build on a YARN cluster (for PySpark)

Building Spark for PySpark use on top of YARN

Build Spark on your local machine. A local build is only needed when using PySpark; otherwise, building on a remote machine works. See http://spark.apache.org/docs/latest/building-with-maven.html

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

Copy the assembly/target/scala-2.10/...jar to the corresponding directory on the cluster node and also into a location in HDFS.
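The copy step can be scripted. The following is a minimal sketch, not part of the original gist: the function name, the host name, and both destination paths are hypothetical, and the jar path is passed in by the caller because the exact file name is not given above. It only builds the `scp` and `hdfs dfs -put` command lines; running them assumes SSH access to the node and a configured HDFS client.

```python
import posixpath


def build_copy_commands(jar_path, node, remote_spark_home, hdfs_dir):
    """Return (scp_cmd, hdfs_cmd) as argument lists; nothing is executed here."""
    # Mirror the local build layout on the cluster node (assumed layout).
    remote_dir = posixpath.join(remote_spark_home, "assembly/target/scala-2.10") + "/"
    scp_cmd = ["scp", jar_path, "%s:%s" % (node, remote_dir)]
    # 'hdfs dfs -put <local> <dest>' uploads a local file into HDFS.
    hdfs_cmd = ["hdfs", "dfs", "-put", jar_path, hdfs_dir]
    return scp_cmd, hdfs_cmd


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    scp_cmd, hdfs_cmd = build_copy_commands(
        "assembly/target/scala-2.10/spark-assembly.jar",
        "user@cluster-node",
        "/opt/spark",
        "/user/someone/jars/",
    )
    print(" ".join(scp_cmd))
    print(" ".join(hdfs_cmd))
```

The returned lists can be passed to `subprocess.check_call` once the placeholder values are replaced with real ones.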

laserson / README.md
Last active July 25, 2016 01:41
Generate FlameGraph for Python code using plop

Create a FlameGraph to visualize where your code is spending its time.

Requires plop and FlameGraph.
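FlameGraph's `flamegraph.pl` script reads "folded" stacks: one line per unique stack, with frames joined by semicolons and followed by a sample count. As a rough sketch of that conversion step, the helper below turns a dictionary of sampled stacks into folded lines. The function name and the input shape (tuples of frame names, root first, mapped to counts) are assumptions for illustration and are not taken from plop's API; the sample data is made up.

```python
def to_folded_lines(stack_counts):
    """Convert {(frame, frame, ...): count} into folded-stack lines."""
    lines = []
    for frames, count in sorted(stack_counts.items()):
        # Folded format: frames joined with ';', a space, then the count.
        lines.append("%s %d" % (";".join(frames), count))
    return lines


if __name__ == "__main__":
    # Made-up samples, root frame first.
    samples = {
        ("main", "load_data"): 12,
        ("main", "load_data", "parse_line"): 30,
        ("main", "report"): 5,
    }
    with open("profile.folded", "w") as out:
        out.write("\n".join(to_folded_lines(samples)) + "\n")
```

The resulting file can then be rendered with `flamegraph.pl profile.folded > profile.svg`.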

StringVal AddStringValImpl(FunctionContext* context, const StringVal& s1, const StringVal& s2) {
  if (s1.is_null || s2.is_null) {
    // context is a pointer, so its members are reached with ->
    context->AddWarning("AddStringValImpl: Attempted to concat NULL string; returning NULL");
    return StringVal::null();
  }
  if (s1.len == 0) return s2;
  if (s2.len == 0) return s1;
  // Allocate the result through the context, then copy both inputs into it.
  StringVal retval(context, s1.len + s2.len);
  memcpy(retval.ptr, s1.ptr, s1.len);
  memcpy(retval.ptr + s1.len, s2.ptr, s2.len);
  return retval;
}
#include "impala-precompiled.h"
#include <cstring>
bool EqStringValImpl(const StringVal& s1, const StringVal& s2) {
  if (s1.is_null != s2.is_null)
    return false;
  if (s1.is_null)
    return true;
  if (s1.len != s2.len)
    return false;
; ModuleID = '<stdin>'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.7.0"
%"class.impala_udf::FunctionContext" = type { %"class.impala::FunctionContextImpl"* }
%"class.impala::FunctionContextImpl" = type opaque
%"struct.impala_udf::StringVal" = type { %"struct.impala_udf::AnyVal", i32, i8* }
%"struct.impala_udf::AnyVal" = type { i8 }
; Function Attrs: nounwind readonly ssp uwtable
laserson / avrostreaming.py
Created February 11, 2014 18:55
Allow streaming of Avro data using the Python client. Simulates a seekable file type.
# The Python avro client expects a seekable Avro data file, which makes it annoying
# to stream bytes through it using HDFS clients that just give you cat (like snakebite).
# It's idiotic because the client only seeks to the end in order to call tell() to get
# the file size, which in turn is only used to determine when you get to EOF.
import snakebite.client

class AvroStreamWrapper(object):
    # this class can be provided to DataFileReader to read Avro data.
    def __init__(self, hdfs_client, path):
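The preview above is cut off, so the rest of the class is not shown here. The idea in its comments can still be sketched independently: wrap a forward-only byte stream and fake just enough of `seek()` and `tell()` for a reader that probes the file size. The class below is a hypothetical illustration, not the gist's implementation. It assumes the total size is known up front (for example from file metadata) and that the reader seeks to the end, calls `tell()`, then seeks back to where it was before reading.

```python
import io


class ForwardOnlyStream(object):
    """Hypothetical file-like wrapper over an iterator of byte chunks."""

    def __init__(self, chunks, total_size):
        self._chunks = iter(chunks)
        self._buffer = b""
        self._pos = 0                  # bytes actually handed to the caller
        self._size = total_size        # assumed to be known in advance
        self._at_virtual_end = False   # True after a seek(0, 2) size probe

    def tell(self):
        return self._size if self._at_virtual_end else self._pos

    def seek(self, offset, whence=0):
        if whence == 2 and offset == 0:
            # Pretend to jump to EOF so tell() reports the file size.
            self._at_virtual_end = True
        elif whence == 0 and offset == self._pos:
            # Returning to the real position is a no-op on the stream.
            self._at_virtual_end = False
        else:
            raise io.UnsupportedOperation("only size probing is supported")

    def read(self, n=-1):
        if self._at_virtual_end:
            raise io.UnsupportedOperation("seek back before reading")
        # Pull chunks until the request can be satisfied or the stream ends.
        while n < 0 or len(self._buffer) < n:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        if n < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:n], self._buffer[n:]
        self._pos += len(data)
        return data
```

Any other seek raises an error on purpose, so a reader that needs true random access fails loudly instead of silently reading the wrong bytes.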
laserson / most_recently_modified.py
Created September 14, 2013 21:48
Find the most recently modified file in the current directory.
#! /usr/bin/env python
import os
import time
times = []
for tup in os.walk('.'):
    for f in tup[2]:
        if f.strip() == '.DS_Store':
            continue
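The script preview stops partway through the loop. A self-contained sketch of the same task is given below; it is one possible way to finish the job, not the gist's own code. The function name and return shape are assumptions. Like the preview, it walks the tree with `os.walk` (so subdirectories are included) and skips `.DS_Store` files.

```python
import os
import time


def most_recently_modified(root="."):
    """Return (path, mtime) of the newest file under root, or (None, None)."""
    newest_path, newest_mtime = None, None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.strip() == ".DS_Store":
                continue  # ignore macOS Finder metadata
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if newest_mtime is None or mtime > newest_mtime:
                newest_path, newest_mtime = path, mtime
    return newest_path, newest_mtime


if __name__ == "__main__":
    path, mtime = most_recently_modified(".")
    if path is None:
        print("no files found")
    else:
        print("%s  %s" % (time.ctime(mtime), path))
```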
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building parquet format metadata 1.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: http://maven.twttr.com/org/apache/thrift/tools/maven-thrift-plugin/0.1.10/maven-thrift-plugin-0.1.10.pom
Downloaded: http://maven.twttr.com/org/apache/thrift/tools/maven-thrift-plugin/0.1.10/maven-thrift-plugin-0.1.10.pom (4 KB at 28.2 KB/sec)
Downloading: http://maven.twttr.com/org/apache/thrift/tools/maven-thrift-plugin/0.1.10/maven-thrift-plugin-0.1.10.jar
Downloaded: http://maven.twttr.com/org/apache/thrift/tools/maven-thrift-plugin/0.1.10/maven-thrift-plugin-0.1.10.jar (18 KB at 214.2 KB/sec)
Downloading: http://maven.twttr.com/org/apache/maven/plugins/maven-shade-plugin/2.0/maven-shade-plugin-2.0.pom
$ make
make all-recursive
Making all in lib
/bin/sh ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT libplinkseq_la-eval.lo -MD -MP -MF .deps/libplinkseq_la-eval.Tpo -c -o libplinkseq_la-eval.lo `test -f 'eval.cpp' || echo './'`eval.cpp
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT libplinkseq_la-eval.lo -MD -MP -MF .deps/libplinkseq_la-eval.Tpo -c eval.cpp -fno-common -DPIC -o .libs/libplinkseq_la-eval.o
In file included from ./plinkseq/svar.h:6,
from ./plinkseq/variant.h:22,
from ./plinkseq/token.h:4,
from plinkseq/eval.h:4,
from eval.cpp:8:
Loading required package: randomForest
randomForest 4.6-7
Type rfNews() to see new features/changes/bug fixes.
Loading required package: rmr2
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: methods
Loading required package: digest
Loading required package: functional
Loading required package: stringr