Zoltan C. Toth zoltanctoth

zoltanctoth / sparkR-RStudio-parallelize.R
Created September 1, 2015 12:44
Getting SparkR to work in RStudio, plus a workaround for getting parallelize() to work in SparkR
# Install Spark and SparkR
SPARK_INSTALL_DIR="/tmp/spark-1.5"
SNAPSHOT_NAME="spark-1.5.0-SNAPSHOT-bin-hadoop2.6"
if (Sys.getenv("SPARK_HOME") == "") {
  if (!dir.exists(SPARK_INSTALL_DIR)) {
    dir.create(SPARK_INSTALL_DIR)
    download.file(paste("http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/", SNAPSHOT_NAME, ".tgz", sep=""),
                  paste(SPARK_INSTALL_DIR, "/", SNAPSHOT_NAME, ".tgz", sep=""))
    wd = getwd()
    setwd(SPARK_INSTALL_DIR)
zoltanctoth / ggplot2-demo.R
Last active January 5, 2016 05:02
Learn ggplot2 by example. This tutorial is especially useful and easy to follow if you have read Hadley Wickham's article on the Layered Grammar of Graphics: https://www.dropbox.com/s/enzoi6b5yfwpvhm/layered-grammar.pdf
library(ggplot2)
# Take a look at our example dataset
head(diamonds)
# Make a chart from scratch
x = ggplot() +
  layer(
    data = diamonds, mapping = aes(x = carat, y = price),
    stat = 'identity', position = "identity", geom = "point"
zoltanctoth / OverwriteOutputDirTextOutputFormat.java
Created July 23, 2013 08:40
How to overwrite output files in a Java MapReduce application
package com.prezi.hadoop;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;
/*
zoltanctoth / gist:5528402
Last active April 9, 2018 11:30
How to install Twitter's elephant-bird on EMR
# Get a proper Maven
wget http://xenia.sote.hu/ftp/mirrors/www.apache.org/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
tar xzf apache-maven-3.0.5-bin.tar.gz
export PATH=/home/hadoop/apache-maven-3.0.5/bin:$PATH
echo 'export PATH=/home/hadoop/apache-maven-3.0.5/bin:$PATH' >> ~/.bash_profile
# Install a supported version of protobuf
sudo apt-get remove protobuf-compiler
wget https://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
tar xzf protobuf-2.4.1.tar.gz