zoltanctoth’s gists

zoltanctoth / gist:5528402

Last active April 9, 2018 11:30

How to install Twitter's elephant-bird on EMR

	# Get a proper Maven
	wget http://xenia.sote.hu/ftp/mirrors/www.apache.org/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
	tar xzf apache-maven-3.0.5-bin.tar.gz
	export PATH=/home/hadoop/apache-maven-3.0.5/bin:$PATH
	echo 'export PATH=/home/hadoop/apache-maven-3.0.5/bin:$PATH' >> ~/.bash_profile

	# Install a supported version of protobuf
	sudo apt-get remove protobuf-compiler
	wget https://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
	tar xzf protobuf-2.4.1.tar.gz

zoltanctoth / OverwriteOutputDirTextOutputFormat.java

Created July 23, 2013 08:40

How to overwrite output files in a Java MapReduce application

	package com.prezi.hadoop;

	import org.apache.hadoop.fs.FileAlreadyExistsException;
	import org.apache.hadoop.fs.FileSystem;
	import org.apache.hadoop.mapreduce.JobContext;
	import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

	import java.io.IOException;

	/*

zoltanctoth / ggplot2-demo.R

Last active January 5, 2016 05:02

Learn ggplot2 by example. This tutorial is especially useful and easy to follow if you have read Hadley Wickham's article on the Layered Grammar of Graphics. https://www.dropbox.com/s/enzoi6b5yfwpvhm/layered-grammar.pdf

	library(ggplot2)

	# Take a look at our example dataset
	head(diamonds)

	# Make a chart from scratch
	x = ggplot() +
	layer(
	data = diamonds, mapping = aes(x=carat,y=price),
	stat='identity', position="identity", geom="point"

zoltanctoth / sparkR-RStudio-parallelize.R

Created September 1, 2015 12:44

Getting SparkR to work in RStudio, plus a workaround for getting parallelize() to work in SparkR

	# Install Spark and SparkR
	SPARK_INSTALL_DIR="/tmp/spark-1.5"
	SNAPSHOT_NAME="spark-1.5.0-SNAPSHOT-bin-hadoop2.6"
	if (Sys.getenv("SPARK_HOME") == ""){
	if(!dir.exists(SPARK_INSTALL_DIR)){
	dir.create(SPARK_INSTALL_DIR)
	download.file(paste("http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/",SNAPSHOT_NAME,".tgz",sep=""),
	paste(SPARK_INSTALL_DIR,"/",SNAPSHOT_NAME,".tgz",sep=""))
	wd = getwd()
	setwd(SPARK_INSTALL_DIR)

zoltanctoth / pyspark-udf.py

Last active July 15, 2023 13:23

Writing a UDF for withColumn in PySpark

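	from pyspark.sql import SparkSession
	from pyspark.sql.types import StringType
	from pyspark.sql.functions import udf

	# Reuse the active session (e.g. in the pyspark shell) or start one
	spark = SparkSession.builder.getOrCreate()

	# UDF that labels an age as "adult" (18 or older) or "child"
	maturity_udf = udf(lambda age: "adult" if age >= 18 else "child", StringType())

	df = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
	# withColumn returns a new DataFrame, so keep the result
	df = df.withColumn("maturity", maturity_udf(df.age))

	df.show()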
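
One thing to watch for with the lambda in this snippet: Spark passes SQL NULL to a Python UDF as `None`, and in Python 3 the comparison `None >= 18` raises a `TypeError`. The sketch below is one possible way to guard against that, using a plain function so it can be tried without a Spark cluster. The name `maturity` and the choice to return `None` for a missing age are assumptions made for illustration and are not part of the original gist.

```python
from typing import Optional


def maturity(age: Optional[int]) -> Optional[str]:
    """Label an age as "adult" (18 or older) or "child"; pass None through."""
    if age is None:
        # Spark hands SQL NULL to a Python UDF as None; returning None
        # here (an assumed policy) keeps the result column NULL as well.
        return None
    return "adult" if age >= 18 else "child"


if __name__ == "__main__":
    # Quick local check, no Spark required
    for sample_age in (1, 18, None):
        print(sample_age, maturity(sample_age))
```

To use it in the DataFrame example, the function could be wrapped the same way as the lambda, for instance `maturity_udf = udf(maturity, StringType())`.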

zoltanctoth / move-wordpress-to-different-domain.sh

Last active September 25, 2015 06:26

Moving WordPress to another domain can be a hassle. Here is a script that does it without the pain.

	#!/bin/bash -xeu
	# This script moves your WordPress site to a different domain
	# Zoltan C. Toth
	export HISTCONTROL=ignorespace

	ORIGIN_DOMAIN=teszt2.gyulahus.hu
	TARGET_DOMAIN=teszt.gyulahus.hu
	ORIGIN_DIR=/home/gyulahus/public_html/$ORIGIN_DOMAIN
	TARGET_DIR=/home/gyulahus/public_html/$TARGET_DOMAIN
	TARGET_DB=teszt2_gyh

zoltanctoth / h2o-sparkling-water-deep-learning.scala

Created September 13, 2016 20:09

This is a Spark <-> H2O / Sparkling Water deep learning prototype.

	import org.apache.spark.{SparkConf, SparkContext}
	import org.apache.spark.h2o.{DoubleHolder, H2OContext, H2OFrame}
	import org.apache.spark.sql.DataFrame

	import hex.deeplearning.DeepLearning
	import hex.deeplearning.DeepLearningParameters
	import hex.deeplearning.DeepLearningParameters.Activation
	import water.app.SparkContextSupport

zoltanctoth / spark-kafka.scala

Created February 6, 2017 20:09

How to use the Direct Kafka Source in Scala

	object Anomymizer extends App {

	val spark = SparkSession.builder
	.master("local[3]")
	.appName("Anonimizer")
	.getOrCreate()

	val salt = "SAALT"
	def anonimizeStr(a:Any) = {
	a match {

Zoltan C. Toth zoltanctoth