Skip to content

Instantly share code, notes, and snippets.

View viirya's full-sized avatar
:octocat:

Liang-Chi Hsieh viirya

:octocat:
View GitHub Profile
@viirya
viirya / nestedColumnExtractor.scala
Created January 24, 2020 19:29
Snippet for extracting nested column from an input row in Spark
import java.io.{ByteArrayOutputStream, File}
import java.nio.charset.StandardCharsets
import java.sql.{Date, Timestamp}
import java.util.UUID
import java.util.concurrent.atomic.AtomicLong
import scala.util.Random
import org.scalatest.Matchers._
@viirya
viirya / gist:8f96ec46424379a83dd2ca23f3c0a1ff
Last active December 12, 2020 18:11
How to run KubernetesSuite in Spark
1. Install minikube
2. Start minikube with enough cpus and memory
minikube start --memory='8196mb' --cpus=4
3. The Pod of spark doesn't specify systemaccount, so it is "default". Spark will create pod. So we should give enough
permission to "default" systemaccount. Create role by kubectl and bind the role to systemaccount default
kubectl create role default --verb=get,list,watch,create,update,patch,delete --resource=pods,pods/status
kubectl create rolebinding default-binding --role=default --serviceaccount=default:default --namespace=default
4. Build Spark images. Remember to build PySpark image too.
./bin/docker-image-tool.sh -m -t dev -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build
@viirya
viirya / MiscBenchmark-results.txt
Created May 20, 2019 15:17
MiscBenchmark-results.txt
================================================================================================
filter & aggregate without group
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03 on Linux 4.15.0-1021-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
range/filter/sum: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
range/filter/sum wholestage off 46264 47546 1814 45.3 22.1 1.0X
range/filter/sum wholestage on 3156 3523 206 664.5 1.5 14.7X
@viirya
viirya / prepareCudaInstanceForDeepLearning.md
Last active April 5, 2017 13:22
Prepare environment on AWS EC2 to run Caffe or other deep learning frameworks

Basic environment

Instance: p2.xlarge

AMI ID: ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20170221

EBS volume for root: 30GB

Install ubuntu packages

# Update EC2 packages
sudo yum install cmake boost-devel.x86_64 boost-python.x86_64 boost-serialization.x86_64 -y
sudo yum install swig blas-devel.x86_64 lapack-devel.x86_64 -y
# Install Python packages
sudo pip install numpy bitarray
# See: http://www.lecloud.net/post/61401763496/install-update-to-python-2-7-and-latest-pip-on-ec2
# install build tools
sudo yum install make automake gcc gcc-c++ kernel-devel git-core -y
# install python 2.7 and change default python symlink
# python27-devel or python27-python-devel.x86_64
sudo yum install python27-devel -y
sudo rm /usr/bin/python
sudo ln -s /usr/bin/python2.7 /usr/bin/python
@viirya
viirya / en_json.pl
Last active December 14, 2015 11:48
Simple Perl script used to encode processed tweets for visualizing on Google Map.
use strict;
use JSON;
use Data::Dumper;
open(TWEET_STAT, "<$ARGV[0]");
my $rows = [];
while (<TWEET_STAT>) {
@viirya
viirya / distance.js
Created June 5, 2012 07:22 — forked from clauswitt/distance.js
Get the distance between two (world) coordinates - a nodejs module
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/* */
/* Simple node js module to get distance between two coordinates. */
/* */
/* Code transformed from Chris Veness example code - please refer to his website for licensing */
/* questions. */
/* */
/* */
/* Latitude/longitude spherical geodesy formulae & scripts (c) Chris Veness 2002-2011 */
/* - www.movable-type.co.uk/scripts/latlong.html */
@viirya
viirya / gist:1558006
Created January 4, 2012 01:47 — forked from tty/gist:298175
# Basic text search with relevancy for MongoDB.
# See http://blog.tty.nl/2010/02/08/simple-ranked-text-search-for-mongodb/
# Copythingie 2010 - Ward Bekker - [email protected]
#create (or empty) a docs collection
doc_col = MongoMapper.connection.db('example_db').collection('docs')
doc_col.remove({})
#add some sample data
doc_col.insert({ "txt" => "it is what it is"})