Maybe treper

  • Shanghai
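from collections import defaultdict

from sklearn.preprocessing import LabelEncoder

# One LabelEncoder per column, created on first access and kept so the encoding can be reversed later.
encoders_dict = defaultdict(LabelEncoder)
categorical = ['age']
# `users` is assumed to be an existing pandas DataFrame; only the columns listed in `categorical` are encoded.
users2 = users.apply(lambda x: encoders_dict[x.name].fit_transform(x.astype(str)) if x.name in categorical else x)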
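Keeping the fitted encoders in a dict is useful mainly because the same objects can map the integer codes back to the original labels. The block below is a minimal sketch of that round trip, not part of the original gist: `sample_users`, `encode_columns`, and `decode_columns` are hypothetical names, and the data is made up for illustration.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_columns(frame, columns, encoders):
    """Return a copy of `frame` with each listed column label-encoded."""
    encoded = frame.copy()
    for name in columns:
        # defaultdict creates a fresh LabelEncoder the first time a column name is seen.
        encoded[name] = encoders[name].fit_transform(frame[name].astype(str))
    return encoded


def decode_columns(frame, columns, encoders):
    """Return a copy of `frame` with each listed column mapped back to its labels."""
    decoded = frame.copy()
    for name in columns:
        # Reuses the encoder that was fitted for this column during encoding.
        decoded[name] = encoders[name].inverse_transform(frame[name])
    return decoded


# Made-up sample data standing in for the `users` DataFrame.
sample_users = pd.DataFrame({"age": ["25-34", "18-24", "25-34"], "visits": [3, 1, 7]})
encoders = defaultdict(LabelEncoder)

encoded = encode_columns(sample_users, ["age"], encoders)
decoded = decode_columns(encoded, ["age"], encoders)
```

An explicit loop is used here instead of `DataFrame.apply` only to keep each step visible; the per-column encoder lookup is the same idea as in the one-liner.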
@treper
treper / zeppelin_build.md
Last active March 20, 2017 09:16
zeppelin build

export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=2g"

mvn clean package -Pbuild-distr -Pyarn -Pspark-1.6 -Dspark.version=1.6.0-cdh5.7.1 -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.1 -Ppyspark -Psparkr -Pvendor-repo -DskipTests

@treper
treper / pyspark in ipython notebook
Created March 11, 2017 16:56
Start pyspark in an IPython notebook
PYSPARK_DRIVER_PYTHON="jupyter" PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
@treper
treper / build.sbt
Created March 11, 2017 10:13
sbt add cloudera repo
resolvers += "Hadoop Releases" at "http://repository.cloudera.com/content/repositories/releases/"

resolvers += "Cloudera Repos" at "http://repository.cloudera.com/artifactory/cloudera-repos/"
@treper
treper / spark-defaults.conf
Created January 15, 2016 12:30 — forked from deenar/spark-defaults.conf
CDH 5.4 and Spark 1.5.1
sysJupiterDev@gbrdcr00015n02: /bigdata/projects/MERCURY
$ ls spark-1.5.1-bin-hadoop2.6/conf/yarn-conf/
core-site.xml hadoop-env.sh hdfs-site.xml hive-site.xml mapred-site.xml ssl-client.xml topology.map topology.py yarn-site.xml
@treper
treper / TestHiveSQL-in-SparkShell.scala
Created December 28, 2015 09:55
TestHiveSQL-in-SparkShell
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._
val pp = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
pp.registerTempTable("people")
sqlContext.sql("select concat('test_',single) from people").collect().foreach(println)
@treper
treper / clearRAM.sh
Created November 18, 2015 05:48 — forked from pklaus/clearRAM.sh
A Script to Clear Cached RAM on Linux
#!/bin/bash
## Bash Script to clear cached memory on (Ubuntu/Debian) Linux
## By Philipp Klaus
## see <http://blog.philippklaus.de/2011/02/clear-cached-memory-on-ubuntu/>
if [ "$(whoami)" != "root" ]
then
echo "You have to run this script as Superuser!"
exit 1
fi

Sublime Text 2 – Useful Shortcuts (PC)

Loosely ordered, with the commands I use most towards the top. Sublime also offers full documentation.

Editing

Ctrl+C copy current line (if no selection)
Ctrl+X cut current line (if no selection)
Ctrl+⇧+K delete line
Ctrl+↩ insert line after
@treper
treper / NeighborCount.scala
Last active August 29, 2015 14:02
Tag neighbor count; PageRank may be more appropriate
import scala.util.parsing.json._
import org.json4s._
import org.json4s.native.JsonMethods._
import scala.collection.mutable.ArrayBuffer
import java.io._
def parseTagTransaction(line:String):ArrayBuffer[String]={
var tagList = line.split(" ").filter(m => m.length>1);
var result = ArrayBuffer[String]()
if(tagList.length>1)