Denny Lee dennyglee

@dennyglee
dennyglee / azure-cosmosdb-spark_to_mongo.scala
Created June 7, 2017 22:09
Spark Connector for Cosmos DB to Mongo container
//
// Spark Connector for Cosmos DB to Mongo container
// This gist provides an example of how to use the Spark Connector for Cosmos DB to connect to a Mongo container
//
// How to start spark-shell
// spark-shell --master yarn --jars /home/sshuser/jars/0.0.3c_1.12/azure-cosmosdb-spark-0.0.3-SNAPSHOT.jar,/home/sshuser/jars/0.0.3c_1.12/azure-documentdb-1.12.0-SNAPSHOT.jar
//
// Import Necessary Libraries
@dennyglee
dennyglee / accessing_dataframe_with_vector_double_schema.py
Created November 8, 2016 17:20
Accessing DataFrame with [('features', 'vector'), ('label', 'double')] schema
from pyspark.mllib.linalg import Vectors
# Sample dataset
data = sc.parallelize([
    (0.0, [0.0, 1.0, 2.0]),
    (1.0, [1.0, 2.0, 3.0]),
    (3.0, [2.0, 3.0, 4.0]),
    (2.0, [3.0, 4.0, 5.0])
])
@dennyglee
dennyglee / define_window_specification.py
Created March 24, 2016 18:55
Introducing Window Functions in Spark SQL: Define window specification
from pyspark.sql.window import Window
# Defines partitioning specification and ordering specification.
windowSpec = \
  Window \
    .partitionBy(...) \
    .orderBy(...)
# Defines a Window Specification with a ROW frame.
windowSpec.rowsBetween(start, end)
# Defines a Window Specification with a RANGE frame.
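
The difference between a ROW frame and a RANGE frame can be sketched in plain Python, without Spark. This is only an illustration of the frame semantics, not the Spark implementation: the helper names `rows_frame` and `range_frame` and the sample values are hypothetical, and the list is assumed to be a single partition already sorted by the ordering column.

```python
# Plain-Python illustration of ROW vs RANGE frame semantics (no Spark needed).
# `values` is assumed to be the ordering column of ONE partition, already sorted.

def rows_frame(values, i, start, end):
    """ROW frame: rows whose physical position lies in [i + start, i + end]."""
    return [values[j] for j in range(len(values)) if i + start <= j <= i + end]

def range_frame(values, i, start, end):
    """RANGE frame: rows whose ordering value lies in [v + start, v + end],
    where v is the ordering value of the current row i."""
    v = values[i]
    return [x for x in values if v + start <= x <= v + end]

ordered = [1, 2, 2, 5, 6]  # hypothetical sample data

# Sum over "one step back up to the current row" under each frame type.
rows_result = [sum(rows_frame(ordered, i, -1, 0)) for i in range(len(ordered))]
range_result = [sum(range_frame(ordered, i, -1, 0)) for i in range(len(ordered))]

print(rows_result)   # [1, 3, 4, 7, 11]
print(range_result)  # [1, 5, 5, 5, 11]
```

With a ROW frame the two rows holding the value 2 get different sums because the frame counts physical rows; with a RANGE frame they get the same sum because the frame is defined by the ordering value itself.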
@dennyglee
dennyglee / Spark 1.4,Java7
Last active October 21, 2015 21:53
Spark 1.4 PermGenSize Error (ssimeonov)
/* Spark Shell Executed */
./bin/spark-shell --master spark://servername:7077 --driver-class-path $CLASSPATH
/* Output */
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0