@dvgodoy
Created March 9, 2019 10:32
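import findspark
# Locate the Spark installation and add pyspark to sys.path before importing it
findspark.init()
from pyspark.sql import SparkSession
# The star import makes toHandy() available on Spark DataFrames
from handyspark import *
from matplotlib import pyplot as plt
# Jupyter/IPython magic: render plots inline (notebook only)
%matplotlib inline
# Reuse the active Spark session or start a new one
spark = SparkSession.builder.getOrCreate()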
# DOWNLOAD THE DATASET HERE
# https://raw.githubusercontent.com/dvgodoy/handyspark/master/tests/rawdata/train.csv
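# A minimal sketch of the download step above, assuming the raw URL is still
# reachable. `fetch_if_missing` and `TRAIN_URL` are hypothetical names used for
# illustration only; they are not part of HandySpark or PySpark.
import os
import urllib.request

TRAIN_URL = 'https://raw.githubusercontent.com/dvgodoy/handyspark/master/tests/rawdata/train.csv'

def fetch_if_missing(url, path):
    """Download `url` to `path` unless the file already exists; return `path`."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path

# Uncomment to fetch the file once into the working directory:
# fetch_if_missing(TRAIN_URL, 'train.csv')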
# Loads training data for Titanic dataset
sdf = spark.read.csv('train.csv', header=True, inferSchema=True)
# Makes Spark dataframe Handy :-)
hdf = sdf.toHandy()