// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
# Install pip and virtualenv if not already installed
$> sudo easy_install pip
$> sudo pip install virtualenv
$> sudo pip install virtualenvwrapper

Add to .bash_profile:
# set where virtual environments will live
export WORKON_HOME=$HOME/.virtualenvs
# ensure all new environments are isolated from the site-packages directory
export VIRTUALENVWRAPPER_VIRTUALENV_ARGS='--no-site-packages'
import json
from io import StringIO

import boto3

def notebook2py(nb_bucket, nb_key, py_bucket, py_key):
    s3c = boto3.client('s3')
    obj = s3c.get_object(Bucket=nb_bucket, Key=nb_key)
    content = json.loads(obj['Body'].read())
    # keep only enabled %pyspark paragraphs, stripping the '%pyspark' prefix
    notebook_text = ['\n' + item['text'][8:]
                     for item in content['paragraphs']
                     if item['config'].get('enabled') and item['text'].startswith('%pyspark')]
    io_handle = StringIO('\n'.join(notebook_text))
    # write the extracted Python source to the target bucket
    s3c.put_object(Bucket=py_bucket, Key=py_key, Body=io_handle.read())
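The paragraph filter can be sanity-checked locally on a minimal Zeppelin-style notebook dict, without S3 (the sample content below is invented for illustration):

```python
import json

# Stand-in for the JSON body a Zeppelin notebook export contains.
sample = json.dumps({
    "paragraphs": [
        {"text": "%pyspark\nprint('hi')", "config": {"enabled": True}},
        {"text": "%md some notes",        "config": {"enabled": True}},
        {"text": "%pyspark\nx = 1",       "config": {"enabled": False}},
    ]
})

content = json.loads(sample)
# same filter as notebook2py: enabled paragraphs starting with %pyspark
notebook_text = ['\n' + item['text'][8:]
                 for item in content['paragraphs']
                 if item['config'].get('enabled') and item['text'].startswith('%pyspark')]
print(notebook_text)  # only the first paragraph survives
```

Only the enabled `%pyspark` paragraph is kept; the markdown paragraph and the disabled one are dropped.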
#!/bin/bash
BUCKET='<bucket>/sqoop/'

function upload() {
    local path=$1
    local file=$2
    echo "${path}${file}"
    # copy in the background so multiple files upload in parallel
    aws s3 cp "${path}${file}" "s3://${BUCKET}${file}" &
}
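The backgrounded copies above can equivalently be driven from Python with a thread pool; a minimal sketch where the actual transfer call is injected (in practice the injected function would wrap `aws s3 cp` or a boto3 upload — here a stand-in is used):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_upload(files, upload_one, max_workers=4):
    """Run upload_one(path) for each file concurrently,
    mirroring the '&' backgrounding in the shell version."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces all uploads to complete before returning
        list(pool.map(upload_one, files))

# Demo with a stand-in uploader that records what it was asked to copy.
uploaded = []
parallel_upload(["a.csv", "b.csv"], uploaded.append)
print(sorted(uploaded))  # → ['a.csv', 'b.csv']
```

Unlike plain `&` in bash, the pool bounds concurrency and waits for all transfers to finish.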
import paramiko

# get the keypair from S3 or the application host
k = paramiko.RSAKey.from_private_key_file("<keypair.pem> file")
c = paramiko.SSHClient()
c.set_missing_host_key_policy(paramiko.AutoAddPolicy())
print("connecting")
c.connect(hostname="<emr master node>", username="hadoop", pkey=k)
print("connected")
# run the import in the background and capture its log on the master node
command = ('nohup sqoop import -D mapred.job.name=SqoopTest121 '
           '--connect jdbc:postgresql://db.rds.amazonaws.com:5432/apostgres '
           '--username user --table random_data --m 10 --password XXXX '
           '--split-by id >> /tmp/logs/sqoop/SqoopTest121.log 2>&1 &')
print("Executing {}".format(command))
stdin, stdout, stderr = c.exec_command(command)
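The long command string is easier to maintain when assembled from parameters; a minimal sketch (the connection string, table, and credentials below are placeholders taken from the snippet, not real endpoints):

```python
def build_sqoop_import(jdbc_url, username, password, table, split_by,
                       mappers=10, job_name="SqoopTest121",
                       log_dir="/tmp/logs/sqoop"):
    """Assemble a backgrounded sqoop import command for exec_command()."""
    return (
        "nohup sqoop import -D mapred.job.name={job} "
        "--connect {url} --username {user} --table {table} "
        "--m {m} --password {pw} --split-by {col} "
        ">> {log}/{job}.log 2>&1 &"
    ).format(job=job_name, url=jdbc_url, user=username, table=table,
             m=mappers, pw=password, col=split_by, log=log_dir)

cmd = build_sqoop_import("jdbc:postgresql://db.rds.amazonaws.com:5432/apostgres",
                         "user", "XXXX", "random_data", "id")
print(cmd)
```

Keeping the job name in one place also keeps the `mapred.job.name` and the log filename in sync.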
sudo pip install --upgrade pip
sudo /usr/local/bin/pip install sagemaker_pyspark
$> sudo yum install R
$> R
R> install.packages('devtools')
R> devtools::install_github('IRkernel/IRkernel')
R> IRkernel::installspec()
data = sc.wholeTextFiles('s3://<bucket>/dataset249').map(lambda x: x[1])
print(data.collect())
df = spark.read.json(data)
df.printSchema()
df.count()
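The core of the flow above is: take each file's raw text, parse it as JSON, then count the records. That parse-and-count step can be checked without a cluster (the two sample records are invented for illustration):

```python
import json

# Stand-ins for the raw file bodies wholeTextFiles would return
# (wholeTextFiles yields (path, content) pairs; the map keeps content only).
raw_files = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b"}',
]

records = [json.loads(body) for body in raw_files]
print(len(records))  # → 2
```

In the Spark version, `spark.read.json(data)` performs the same per-string parse across the RDD and infers a schema from the result.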
# Create the key
keytool -genkey -alias hiveserver2 -keyalg RSA -keystore /tmp/hs2keystore.jks -keysize 2048
Enter keystore password: XXXXXXXX
Re-enter new password: XXXXXXXX
What is your first and last name?
  [Unknown]: localhost
What is the name of your organizational unit?
  [Unknown]: myorg
# Get the column names
from urllib.request import urlopen
html = urlopen("http://gdeltproject.org/data/lookups/CSV.header.dailyupdates.txt").read().decode('utf-8').rstrip()
columns = html.split('\t')

# Load 73,385,698 records from 2016
df1 = spark.read.option("delimiter", "\t").csv("s3://gdelt-open-data/events/2016*")

# Apply the schema
df2 = df1.toDF(*columns)

# Split SQLDATE to Year, Month and Day
from pyspark.sql.functions import expr
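The `expr` import suggests the split is done with SQL substring expressions, e.g. `df2.withColumn('Year', expr('substring(SQLDATE, 1, 4)'))` and likewise for month and day (a plausible completion, not confirmed by the snippet). The underlying slicing of the `yyyymmdd` string can be verified in plain Python:

```python
def split_sqldate(sqldate):
    """Split a yyyymmdd GDELT SQLDATE string into (year, month, day).

    In Spark the same slices correspond to
    expr("substring(SQLDATE, 1, 4)"), expr("substring(SQLDATE, 5, 2)"),
    and expr("substring(SQLDATE, 7, 2)") — note substring() is 1-indexed.
    """
    return sqldate[0:4], sqldate[4:6], sqldate[6:8]

print(split_sqldate("20160215"))  # → ('2016', '02', '15')
```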