
Jake Chen jakechen

  • Google Cloud
  • San Francisco, CA
jakechen / spark_s3_dataframe_gdelt.py
Last active October 5, 2021 03:40
Creating PySpark DataFrame from CSV in AWS S3 in EMR
# Example uses GDELT dataset found here: https://aws.amazon.com/public-datasets/gdelt/
# Column headers found here: http://gdeltproject.org/data/lookups/CSV.header.dailyupdates.txt
# Load RDD
lines = sc.textFile("s3://gdelt-open-data/events/2016*") # Loads 73,385,698 records from 2016
# Split lines into columns; change the split() argument to match the delimiter, e.g. '\t'
parts = lines.map(lambda l: l.split('\t'))
# Convert RDD into DataFrame
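from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen
# Fetch the tab-separated column headers and decode the response bytes to text
html = urlopen("http://gdeltproject.org/data/lookups/CSV.header.dailyupdates.txt").read().decode("utf-8").rstrip()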
jakechen / aws_jupyter_tunnel.md
Last active December 11, 2023 18:11
Creating and connecting to Jupyter Notebooks in AWS EC2

Introduction

This quick guide describes how to create a Jupyter Notebook on AWS EC2 and then access it remotely using SSH tunneling. This method is preferred because it opens no ports other than 22, requires little to no configuration, and is generally more straightforward.
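
As a rough illustration of the tunneling idea, the sketch below builds the `ssh` command that forwards a local port to the notebook port on the instance. The helper name `build_tunnel_command`, the host name, the key file name, and the default user `ubuntu` are all assumptions for illustration (the login user depends on the AMI); port 8888 is Jupyter's default.

```python
import shlex


def build_tunnel_command(host, key_path, user="ubuntu", local_port=8888, remote_port=8888):
    """Return the ssh argument list that forwards local_port to remote_port on the instance.

    Hypothetical helper: the default user and ports are assumptions, adjust them for your setup.
    """
    return [
        "ssh",
        "-i", key_path,  # private key of the EC2 key pair
        "-N",            # forward ports only, do not run a remote command
        "-L", f"{local_port}:localhost:{remote_port}",  # local port -> port on the instance
        f"{user}@{host}",
    ]


# Placeholder host and key file; substitute your instance's public DNS name and key.
cmd = build_tunnel_command("ec2-host.example.com", "my-key.pem")
print(shlex.join(cmd))
```

Running the printed command in a terminal (or passing the list to `subprocess.run`) keeps the tunnel open over port 22 only; with the notebook server running on the instance, it should then be reachable at `http://localhost:8888` in a local browser.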

Pre-requisites

This version assumes basic familiarity with cloud computing, AWS services, and Jupyter Notebook, mostly because it has no images and does not dive deep into each individual step.

Steps

Spin up an EC2 instance with the "Deep Learning" AMI

  1. Log into the EC2 console and click the "Launch Instance" button.
  2. Inside "AWS Marketplace", select the "Deep Learning AMI" from AWS. I use this AMI because most of what you'll need is already installed.
jakechen / predict_mxnet_from_s3.py
Created August 27, 2017 20:22
Saving a trained MXNet model to S3, then recalling the model and using it for a prediction
import boto3
import mxnet as mx
from mxnet.io import NDArrayIter
def predict_from_s3(record, bucket_name, s3_symbol_key, s3_params_key):
    """Loads MXNet network definitions from an S3 bucket and uses them for prediction on a single record
    Keyword arguments:
    record -- the record to predict from
    bucket_name -- bucket where your MXNet network is stored
jakechen / opencv-python_rekognition.py
Created September 20, 2017 15:38
Using OpenCV to parse video frames for Amazon Rekognition to analyze. This example uses Rekognition's celebrity recognition feature.
# With help from https://aws.amazon.com/blogs/ai/build-your-own-face-recognition-service-using-amazon-rekognition/
frame_skip = 100 # analyze every 100th frame to cut down on Rekognition API calls
import boto3
import cv2
from PIL import Image
import io
rekog = boto3.client('rekognition')