Jon Roosevelt (RooseveltAdvisors)

This page is a curated collection of Jupyter/IPython notebooks that are notable for some reason. Feel free to add new content here, but please try to only include links to notebooks that include interesting visual or technical content; this should not simply be a dump of a Google search on every ipynb file out there.

Important contribution instructions: If you add new content, please ensure that any notebook you link to points at the rendered version on nbviewer, rather than the raw file. Simply paste the notebook URL into the nbviewer box and copy the resulting URL of the rendered version. This makes it much easier for visitors to access the new content immediately.

Note that Matt Davis has conveniently written a set of bookmarklets and extensions to make it a one-click affair to load a Notebook URL into your browser of choice, directly opening into nbviewer.
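For scripted additions, the raw-to-nbviewer translation is mechanical. A minimal sketch, assuming nbviewer's `/url/` (http) and `/urls/` (https) path convention on the `nbviewer.org` domain:

```python
def nbviewer_link(raw_url):
    """Turn a raw notebook URL into its rendered nbviewer URL.

    nbviewer serves plain-http sources under /url/ and https
    sources under /urls/; the scheme is dropped from the path.
    """
    if raw_url.startswith("https://"):
        return "https://nbviewer.org/urls/" + raw_url[len("https://"):]
    if raw_url.startswith("http://"):
        return "https://nbviewer.org/url/" + raw_url[len("http://"):]
    raise ValueError("expected an http(s) notebook URL")

print(nbviewer_link("https://example.com/demo.ipynb"))
```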

Ubuntu/Mint Install Nvidia Drivers GTX950-GTX TITAN X 2016

See https://www.youtube.com/watch?v=cVTsemATIyI

Added on August 28, 2015

This tutorial was made for the GTX 950, GTX 960, GTX 970, GTX 980, GTX 980 Ti, and GTX TITAN X.

UPDATE: I have updated the commands here with the most recent driver Nvidia recommends for the GTX 950 through GTX TITAN X. This guide will help you set up your Nvidia graphics card even if you boot to a blank screen or are completely locked out of your GUI.

Pyspark

spark-submit

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name pyspark_job \
  --driver-memory 2G \
  --driver-cores 2 \
  --executor-memory 12G \
  --executor-cores 5 \
  --num-executors 10 \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --conf spark.task.maxFailures=36 \
  --conf spark.driver.maxResultSize=0 \
  --conf spark.network.timeout=800s \
  --conf spark.scheduler.listenerbus.eventqueue.size=500000 \
  --conf spark.speculation=true \
  --py-files lib.zip,lib1.zip,lib2.zip \
  spark_test.py
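To sanity-check the cluster footprint of a submission like this, note that YARN grants each executor container its heap plus the configured memory overhead. For the values above:

```python
# Values taken from the spark-submit command above
executor_memory_gb = 12        # --executor-memory 12G
overhead_gb = 4096 / 1024      # spark.yarn.executor.memoryOverhead=4096 (MB)
num_executors = 10             # --num-executors 10

per_executor_gb = executor_memory_gb + overhead_gb  # per-container grant from YARN
total_gb = per_executor_gb * num_executors          # cluster-wide executor footprint
print(per_executor_gb, total_gb)  # 16.0 160.0
```

If YARN kills executors with "container killed ... running beyond physical memory limits", raising the overhead (rather than the heap) is usually the first knob to try.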

spark_test.py

import pyspark
from pyspark.sql import SQLContext

# Entry points for the job submitted above
sc = pyspark.SparkContext(appName="pyspark_job")
sqlContext = SQLContext(sc)
@RooseveltAdvisors
RooseveltAdvisors / bitbucket_clone.md
Last active July 5, 2017 12:40
Clone all git repositories from BitBucket
curl -s -k https://USERNAME:[email protected]/1.0/user/repositories | python -c 'import sys, json, os; r = json.loads(sys.stdin.read()); [os.system("git clone %s.git" % d["resource_uri"].replace("/1.0/repositories", "https://USERNAME:[email protected]")) for d in r]'
@RooseveltAdvisors
RooseveltAdvisors / [email protected]
Last active September 20, 2019 15:37
Run Jupyter Notebook and JupyterHub on Amazon EMR

Jupyter on EMR allows users to save their work on Amazon S3 rather than on local storage on the EMR cluster (master node).

To store notebooks on S3, use:

--notebook-dir <s3://your-bucket/folder/>

To store notebooks in a directory different from the user’s home directory, use:

--notebook-dir <local directory>
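Putting the flags together, a cluster launch might look like the following sketch. The bucket, key pair, release label, and `install-jupyter.sh` bootstrap-script name are all placeholders for your own setup:

```shell
# Hypothetical launch; install-jupyter.sh stands in for the gist's bootstrap script
aws emr create-cluster \
  --name jupyter-on-emr \
  --release-label emr-5.8.0 \
  --applications Name=Hadoop Name=Spark \
  --instance-type m4.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --bootstrap-actions \
    Path=s3://your-bucket/install-jupyter.sh,Args=[--notebook-dir,s3://your-bucket/notebooks/]
```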
import datetime
from jinja2 import Environment

start = datetime.datetime.strptime("2017-02-01", "%Y-%m-%d")
end = datetime.datetime.strptime("2017-07-24", "%Y-%m-%d")
date_generated = [start + datetime.timedelta(days=x) for x in range((end - start).days + 1)]
template = """spark-submit --master yarn --deploy-mode cluster --class com.xyz.XXXAPP s3://com.xyz/aa-1.5.11-all.jar --input-request-events s3://com.xyz/data/event_{{date_str}}/* --input-geofence-events s3://com.xyz/data2/event_/{{date_str}}/* --output s3://com.xyz/output/{{date_str}}"""

# Render one spark-submit command per day in the range
env = Environment()
for date in date_generated:
    print(env.from_string(template).render(date_str=date.strftime("%Y-%m-%d")))
@RooseveltAdvisors
RooseveltAdvisors / ubuntu_nic_bonding.md
Created July 29, 2017 05:21
NIC bonding @ Ubuntu 14.04
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
  bond-master bond0

auto eth1
iface eth1 inet manual
  bond-master bond0

# Completing the truncated snippet with a minimal bond0 stanza;
# mode and monitoring interval below are assumed, not from the gist
auto bond0
iface bond0 inet dhcp
  bond-mode active-backup
  bond-miimon 100
  bond-slaves none
@RooseveltAdvisors
RooseveltAdvisors / debug_spark.md
Created September 1, 2017 12:00
Debugging Spark

To connect a debugger to the driver

Append the following to your spark-submit (or gatk-launch) options:

replace 5005 with a different available port if necessary

--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005

This will suspend the driver until it gets a remote connection from IntelliJ.
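Concretely, a debug-enabled submission might look like this sketch; `spark_test.py` and port 5005 are placeholders. With `suspend=y` the driver JVM blocks at startup until a remote JDWP debugger attaches:

```shell
# client deploy mode keeps the driver on the local machine,
# where the IDE's remote debugger can reach the JDWP port
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  spark_test.py
```

In IntelliJ, create a "Remote" run configuration pointing at host `localhost`, port `5005`, then start it to attach.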

@RooseveltAdvisors
RooseveltAdvisors / install_anaconda_jupyter.sh
Created March 26, 2018 22:10
Bash script for installing Anaconda and Jupyter, and linking Jupyter with Spark
# Install Anaconda
wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh -b -f -p $HOME/anaconda
export PATH="$HOME/anaconda/bin:$PATH"
echo 'export PATH="$HOME/anaconda/bin:$PATH"' >> ~/.bashrc
conda update -y -n base conda
# Install Jupyter
conda create -y -n jupyter python=3.5 jupyter nb_conda
screen -dmS jupyter