Jon Roosevelt (RooseveltAdvisors)

This page is a curated collection of Jupyter/IPython notebooks that are notable for some reason. Feel free to add new content here, but please try to only include links to notebooks that include interesting visual or technical content; this should not simply be a dump of a Google search on every ipynb file out there.

Important contribution instructions: If you add new content, please ensure that any notebook you link to points at the rendered version on nbviewer, rather than the raw file. Simply paste the notebook URL into the nbviewer box and copy the resulting URL of the rendered version. This makes it much easier for visitors to access the new content immediately.

Note that Matt Davis has conveniently written a set of bookmarklets and extensions to make it a one-click affair to load a Notebook URL into your browser of choice, directly opening into nbviewer.
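For scripted additions, the raw-to-nbviewer translation is mechanical. A minimal sketch, assuming nbviewer's `/url/` (http) and `/urls/` (https) path convention on the `nbviewer.org` domain:

```python
def nbviewer_link(raw_url):
    """Turn a raw notebook URL into its rendered nbviewer URL.

    nbviewer serves plain-http sources under /url/ and https
    sources under /urls/; the scheme is dropped from the path.
    """
    if raw_url.startswith("https://"):
        return "https://nbviewer.org/urls/" + raw_url[len("https://"):]
    if raw_url.startswith("http://"):
        return "https://nbviewer.org/url/" + raw_url[len("http://"):]
    raise ValueError("expected an http(s) notebook URL")

print(nbviewer_link("https://example.com/demo.ipynb"))
```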

Ubuntu/Mint Install Nvidia Drivers GTX950-GTX TITAN X 2016

See https://www.youtube.com/watch?v=cVTsemATIyI

Added on August 28, 2015

This tutorial was made for the GTX 950, GTX 960, GTX 970, GTX 980, GTX 980 Ti, and GTX TITAN X.

UPDATE: I have updated the commands here with the most recent driver Nvidia recommends for the GTX 950 through GTX TITAN X. This guide will help you set up your Nvidia graphics card even if you boot to a blank screen or are completely locked out of your GUI.

Pyspark

spark-submit

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name pyspark_job \
  --driver-memory 2G \
  --driver-cores 2 \
  --executor-memory 12G \
  --executor-cores 5 \
  --num-executors 10 \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --conf spark.task.maxFailures=36 \
  --conf spark.driver.maxResultSize=0 \
  --conf spark.network.timeout=800s \
  --conf spark.scheduler.listenerbus.eventqueue.size=500000 \
  --conf spark.speculation=true \
  --py-files lib.zip,lib1.zip,lib2.zip \
  spark_test.py
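To sanity-check the cluster footprint of a submission like this, note that YARN grants each executor container its heap plus the configured memory overhead. For the values above:

```python
# Values taken from the spark-submit command above
executor_memory_gb = 12        # --executor-memory 12G
overhead_gb = 4096 / 1024      # spark.yarn.executor.memoryOverhead=4096 (MB)
num_executors = 10             # --num-executors 10

per_executor_gb = executor_memory_gb + overhead_gb  # per-container grant from YARN
total_gb = per_executor_gb * num_executors          # cluster-wide executor footprint
print(per_executor_gb, total_gb)  # 16.0 160.0
```

If YARN kills executors with "container killed ... running beyond physical memory limits", raising the overhead (rather than the heap) is usually the first knob to try.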

spark_test.py

import pyspark
from pyspark.sql import SQLContext

# Entry points for the job submitted above
sc = pyspark.SparkContext(appName="pyspark_job")
sqlContext = SQLContext(sc)
@RooseveltAdvisors
RooseveltAdvisors / bitbucket_clone.md
Last active July 5, 2017 12:40
Clone all git repositories from BitBucket
curl -s -k https://USERNAME:[email protected]/1.0/user/repositories | python -c 'import sys, json, os; r = json.loads(sys.stdin.read()); [os.system("git clone %s.git" % d["resource_uri"].replace("/1.0/repositories", "https://USERNAME:[email protected]")) for d in r]'
@RooseveltAdvisors
RooseveltAdvisors / [email protected]
Last active September 20, 2019 15:37
Run Jupyter Notebook and JupyterHub on Amazon EMR

Jupyter on EMR allows users to save their work on Amazon S3 rather than on local storage on the EMR cluster (master node).

To store notebooks on S3, use:

--notebook-dir <s3://your-bucket/folder/>

To store notebooks in a directory different from the user’s home directory, use:

--notebook-dir <local directory>
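Putting the flags together, a cluster launch might look like the following sketch. The bucket, key pair, release label, and `install-jupyter.sh` bootstrap-script name are all placeholders for your own setup:

```shell
# Hypothetical launch; install-jupyter.sh stands in for the gist's bootstrap script
aws emr create-cluster \
  --name jupyter-on-emr \
  --release-label emr-5.8.0 \
  --applications Name=Hadoop Name=Spark \
  --instance-type m4.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --bootstrap-actions \
    Path=s3://your-bucket/install-jupyter.sh,Args=[--notebook-dir,s3://your-bucket/notebooks/]
```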
import datetime
from jinja2 import Environment

start = datetime.datetime.strptime("2017-02-01", "%Y-%m-%d")
end = datetime.datetime.strptime("2017-07-24", "%Y-%m-%d")
date_generated = [start + datetime.timedelta(days=x) for x in range((end - start).days + 1)]
template = """spark-submit --master yarn --deploy-mode cluster --class com.xyz.XXXAPP s3://com.xyz/aa-1.5.11-all.jar --input-request-events s3://com.xyz/data/event_{{date_str}}/* --input-geofence-events s3://com.xyz/data2/event_/{{date_str}}/* --output s3://com.xyz/output/{{date_str}}"""

# Render one spark-submit command per day in the range
env = Environment()
for date in date_generated:
    print(env.from_string(template).render(date_str=date.strftime("%Y-%m-%d")))
@RooseveltAdvisors
RooseveltAdvisors / ubuntu_nic_bonding.md
Created July 29, 2017 05:21
NIC bonding @ Ubuntu 14.04
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
  bond-master bond0

auto eth1
iface eth1 inet manual
  bond-master bond0

# Completing the truncated snippet with a minimal bond0 stanza;
# mode and monitoring interval below are assumed, not from the gist
auto bond0
iface bond0 inet dhcp
  bond-mode active-backup
  bond-miimon 100
  bond-slaves none
@RooseveltAdvisors
RooseveltAdvisors / debug_spark.md
Created September 1, 2017 12:00
Debugging Spark

To connect a debugger to the driver

Append the following to your spark-submit (or gatk-launch) options:

replace 5005 with a different available port if necessary

--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005

This will suspend the driver until it gets a remote connection from IntelliJ.
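Concretely, a debug-enabled submission might look like this sketch; `spark_test.py` and port 5005 are placeholders. With `suspend=y` the driver JVM blocks at startup until a remote JDWP debugger attaches:

```shell
# client deploy mode keeps the driver on the local machine,
# where the IDE's remote debugger can reach the JDWP port
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  spark_test.py
```

In IntelliJ, create a "Remote" run configuration pointing at host `localhost`, port `5005`, then start it to attach.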

@RooseveltAdvisors
RooseveltAdvisors / install_anaconda_jupyter.sh
Created March 26, 2018 22:10
Bash script for installing Anaconda and Jupyter, and linking Jupyter with Spark
# Install Anaconda
wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh -b -f -p $HOME/anaconda
export PATH="$HOME/anaconda/bin:$PATH"
echo 'export PATH="$HOME/anaconda/bin:$PATH"' >> ~/.bashrc
conda update -y -n base conda
# Install Jupyter
conda create -y -n jupyter python=3.5 jupyter nb_conda
screen -dmS jupyter