David Yerrington dyerrington

Parse Jupyter

This is a basic class that makes it convenient to parse notebooks. I built a larger version of it for a personal project, where it was used to cluster documents and create semantic indices linking related content together. You can use this to parse notebooks for tasks like NLP or preprocessing.

Usage

parser = ParseJupyter("./Untitled.ipynb")
parser.get_cells(source_only = True, source_as_string = True)
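The class source is not shown here, so the following is only a rough sketch of what such a parser could look like, assuming the standard notebook layout (a JSON document with a top-level `cells` list). The name `NotebookParserSketch` and the demo notebook are hypothetical; the two keyword arguments mirror the usage above, but their exact behavior in `ParseJupyter` is an assumption.

```python
import json
import os
import tempfile


class NotebookParserSketch:
    """Hypothetical stand-in for ParseJupyter; not the author's implementation."""

    def __init__(self, path):
        # A .ipynb file is a JSON document with a top-level "cells" list.
        with open(path, encoding="utf-8") as handle:
            self.notebook = json.load(handle)

    def get_cells(self, source_only=False, source_as_string=False):
        cells = self.notebook.get("cells", [])
        if not source_only:
            return cells
        sources = []
        for cell in cells:
            source = cell.get("source", [])
            # The notebook format stores "source" as a string or a list of lines.
            if source_as_string and isinstance(source, list):
                source = "".join(source)
            sources.append(source)
        return sources


# Write a tiny two-cell notebook to a temporary file so the sketch runs anywhere.
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "source": ["print('hi')"],
         "outputs": [], "execution_count": None},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.ipynb")
with open(demo_path, "w", encoding="utf-8") as handle:
    json.dump(demo_notebook, handle)

parser = NotebookParserSketch(demo_path)
cell_sources = parser.get_cells(source_only=True, source_as_string=True)
print(cell_sources)
```

Returning one string per cell is convenient for NLP work, since most tokenizers and vectorizers expect a list of plain-text documents.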

RecData

To use this snippet, install faker:

pip install faker

Data Science Immersive "Installfest"

DSI Computer Setup
Anaconda + Python Configuration
Additional Software

DSI Computer Setup

Welcome to GA's Data Science Immersive! Before you start class, you'll need to download and install a few tools. Follow this guide to get your computer all set up, and let us know if you have any questions.

	import tweepy
	import wget
	import os

	oauth = {
	    "consumer_key": "",
	    "consumer_secret": ""
	}

	access = {

	'''Example script to generate text from Nietzsche's writings.
	At least 20 epochs are required before the generated text
	starts sounding coherent.
	It is recommended to run this script on GPU, as recurrent
	networks are quite computationally intensive.
	If you try this script on new data, make sure your corpus
	has at least ~100k characters. ~1M is better.
	'''

	from __future__ import print_function

	# CountVectorizer comes from scikit-learn
	from sklearn.feature_extraction.text import CountVectorizer

	# defines a custom vectorizer class
	class CustomVectorizer(CountVectorizer):

	    stop_grams = []

	    def __init__(self, stop_grams = [], **opts):
	        self.stop_grams = stop_grams
	        super().__init__(**opts)

	    def remove_ngrams(self, doc):

	name: dsi
	channels:
	- conda-forge
	- defaults
	dependencies:
	- appnope=0.1.0=py36_0
	- asn1crypto=0.22.0=py36_0
	- attrs=17.2.0=py_1
	- automat=0.6.0=py36_0
	- backports=1.0=py36_1

	name: dsi
	channels:
	- conda-forge
	- defaults
	dependencies:
	- asn1crypto=0.22.0=py36_0
	- beautifulsoup4=4.5.3=py36_0
	- blas=1.1=openblas
	- bleach=2.0.0=py36_0
	- bokeh=0.12.9=py36_0