Created
June 2, 2017 15:07
TensorFlow data science Jupyter image
# commands to set up the Docker image
FROM tensorflow/tensorflow:latest
# for natural language processing
RUN pip install nltk
# general data analysis on tabular data (similar to data frames in R)
RUN pip install pandas
# statistical visualisations built on matplotlib
RUN pip install seaborn
# NLP library for dependency parsing: labelling the word-to-word dependencies in a sentence
# e.g. in "he was going to school", "he" is the subject of "going"
# 93% accuracy on tagging; Google's Parsey McParseface has 94%
RUN pip install spacy
# machine learning library that includes reinforcement learning,
# a method used by DeepMind's AlphaGo
RUN pip install pybrain
# unsupervised topic modelling using LDA (latent Dirichlet allocation), adding
# semantics to text analysis
RUN pip install gensim
# SFrame: open source tabular data structure for data sets larger than memory
RUN pip install sframe
# utility data sets for training models
# a common need is a set of stopwords in a variety of languages
RUN python -m nltk.downloader stopwords
# WordNet: lexical database that groups words into sets of synonyms; used e.g. for lemmatisation
RUN python -m nltk.downloader wordnet
# Punkt tokenizer models: split text into sentences (they do not strip punctuation)
RUN python -m nltk.downloader punkt
# a test training set: a selection of public domain books from Project Gutenberg, including the Bible
RUN python -m nltk.downloader gutenberg
# Brown corpus: American English text across genres such as news, fiction and academic writing, for training models
RUN python -m nltk.downloader brown
WORKDIR "./"
# then run this:
# docker run -it -p 8888:8888 -v ~/data-science-docker/workbook/:/notebooks/workbook/ julianharris/data-science-stack:latest
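# A sketch of the build step the run command above relies on.
# Assumptions: this file is saved as `Dockerfile` in the current directory, and the
# tag simply mirrors the image name used in the run command.
# docker build -t julianharris/data-science-stack:latest .
# Optional sanity check that the main libraries import (assumes `python` is on the
# image's PATH, as the nltk.downloader steps above suggest):
# docker run --rm julianharris/data-science-stack:latest python -c "import nltk, pandas, seaborn, spacy, gensim"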