Last active
July 7, 2020 16:06
tdm-pilot.org gists
datasets/
.ipynb*
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download a dataset\n",
    "\n",
    "`getDataset` is a command line utility that downloads a dataset created with the corpus builder. It takes two arguments: the dataset id and, optionally, a file name for the download. If the file name is omitted, the dataset id is used instead. The file name will be referenced throughout your notebooks, so choose something meaningful, e.g. `shakespeare-quarterly-19502000`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "./datasets/library- 100%[===================>] 6.34M 2.00MB/s in 3.2s \n",
      "Your dataset 5c54351f-d2fa-749f-3efc-0477720bd176 is stored in: ./datasets/library-history.jsonl.gz\n"
     ]
    }
   ],
   "source": [
    "!bash getDataset 5c54351f-d2fa-749f-3efc-0477720bd176 library-history"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "asets/baseball.json 1%[ ] 54.39M 2.43MB/s eta 71m 52s^C\n"
     ]
    }
   ],
   "source": [
    "!bash getDataset ba91057f-44f8-6202-c9aa-4feb2da166a1 baseball"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
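Once `getDataset` has finished, the download sits at the path it prints, e.g. `./datasets/library-history.jsonl.gz`. The sketch below shows one way a later notebook cell might read that file. It assumes the `.jsonl.gz` extension means gzip-compressed JSON Lines (one JSON document per line). The helper names `iter_documents` and `count_documents` are hypothetical, and no particular field names are assumed.

```python
import gzip
import json


def iter_documents(path):
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_documents(path):
    """Count the documents in the file without holding them all in memory."""
    return sum(1 for _ in iter_documents(path))
```

A cell could then call `count_documents("./datasets/library-history.jsonl.gz")` as a quick check that the download completed. Reading line by line keeps memory use flat even for large datasets.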
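#!/bin/bash
# Download a dataset created with the corpus builder into ./datasets/.
# Usage: getDataset <dataset-id> [file-name]
set -e

#service=http://localhost:5000/dl
service=https://www.jstor.org/api/tdm/v1

# Use the optional second argument as the file name; fall back to the dataset id.
fname=$2
if [ -z "${fname}" ]; then
    fname=$1
fi

mkdir -p datasets

# Ask the service for the dataset info and extract the expiring download URL.
dl=$(curl -s "$service/nb/dataset/$1/info" |
    grep -o 'https://ithaka-labs.*Expires=[0-9]*')

dset="./datasets/$fname.jsonl.gz"
wget -q -L --show-progress \
    -O "$dset" \
    --user-agent "tdm notebooks" \
    "$dl"

# Only visible to the calling shell if this script is sourced rather than run with `bash`.
export DATASET_FILE=$dset
echo "Your dataset $1 is stored in: $dset"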
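The `grep -o` step is the least obvious part of the script: it keeps only the portion of the info response that matches the pattern, from `https://ithaka-labs` through the digits after the last `Expires=` on the line. Below is a minimal Python sketch of the same matching. Only the pattern comes from the script. The response body, host, path, and query parameters are made up for illustration, and the function name `extract_download_url` is hypothetical.

```python
import re

# Same pattern as the grep call in the script.
DOWNLOAD_URL_PATTERN = re.compile(r"https://ithaka-labs.*Expires=[0-9]*")


def extract_download_url(info_body):
    """Return the first match of the pattern in the response body, or None if there is none."""
    match = DOWNLOAD_URL_PATTERN.search(info_body)
    return match.group(0) if match else None


# Hypothetical response body, invented for illustration only.
sample = '{"url": "https://ithaka-labs.example.invalid/data.jsonl.gz?Signature=abc&Expires=1594137960"}'
print(extract_download_url(sample))
```

Because `.*` is greedy, the match runs to the last `Expires=` on the line and stops after its digits, so the closing quote and brace of the surrounding JSON are left out.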
jupyter-notebookparams
jupyter_contrib_nbextensions
pandas
matplotlib
seaborn
gensim
wordfreq
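#!/bin/bash
/opt/conda/bin/python3
version=0.1

# Download the NLTK stopwords and wordnet corpora.
python -m nltk.downloader stopwords wordnet

# Install the contrib notebook extensions and enable the table of contents (toc2).
jupyter contrib nbextension install --user
jupyter nbextension install jupyter_contrib_nbextensions/nbextensions/toc2 --user
jupyter nbextension enable toc2/main

# Enable the notebookparams extension.
jupyter nbextension enable --py jupyter_notebookparams

# Hand control to the command passed to this script.
exec "$@"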