@ltupin
Created January 28, 2021 19:24
This EMR bootstrap script installs spaCy
#!/bin/bash -xe
#### WARNING #####
## After modifying this script, upload it to S3 with:
## aws s3 cp emr-bootstrap-script-spacy.sh s3://tf-emr-bootstrap-sandbox-eu-west-1
## Works with EMR 5.32.0, spacy 2.3.5
version=1.1
# Pass the version as an argument, not inside the format string, and end the line
printf 'Script %s\n' "$version"
# Python modules that are in neither the standard library nor the Amazon Machine Image:
sudo /usr/bin/pip3.7 install -U \
boto3 \
pandas \
langdetect \
hdfs \
tqdm \
pathos \
wikipedia \
filechunkio \
gensim \
termcolor \
awswrangler
# Install spaCy dependencies
sudo /usr/bin/pip3.7 install -U \
numpy \
Cython \
pip
# After upgrading pip, its path changes from /usr/bin to /usr/local/bin
sudo /usr/local/bin/pip3.7 install -U spacy
# Download the small English model
python3 -m spacy download en_core_web_sm
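
The script hardcodes the pip path before and after the upgrade (/usr/bin/pip3.7, then /usr/local/bin/pip3.7). As a minimal sketch, assuming those are the only two candidate locations, the path could instead be resolved at run time. The helper name `first_executable` is hypothetical and not part of the original script, and the final install line is left commented out so that the snippet has no side effects.

```shell
# Hypothetical helper (not in the original script): print the first argument
# that is an executable regular file, or return 1 if none of them is.
first_executable() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the upgraded pip and fall back to the system one.
# The "|| pip_bin=..." keeps a failed lookup from aborting under "set -e".
pip_bin=$(first_executable /usr/local/bin/pip3.7 /usr/bin/pip3.7) || pip_bin=""

if [ -n "$pip_bin" ]; then
    echo "Using pip at $pip_bin"
    # sudo "$pip_bin" install -U spacy
else
    echo "pip3.7 not found in either expected location" >&2
fi
```

Resolving the path this way would let the same script run whether or not the pip upgrade has already moved the binary, at the cost of a few extra lines.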