First, I strongly think that if you're working with NLP/ML/AI-related tools, getting things to work on Linux and Mac OS is much easier and saves you quite a lot of time.
Disclaimer: I am not affiliated with Continuum (conda), Git, Java, Windows OS, the Stanford NLP group or the MaltParser group. The steps presented below are how I, IMHO, would set up a Windows computer if I owned one.
Please, please, please understand the solution, don't just copy and paste!!! We're not monkeys typing Shakespeare ;P
To make sure that you get an NLTK setup that works properly on Windows with the Stanford tools and MaltParser, follow these steps:
Step 1a: Install Conda for Python 3.5 from https://www.continuum.io/downloads#_windows
Step 2a: Install Git on your machine from https://git-scm.com/download/win (Optional)
You can skip this step if you're not going to use Git, but I've left the screenshots here, just in case.
Step 2b: Check that Git works in PowerShell
Step 3: Install Java from http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
Use ONLY ONE of the commands below in PowerShell to install NLTK (NOT all of them).
Now, install NLTK in PowerShell using:
conda install nltk
or, to install the bleeding edge version (also through PowerShell):
pip install -U https://github.com/nltk/nltk/archive/develop.zip
or through Git:
pip install -U git+https://github.com/nltk/nltk.git
Stay within PowerShell, don't close it yet. Open the Python 3.5 interpreter within PowerShell and run the code that follows.
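First, a quick sanity check that the right NLTK version got installed (as mentioned at the end of this gist, you want v3.2, which has better Python 3.5 and Windows support):
import nltk
print(nltk.__version__)  # Should print 3.2 or above.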
The code below will automatically download and extract the files needed for MaltParser and the pre-trained English model.
REMEMBER TO CHANGE the C:\Users\Thu\Desktop\ path to your user's Desktop path, e.g. if your user name is "Alvas" on Windows, then most probably the path is C:\Users\Alvas\Desktop\.
The following code snippets are tested within Windows PowerShell (I suppose they should also work in other modern Python IDEs).
In Python 3:
import urllib.request
import zipfile
# First we retrieve the model file from the website.
urllib.request.urlretrieve(r'http://www.maltparser.org/mco/english_parser/engmalt.poly-1.7.mco', r'C:\Users\Thu\Desktop\engmalt.poly-1.7.mco')
# Then we retrieve the parser zip file from the website.
urllib.request.urlretrieve(r'http://maltparser.org/dist/maltparser-1.8.1.zip', r'C:\Users\Thu\Desktop\maltparser-1.8.1.zip')
# Then we create a Pythonic zipfile object by initializing it with the full path to the zipfile.
zfile = zipfile.ZipFile(r'C:\Users\Thu\Desktop\maltparser-1.8.1.zip')
# And ask Python to extract the files to the directory: C:\Users\Thu\Desktop\maltparser-1.8.1
zfile.extractall(r'C:\Users\Thu\Desktop\maltparser-1.8.1')
from nltk.parse import malt
# We initialize the MaltParser API with the DIRECT PATH to the malt parser DIRECTORY (not the jar file) and the .mco file.
mp = malt.MaltParser(r'C:\Users\Thu\Desktop\maltparser-1.8.1', r'C:\Users\Thu\Desktop\engmalt.poly-1.7.mco')
mp.parse_one('I shot an elephant in my pajamas .'.split()).tree()
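If a parse tree comes out, the MaltParser setup works. As a follow-up, here's a minimal sketch (assuming the parse_sents() method of NLTK 3.2's MaltParser API) for parsing several pre-tokenized sentences in one Java call:
# Reusing the `mp` object from above; each sentence is a list of tokens.
sentences = [['I', 'saw', 'a', 'bird', '.'], ['John', 'loves', 'Mary', '.']]
for parses in mp.parse_sents(sentences):
    # Each sentence yields an iterator of DependencyGraph objects; take the first parse.
    print(next(parses).tree())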
The code below will automatically download and extract the files needed for Stanford NER.
import urllib.request
import zipfile
# First we retrieve the Stanford NER zip file from the website.
urllib.request.urlretrieve(r'http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip', r'C:\Users\Thu\Desktop\stanford-ner-2015-04-20.zip')
# Then we extract the zip file to the directory: C:\Users\Thu\Desktop\stanford-ner
zfile = zipfile.ZipFile(r'C:\Users\Thu\Desktop\stanford-ner-2015-04-20.zip')
zfile.extractall(r'C:\Users\Thu\Desktop\stanford-ner')
from nltk.tag.stanford import StanfordNERTagger
# First we set the direct path to the NER Tagger.
_model_filename = r'C:\Users\Thu\Desktop\stanford-ner\classifiers\english.all.3class.distsim.crf.ser.gz'
_path_to_jar = r'C:\Users\Thu\Desktop\stanford-ner\stanford-ner.jar'
# Then we initialize the NLTK's Stanford NER Tagger API with the DIRECT PATH to the model and .jar file.
st = StanfordNERTagger(model_filename=_model_filename, path_to_jar=_path_to_jar)
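Then, as a quick test (the example sentence is mine, not part of the original setup), tag a pre-tokenized sentence; the result is a list of (token, NE-label) pairs:
# Tag a pre-tokenized sentence with the `st` object from above.
print(st.tag('Rami Eid is studying at Stony Brook University in NY .'.split()))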
Gotcha, there won't be a spoon-fed answer here, but the idea is the same as in the steps above.
As said at the beginning of this gist, understand the solution, don't just copy and paste!!! We're not monkeys typing Shakespeare ;P
Now, using the knowledge from steps 5a and 5b, follow the same steps to get the Stanford POS tagger from http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip (try it yourself first; a skeletal sketch follows the hints below).
If you need some hints, see:
- https://gist.github.com/alvations/e1df0ba227e542955a8a
- https://github.com/alvations/nltk_cli/blob/master/stanford.py
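That said, if you want to check your own attempt against something, here's a skeletal sketch mirroring steps 5a and 5b (the extracted folder layout and the model/jar paths are my assumptions, so verify them against what the zip actually unpacks):
# Skeletal sketch only: understand each line before running it!
import urllib.request
import zipfile
urllib.request.urlretrieve(r'http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip', r'C:\Users\Thu\Desktop\stanford-postagger.zip')
zipfile.ZipFile(r'C:\Users\Thu\Desktop\stanford-postagger.zip').extractall(r'C:\Users\Thu\Desktop\stanford-postagger')
from nltk.tag.stanford import StanfordPOSTagger
# ASSUMED paths: check the extracted directory, the zip may unpack into a version-named subfolder.
_model_filename = r'C:\Users\Thu\Desktop\stanford-postagger\stanford-postagger-full-2015-04-20\models\english-bidirectional-distsim.tagger'
_path_to_jar = r'C:\Users\Thu\Desktop\stanford-postagger\stanford-postagger-full-2015-04-20\stanford-postagger.jar'
pos_tagger = StanfordPOSTagger(model_filename=_model_filename, path_to_jar=_path_to_jar)
print(pos_tagger.tag('I shot an elephant in my pajamas .'.split()))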
Do the same for the Stanford Parser, but do note that NLTK's API for the Stanford Parser is a little different, and there will be a code overhaul once nltk/nltk#1249 is merged.
Hint: Reading this carefully will help a lot.
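For orientation only, a minimal sketch of the pre-overhaul API (as of NLTK 3.2; the jar locations and the models-jar version number are assumptions based on the 2015-04-20 release, so verify them against your extracted folder):
from nltk.parse.stanford import StanfordParser
# ASSUMED paths after downloading and extracting stanford-parser-full-2015-04-20.zip, same recipe as step 5b.
parser = StanfordParser(
    path_to_jar=r'C:\Users\Thu\Desktop\stanford-parser\stanford-parser.jar',
    path_to_models_jar=r'C:\Users\Thu\Desktop\stanford-parser\stanford-parser-3.5.2-models.jar')
# raw_parse() takes a raw string and returns an iterator of parse trees.
print(next(parser.raw_parse('I shot an elephant in my pajamas .')))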
Disclaimer: Skip this section to avoid hate, anger, suffering, etc.; it's just my personal opinion =)
Now the Stanford tools + MaltParser work with NLTK in PowerShell. But you need a proper environment so that you can code happily and enjoy the Python + NLP awesomeness, so here's some unsolicited advice ;P
- TRY NOT to use Python IDLE for NLP development (Python IDLE is a great tool to learn and start your Python journey, but if you're going to do NLP work, you're better off using notepad and the command prompt/terminal, or another IDE). Also, I encourage you to try https://try.jupyter.org/ instead of IDLE if you're moving on from the basic lessons.
- Make sure that you get NLTK v3.2 (it has quite a lot of bugfixes, esp. better Python 3.5 support and better Windows support).
- TRY to use an IDE other than IDLE!! (There are lots of them out there: Atom, Vim, Emacs, PyCharm, Eclipse+PyDev, etc.)
- Try IPython Notebooks (https://ipython.org/ipython-doc/2/install/install.html#windows)
- Get Unix or Mac.