Installation notes on the GLAD repository.
These examples are *nix-based. Hopefully Windows will not give too much extra trouble. If you run into problems, please post a comment on this gist with
- your operating system
- the command you ran
- the output of the command (trim for readability if needed)
#1 install anaconda https://docs.continuum.io/anaconda/install
#2 clone the git repo (use ssh - the top link - if you know how)
git clone git@github.com:pan-webis-de/glad.git
git clone https://github.com/pan-webis-de/glad.git
#3
cd glad
#4 installing python dependencies
See Zahra's comment below.
Then to install nltk data, open a python shell (with the glad environment active) with
python3
and in there type
import nltk
nltk.download()
This should show a little UI. Browse to "Models" and select "punkt" in the list. Press download.
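If the little UI does not show up (for example on a machine without a display), the same model can probably be fetched from code instead. A minimal sketch, assuming nltk is installed in the glad environment and that "punkt" is the only resource needed; the helper name `ensure_punkt` is made up for this note.

```python
def ensure_punkt():
    """Download the NLTK "punkt" tokenizer model unless it is already installed."""
    # Imported inside the function so the snippet can be pasted anywhere;
    # it still needs nltk to be installed when the function is called.
    import nltk

    try:
        # Raises LookupError when the resource is not on nltk's data path.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        # Same resource as selecting "punkt" under "Models" in the UI.
        nltk.download("punkt")


# In a python3 shell with the glad environment active:
# ensure_punkt()
```

Calling it a second time should do nothing, since the lookup succeeds once the model is on disk.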
#5 downloading the data
Go to http://pan.webis.de/clef15/pan15-web/author-identification.html and download both the training and the testing corpus.
Unzip both zips in the glad directory. This will create directories that I've renamed to "training" and "testing" respectively. In these directories, there are more zips.
I've only unpacked the English ones (both training and testing) for now.
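The inner unpacking step can also be scripted. A rough sketch, assuming the directories are named "training" and "testing" as above and that the English archives have "english" in their file name (a guess based on the directory names used in the run command); `unpack_english` is a hypothetical helper, not part of glad.

```python
import zipfile
from pathlib import Path


def unpack_english(corpus_dir):
    """Extract every inner zip whose name contains "english", next to the zip."""
    extracted = []
    for archive in sorted(Path(corpus_dir).glob("*.zip")):
        if "english" not in archive.name.lower():
            continue  # leave the other languages packed for now
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive.name)
    return extracted


# From inside the glad directory:
# for name in ("training", "testing"):
#     print(unpack_english(name))
```

Depending on how the archives are laid out, the extracted directory names may need adjusting to match the paths in the run command.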
#6 run
This command should now train and test the model
note: make sure the conda environment is active, with source activate glad.
python3 glad-main.py --training training/pan15-authorship-verification-training-dataset-english-2015-04-19 -i testing/pan15-authorship-verification-test-dataset2-english-2015-04-19/ --save_model models/default
On my computer, this spews out DeprecationWarnings, because my python is up to date and the code from the repository is 2 years old. This should be fine for now (but is definitely something we should fix/report for fixing).
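If the warnings make the output hard to read, the run can be wrapped so that the interpreter hides them. A small sketch using only the flags from the command above plus python's standard `-W` option; the function names `build_glad_command` and `run_glad` are made up here, and this only hides the warnings, it does not fix the old code.

```python
import subprocess
import sys


def build_glad_command(training_dir, test_dir, model_dir):
    """Build the argument list for the train-and-test run."""
    return [
        sys.executable,                      # python of the active environment
        "-W", "ignore::DeprecationWarning",  # interpreter option: hide these warnings
        "glad-main.py",
        "--training", training_dir,
        "-i", test_dir,
        "--save_model", model_dir,
    ]


def run_glad(training_dir, test_dir, model_dir):
    """Run glad-main.py from the current directory; raises if it exits non-zero."""
    subprocess.run(build_glad_command(training_dir, test_dir, model_dir), check=True)


# From inside the glad directory, with the glad environment active:
# run_glad(
#     "training/pan15-authorship-verification-training-dataset-english-2015-04-19",
#     "testing/pan15-authorship-verification-test-dataset2-english-2015-04-19/",
#     "models/default",
# )
```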
- there are 5 predefined feature combinations. The default seems to just select all features.
- the code is generally well-documented
- the python package argparse is used to create the command line interface. This should be easy to modify to suit our needs.
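To give an idea of what modifying an argparse interface involves, here is a standalone sketch that mirrors only the three flags used in the run command. It is not the parser from glad-main.py: the help texts, the `dest` name and the `build_parser` function are assumptions for illustration.

```python
import argparse


def build_parser():
    """Standalone parser with the three flags from the run command (illustrative)."""
    parser = argparse.ArgumentParser(description="sketch of a glad-like CLI")
    parser.add_argument("--training", help="directory with the training corpus")
    parser.add_argument("-i", dest="input", help="directory with the test corpus")
    parser.add_argument("--save_model", help="where to store the trained model")
    return parser


args = build_parser().parse_args(
    ["--training", "training", "-i", "testing", "--save_model", "models/default"]
)
print(args.training, args.input, args.save_model)
```

Adding an option of our own would then be one more `add_argument` call, plus whatever code reads the new attribute.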
If the program gives an error, try to create a new environment for anaconda using the following command lines
Save the following as req.txt
Now we have to do some patching (because the code is old)
Replace line 551 of glad-main.py with:
and replace __scale_features with this definition
When you want to run glad-main.py, make sure you are in the conda environment first (with source activate glad).
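A quick way to check this from python itself, as a sketch: conda sets the CONDA_DEFAULT_ENV variable when an environment is activated, so comparing it with "glad" should tell whether the right one is active. The helper name `glad_env_active` is made up for this note, and the check assumes the environment was created by name (not by path).

```python
import os


def glad_env_active(environ=None, name="glad"):
    """Return True when conda reports `name` as the active environment."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV") == name


if not glad_env_active():
    print("glad environment not active; run: source activate glad")
```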