Installation notes on the GLAD repository.

These examples are *nix-based; hopefully Windows will not give too much extra trouble. If you run into problems, please post a comment on this gist with:

your operating system
the command you ran
the output of the command (trim for readability if needed)

#1 install anaconda https://docs.continuum.io/anaconda/install

#2 clone the git repo (use ssh - the top link - if you know how)

git clone git@github.com:pan-webis-de/glad.git
git clone https://github.com/pan-webis-de/glad.git

cd glad

#4 installing python dependencies

See Zahra's comment below.

Then, to install the NLTK data, open a Python shell (with the glad environment active) with

python3

and in there type

import nltk
nltk.download()

This should show a little UI. Browse to "Models" and select "punkt" in the list. Press download.
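If the download UI does not appear (for example on a machine without a display), the same download can probably be done without it. The sketch below assumes the nltk version pinned in the environment file (3.2.2), where the resource is called "punkt". The helper name ensure_punkt is made up for this note and is not part of GLAD.

```python
def ensure_punkt():
    """Download the 'punkt' tokenizer models unless they are already installed.

    Hypothetical helper for these notes, not part of GLAD.
    """
    # Imported inside the function so this snippet can be pasted anywhere
    # without failing at import time when nltk is not installed yet.
    import nltk

    try:
        # Raises LookupError when the resource is not in any nltk data directory.
        nltk.data.find("tokenizers/punkt")
        return True
    except LookupError:
        # Same download the UI performs, without opening a window.
        return nltk.download("punkt")
```

Calling ensure_punkt() once in a Python shell with the glad environment active should have the same effect as selecting "punkt" in the UI.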

#5 downloading the data

Go to http://pan.webis.de/clef15/pan15-web/author-identification.html and download both the training corpus and the test corpus.

Unzip both zips in the glad directory. This creates two directories, which I've renamed to "training" and "testing" respectively. These directories contain more zips; I've only unpacked the English ones (both training and testing) for now.
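Unpacking the inner zips can also be scripted. This is a minimal sketch, assuming the nested archives sit directly inside the "training" and "testing" directories and have "english" in their file names (a guess based on the directory names used in the run command). The function name unpack_english is made up for this note.

```python
import zipfile
from pathlib import Path


def unpack_english(corpus_dir):
    """Extract every zip in corpus_dir whose file name contains 'english'.

    Hypothetical helper; the naming convention is an assumption.
    Returns the names of the archives that were extracted.
    """
    corpus_dir = Path(corpus_dir)
    extracted = []
    for archive in sorted(corpus_dir.glob("*.zip")):
        if "english" not in archive.name.lower():
            continue  # leave the other languages packed for now
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(corpus_dir)
        extracted.append(archive.name)
    return extracted
```

Running unpack_english("training") and unpack_english("testing") from the glad directory would then cover both corpora.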

#6 run

This command should now train and test the model.

Note: make sure the conda environment is active (source activate glad).

python3 glad-main.py --training training/pan15-authorship-verification-training-dataset-english-2015-04-19 -i testing/pan15-authorship-verification-test-dataset2-english-2015-04-19/ --save_model models/default
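A wrong path is an easy mistake here, so it may help to check the directories before starting a run. The sketch below only builds the argument list for the command above, using the flags shown there. The function name build_glad_command and its parameters are made up for this note.

```python
import sys
from pathlib import Path


def build_glad_command(training_dir, test_dir, model_path, script="glad-main.py"):
    """Return the argument list for a GLAD run after checking the input directories.

    Hypothetical helper; it only mirrors the flags used in the command above.
    """
    for directory in (training_dir, test_dir):
        if not Path(directory).is_dir():
            raise FileNotFoundError("corpus directory not found: %s" % directory)
    return [
        sys.executable, script,  # the interpreter of the active environment
        "--training", str(training_dir),
        "-i", str(test_dir),
        "--save_model", str(model_path),
    ]
```

Passing the returned list to subprocess.run(cmd, check=True) from the glad directory should then start the same run as the command line above.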

On my computer, this spews out DeprecationWarnings, because my Python is up to date and the code in the repository is 2 years old. This should be fine for now (but is definitely something we should fix or report upstream).
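Until the warnings are fixed, they can be hidden so the real output stays readable. From the command line, python3 -W ignore::DeprecationWarning followed by the usual arguments should do it. Inside Python, a sketch using the standard warnings module looks like this (the function name quiet_deprecations is made up for this note, and GLAD itself does not do this as far as I know):

```python
import warnings


def quiet_deprecations():
    """Hide DeprecationWarnings only; all other warnings stay visible."""
    warnings.filterwarnings("ignore", category=DeprecationWarning)
```

This only hides the messages; the deprecated calls are still there and should still be fixed or reported.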

Further notes

There are 5 predefined feature combinations. The default seems to just select all features.
The code is generally well-documented.
The Python package argparse is used to create the command line interface. This should be easy to modify to suit our needs.
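To give an idea of what modifying the command line interface could look like, here is a small argparse sketch. It is not GLAD's actual parser. The --training, -i and --save_model flags are taken from the run command, --no_feature_scaling is inferred from the args.no_feature_scaling attribute used in the code, and the destination name test_dir and the --language option are invented purely for illustration.

```python
import argparse


def build_parser():
    """Build a GLAD-like parser (illustration only, not the real interface)."""
    parser = argparse.ArgumentParser(description="GLAD-like CLI sketch")
    parser.add_argument("--training", required=True,
                        help="directory with the training corpus")
    parser.add_argument("-i", dest="test_dir", required=True,
                        help="directory with the test corpus")
    parser.add_argument("--save_model", default=None,
                        help="where to store the trained model")
    parser.add_argument("--no_feature_scaling", action="store_true",
                        help="disable feature scaling")
    # Hypothetical new option, showing how little code an extra flag needs.
    parser.add_argument("--language", default="english",
                        choices=["english", "dutch", "greek", "spanish"],
                        help="corpus language (invented for this example)")
    return parser
```

For example, build_parser().parse_args(["--training", "training/x", "-i", "testing/y"]) yields a namespace with language set to "english" and no_feature_scaling set to False.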

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
libgfortran=3.0.0=1
mkl=2017.0.1=0
nltk=3.2.2=py35_0
numpy=1.12.0=py35_0
openssl=1.0.2k=1
pip=9.0.1=py35_1
python=3.5.3=0
readline=6.2=2
scikit-learn=0.18.1=np112py35_1
scipy=0.18.1=np112py35_1
setuptools=27.2.0=py35_0
six=1.10.0=py35_0
sqlite=3.13.0=0
tk=8.5.18=0
wheel=0.29.0=py35_0
xz=5.2.2=1
zlib=1.2.8=3

def __scale_features(dataset):
    """
    Check if features should and could be scaled.

    :param dataset: an array of arrays
    :return: scaled feature set if requested/possible, else original dataset.
    """
    try:
        if (not args.no_feature_scaling) and clf.scaling_possible:
            # scale features if requested, warn if impossible
            return preprocessing.scale(dataset)
        elif (not args.no_feature_scaling) and (not clf.scaling_possible):
            log.warning("Can't scale features with classifier '%s'. Proceeding without feature scaling." % args.clf)
        return dataset
    except ValueError:
        return dataset

folkertdev/GLAD.md

zahrafitrianti commented Mar 4, 2017 • edited by folkertdev
