@folkertdev
Last active March 4, 2017 16:15
Installation notes on the GLAD repository.


These examples are *nix-based. Hopefully Windows will not cause much extra trouble. If you run into a problem, please post a comment on this gist with

  • your operating system
  • the command you ran
  • the output of the command (trim for readability if needed)

#1 install anaconda https://docs.continuum.io/anaconda/install

#2 clone the git repo (run only one of these two commands; use ssh - the top one - if you know how)

git clone git@github.com:pan-webis-de/glad.git
git clone https://github.com/pan-webis-de/glad.git

#3 enter the repository directory

cd glad 

#4 installing python dependencies

See Zahra's comment below.

Then to install nltk data, open a python shell (with the glad environment active) with

python3

and in there type

import nltk
nltk.download()

This should show a little UI. Browse to "Models" and select "punkt" in the list. Press download.
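
If the little UI does not show up (for example on a machine without a display), the same model can probably be fetched without it. A minimal sketch, assuming nltk is installed in the active glad environment; the helper name ensure_punkt is made up for this example:

```python
def ensure_punkt():
    """Download the "punkt" tokenizer models unless they are already installed."""
    import nltk  # imported here so the function can be defined without nltk present

    try:
        # nltk.data.find raises LookupError when the resource is missing
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        # same download as selecting "punkt" in the UI
        nltk.download("punkt")
```

Calling ensure_punkt() once from a python3 shell in the glad environment should be enough.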

#5 downloading the data

Go to http://pan.webis.de/clef15/pan15-web/author-identification.html and download both the training and the test corpus.

Unzip both zips in the glad directory. This creates two directories, which I've renamed to "training" and "testing" respectively. Inside these directories there are more zips. I've only unpacked the English ones (both training and testing) for now.
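
Unpacking only the English zips can also be scripted. The sketch below uses only the standard library; the function name and the idea of matching "english" in the file name are assumptions for this example. It extracts into the directory that holds the zip (like running unzip there), so check that the resulting path matches the one used in the run command.

```python
import zipfile
from pathlib import Path


def extract_language_zips(directory, language="english"):
    """Extract every zip in `directory` whose file name contains `language`.

    Files are extracted into `directory` itself. Returns the archives processed.
    """
    processed = []
    for archive in sorted(Path(directory).glob("*.zip")):
        if language not in archive.name.lower():
            continue  # leave the other languages packed
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
        processed.append(archive)
    return processed
```

With the directory names from these notes, that would be extract_language_zips("training") followed by extract_language_zips("testing").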

#6 run

This command should now train and test the model

note: make sure the conda environment is active, with source activate glad.

python3 glad-main.py --training training/pan15-authorship-verification-training-dataset-english-2015-04-19 -i testing/pan15-authorship-verification-test-dataset2-english-2015-04-19/ --save_model models/default

On my computer, this prints a lot of DeprecationWarnings, because my Python packages are up to date and the code in the repository is 2 years old. This should be fine for now (but is definitely something we should fix or report upstream).
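
If the warnings get in the way, Python's standard -W option can hide them. The sketch below builds the same command as above with -W ignore::DeprecationWarning added; the helper names are made up for this example, and warnings that a library re-enables by itself may still appear.

```python
import subprocess
import sys


def build_glad_command(training_dir, test_dir, model_path, python=sys.executable):
    """Return the step-6 command as an argument list, with DeprecationWarnings hidden."""
    return [
        python,
        "-W", "ignore::DeprecationWarning",  # standard interpreter option
        "glad-main.py",
        "--training", training_dir,
        "-i", test_dir,
        "--save_model", model_path,
    ]


def run_glad(training_dir, test_dir, model_path):
    """Run glad-main.py from the glad directory; raises CalledProcessError on failure."""
    subprocess.run(build_glad_command(training_dir, test_dir, model_path), check=True)
```

run_glad has to be called from inside the glad directory with the conda environment active, just like the plain command.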

Further notes

  • there are 5 predefined feature combinations. The default seems to just select all features.
  • the code is generally well-documented
  • the python package argparse is used to create the command line interface. This should be easy to modify to suit our needs.
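
To give an idea of what changing the command line interface involves, here is a small argparse sketch. It is not the parser from glad-main.py: it only mirrors the options that appear in these notes (--training, -i, --save_model) plus --clf and --no_feature_scaling, which are inferred from args.clf and args.no_feature_scaling in the patch in the comment below. The dest name, defaults and help texts are made up.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the options seen in these notes."""
    parser = argparse.ArgumentParser(description="Sketch of a glad-like command line.")
    parser.add_argument("--training", help="directory with the training corpus")
    parser.add_argument("-i", dest="input", help="directory with the test corpus")
    parser.add_argument("--save_model", help="where to store the trained model")
    parser.add_argument("--clf", default="svm", help="classifier name (default is made up)")
    parser.add_argument("--no_feature_scaling", action="store_true",
                        help="disable feature scaling")
    return parser


# Parse an explicit argument list instead of sys.argv, so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--training", "training/english", "-i", "testing/english", "--no_feature_scaling"]
)
print(args.training, args.input, args.no_feature_scaling)
```

Adding an option of our own would then be one more add_argument call.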

zahrafitrianti commented Mar 4, 2017

If the program gives an error, try creating a new Anaconda environment with the following steps.

Save the following as req.txt:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
libgfortran=3.0.0=1
mkl=2017.0.1=0
nltk=3.2.2=py35_0
numpy=1.12.0=py35_0
openssl=1.0.2k=1
pip=9.0.1=py35_1
python=3.5.3=0
readline=6.2=2
scikit-learn=0.18.1=np112py35_1
scipy=0.18.1=np112py35_1
setuptools=27.2.0=py35_0
six=1.10.0=py35_0
sqlite=3.13.0=0
tk=8.5.18=0
wheel=0.29.0=py35_0
xz=5.2.2=1
zlib=1.2.8=3

# create the environment from req.txt with
conda create --name glad --file req.txt

# activate the new environment you just created with
source activate glad

# install arff into the environment
pip install arff
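
To check that the new environment really has everything, a quick import test can help. A minimal sketch using only the standard library; the list of import names is an assumption based on the packages above (scikit-learn is imported as sklearn):

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages listed above.
REQUIRED = ["nltk", "numpy", "scipy", "sklearn", "arff"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "nothing")
```

Run it with python3 while the glad environment is active; anything listed as missing still needs to be installed.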

Now we have to do some patching (because the code is old)

replace line 551 of glad-main.py with:

svm.SVC(probability=True, kernel="sigmoid", C=0.01, coef0=0.01, gamma='auto')

and replace __scale_features with this definition

def __scale_features(dataset):
    """
    Check if features should and could be scaled.

    :param dataset: an array of arrays
    :return: scaled feature set if requested/possible, else original dataset.
    """
    if args.no_feature_scaling:
        # scaling was switched off on the command line
        return dataset

    if not clf.scaling_possible:
        # scaling was requested but this classifier can't use it
        log.warning("Can't scale features with classifier '%s'. Proceeding without feature scaling." % args.clf)
        return dataset

    try:
        return preprocessing.scale(dataset)
    except ValueError:
        # scaling failed on this data, so fall back to the unscaled features
        return dataset
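
For orientation: preprocessing.scale standardizes each feature column to zero mean and unit variance. The pure-Python sketch below illustrates that idea on a tiny dataset. It is not scikit-learn's implementation, the function name is made up, and the handling of constant columns is an assumption for this example.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale each column of `rows` (a list of equal-length lists) to mean 0, std 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # population standard deviation; constant columns get divisor 1 (assumption)
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# First column varies, second column is constant.
scaled = standardize_columns([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
```

After scaling, the first column is centered around 0 and the constant second column becomes all zeros.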

When you want to run glad-main.py, make sure you are in the conda environment first (with source activate glad).
