by Philip Herron
Birmingham: Packt Publishing, 2013, available in print and as ebook; this review is based on the PDF, 110 pp.
Reviewed by
Andreas van Cranenburgh
University of Amsterdam
// ==UserScript==
// @name Gmane vertical frames
// @namespace [email protected]
// @include http://news.gmane.org/*
// @include http://thread.gmane.org/*
// @version 1
// @grant none
// ==/UserScript==
// The default Gmane 'news' view has horizontal panes, which waste lots of screen space;
"""Extract metadata from Project Gutenberg RDF catalog into a Python dict.
Based on https://bitbucket.org/c-w/gutenberg/
>>> md = readmetadata()
>>> md[123]
{'LCC': {'PS'},
 'author': u'Burroughs, Edgar Rice',
 'authoryearofbirth': 1875,
 'authoryearofdeath': 1950,
"""Apply PCA to a CSV file and plot its datapoints (one per line).
The first column should be a category (determines the color of each datapoint),
the second a label (shown alongside each datapoint)."""
import sys
import pandas
import pylab as pl
from sklearn import preprocessing
from sklearn.decomposition import PCA
These scripts produce the train-dev-test splits for the Tiger & Lassy treebanks
used in my 2013 IWPT paper. The Tiger treebank version 2.1 was used, namely
tiger_release_aug07.export. The Lassy treebank was version 1.1, or
lassy-r19749. The development & test sets are not simply taken from the last
20% of each treebank, because that portion would have an uneven distribution
of sentence lengths & topics; the splits are instead chosen to give a balanced
distribution of sentences.
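The scripts themselves are not reproduced here, so the following is only a minimal sketch of one way to obtain a balanced split, not necessarily the method used for the paper. It assumes a round-robin scheme in which, out of every block of ten sentences, eight go to training, one to development and one to test. The function name `balanced_split` and the `cycle` parameter are hypothetical.

```python
def balanced_split(sentences, cycle=10):
    """Round-robin split: within each block of `cycle` sentences, the
    second-to-last goes to dev, the last to test, the rest to train.
    (Assumed 80/10/10 scheme; not taken from the original scripts.)"""
    train, dev, test = [], [], []
    for n, sent in enumerate(sentences):
        pos = n % cycle
        if pos == cycle - 2:
            dev.append(sent)
        elif pos == cycle - 1:
            test.append(sent)
        else:
            train.append(sent)
    return train, dev, test


if __name__ == "__main__":
    # Made-up stand-ins for treebank sentences.
    sents = ["sentence %d" % n for n in range(50)]
    train, dev, test = balanced_split(sents)
    print(len(train), len(dev), len(test))
```

Because held-out sentences are drawn at regular intervals from the whole treebank, each set samples every region of the corpus instead of only its final part.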
""" A simple multiprocessing example with process pools, shared data and
per-process initialization. """
import multiprocessing
# global read-only data can be shared by each process
DATA = 11
def initworker(a):
    """ Initialize data specific to each process. """
    global MOREDATA
""" Classify rows from CSV files with SVM with leave-one-out cross-validation;
labels taken from first column, of the form 'label_description'. """
import sys
import pandas
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn import svm, model_selection, preprocessing
data = pandas.read_csv(sys.argv[1])
# DataFrame.as_matrix() and .icol() no longer exist in pandas; use .iloc and .to_numpy()
xdata = data.iloc[:, 1:].to_numpy()
#xdata = preprocessing.scale(xdata) # normalize data => mean of 0, stddev of 1
ylabels = [a.split('_')[0] for a in data.iloc[:, 0]]
ytarget = preprocessing.LabelEncoder().fit_transform(ylabels)
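The preview above stops before the cross-validation loop. As a rough illustration of the leave-one-out procedure named in the docstring, the sketch below uses only the standard library and swaps the SVM for a 1-nearest-neighbour classifier, so it is not the original script's method. The helper names (`label_of`, `nearest_label`, `leave_one_out_accuracy`) and the sample rows are made up for the example.

```python
def label_of(name):
    """'label_description' -> 'label' (the convention from the docstring)."""
    return name.split('_')[0]


def sqdist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_label(train_x, train_y, x):
    """Stand-in classifier (1-NN, not an SVM): label of the closest row."""
    best = min(range(len(train_x)), key=lambda i: sqdist(train_x[i], x))
    return train_y[best]


def leave_one_out_accuracy(xdata, ylabels):
    """Hold out each row once, fit on the rest, return the fraction correct."""
    hits = 0
    for i in range(len(xdata)):
        train_x = xdata[:i] + xdata[i + 1:]
        train_y = ylabels[:i] + ylabels[i + 1:]
        if nearest_label(train_x, train_y, xdata[i]) == ylabels[i]:
            hits += 1
    return hits / len(xdata)


if __name__ == "__main__":
    names = ['a_one', 'a_two', 'b_one', 'b_two']  # made-up first column
    xdata = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]  # made-up features
    ylabels = [label_of(n) for n in names]
    print(leave_one_out_accuracy(xdata, ylabels))
```

With n rows this trains n models, each on n - 1 rows, which is affordable for small CSV files but grows expensive for large ones.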