@andreasvc
andreasvc / tiger2.2singleparent.diff
Created November 2, 2013 23:48
The TigerXML versions of the Tiger corpus contain a few nodes that are the target of more than one <edge> element, i.e., nodes with multiple parents. The following patch against version 2.2 of the Tiger corpus removes the extra edges, so that the result matches the v2.1 export version of the corpus. Since the export version does not list these edges as secondary edges either, they are probably spurious.
--- tiger_release_aug07.corrected.16012013.xml 2013-01-16 16:35:23.000000000 +0100
+++ tiger_2.2a.xml 2013-11-03 00:02:12.890306125 +0100
@@ -3097934,7 +3097934,6 @@
<nt id="s46234_505" cat="PP">
<edge label="AC" idref="s46234_24" />
<edge label="NK" idref="s46234_25" />
- <edge label="CJ" idref="s46234_135" />
</nt>
<nt id="s46234_506" cat="PP">
<edge label="AC" idref="s46234_30" />
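To locate candidates for such a patch, one could collect the parents of every node and report the nodes that have more than one. The following is a minimal sketch, not the method used to produce the patch: the function name `multiple_parents` and the `SAMPLE` sentence are made up for illustration, and only the `nt`, `edge`, `id`, `idref`, and `label` names are taken from the diff above.

```python
import collections
import xml.etree.ElementTree as ET

# Hypothetical miniature sentence in TigerXML style; not taken from the corpus.
SAMPLE = """<s id="s1"><graph><terminals>
<t id="s1_1" word="a" /><t id="s1_2" word="b" />
</terminals><nonterminals>
<nt id="s1_500" cat="PP">
<edge label="AC" idref="s1_1" />
<edge label="NK" idref="s1_2" />
</nt>
<nt id="s1_501" cat="NP">
<edge label="CJ" idref="s1_2" />
</nt>
</nonterminals></graph></s>"""


def multiple_parents(sentence):
    """Map each node id that is the target of more than one <edge>
    to a list of (parent id, edge label) pairs."""
    parents = collections.defaultdict(list)
    for nt in sentence.iter('nt'):
        for edge in nt.findall('edge'):
            parents[edge.get('idref')].append(
                (nt.get('id'), edge.get('label')))
    return {child: found for child, found in parents.items()
            if len(found) > 1}


result = multiple_parents(ET.fromstring(SAMPLE))
print(result)
```

For a file the size of the full corpus, parsing sentence by sentence (for example with `xml.etree.ElementTree.iterparse`) would likely be preferable to loading everything at once.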
andreasvc / evalpatterns.py
Created October 22, 2013 10:10
Run a set of XPath queries on a corpus of parse trees and compute precision and recall with respect to a set of hand-picked sentences.
""" Run a set of XPath queries on a corpus of parse trees and compute precision
and recall with respect to a set of hand-picked sentences. """
from __future__ import print_function
import io
import os
import glob
import nltk
import alpinocorpus
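The preview stops at the imports, so the evaluation itself is not shown. As a rough sketch of the precision and recall computation the description refers to, assuming query results and hand-picked sentences are both available as collections of sentence identifiers (the function name `precision_recall` is hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of the retrieved items
    with respect to the relevant (hand-picked) items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Convention assumed here: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Sentences matched by a query vs. sentences picked by hand.
p, r = precision_recall({1, 2, 3, 4}, {2, 4, 6})
print('precision=%.2f recall=%.2f' % (p, r))
```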
andreasvc / imdbimages.py
Created September 27, 2013 11:09
Generate image with plot & rating for each movie in a directory. Useful for media players etc. Each directory in the current path is treated as the name of a movie, and data for it is obtained from IMDB (through an unofficial API). A PNG file is created with a plot summary using Python Imaging and stored in that directory as 'info.png'.
""" Generate image with plot & rating for each movie in a directory. """
from __future__ import print_function
import os
import re
import sys
import glob
import json
import time
import urllib
import textwrap
andreasvc / logprobs.py
Last active December 20, 2015 22:28
Compare different strategies for adding a large number of small log probabilities.
""" Compare different strategies for adding a large number of small log
probabilities. """
from __future__ import print_function
from math import log, exp, fsum, isinf
from random import expovariate
N = 10000
def logprobadd(x, y):
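The body of `logprobadd` is cut off in this preview. To illustrate the kind of strategies being compared, here is a sketch of two common ones: accumulating pairwise in log space, and shifting by the maximum before summing with `fsum`. The names `log_add`, `log_sum_pairwise`, and `log_sum_shifted` are hypothetical and are not claimed to match the gist.

```python
from math import log, exp, fsum, log1p

NEG_INF = float('-inf')


def log_add(x, y):
    """Return log(exp(x) + exp(y)) without leaving log space."""
    if x < y:
        x, y = y, x
    if y == NEG_INF:  # adding probability zero changes nothing
        return x
    return x + log1p(exp(y - x))


def log_sum_pairwise(logprobs):
    """Strategy 1: fold the values together one pair at a time."""
    total = NEG_INF
    for lp in logprobs:
        total = log_add(total, lp)
    return total


def log_sum_shifted(logprobs):
    """Strategy 2: subtract the maximum, sum exactly with fsum, shift back.
    Assumes a non-empty sequence."""
    m = max(logprobs)
    if m == NEG_INF:
        return m
    return m + log(fsum(exp(lp - m) for lp in logprobs))


# exp(-1000) underflows to 0.0, so a naive log(sum(exp(lp))) would fail here.
tiny = [-1000 + log(0.5)] * 2
print(log_sum_pairwise(tiny), log_sum_shifted(tiny))
```

The pairwise variant works on a stream of values; the shifted variant needs the whole list to find the maximum, but rounds only once at the end.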
andreasvc / sentnums.py
Last active December 20, 2015 08:49
Match lines in one file with those of another, and produce line numbers.
""" Match lines in one file with those of another,
and produce line numbers. """
import io
import sys
USAGE = """Match lines in one file with those of another, and get line numbers.
usage: python %s sents text output
where sents and text are files with one sentence per line.
The result will be of the form "1|line", written to file "output".
Everything is assumed to be encoded with UTF-8.""" % sys.argv[0]
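The matching step described in the usage text could be sketched as follows. This is an illustration under assumptions: the function name `match_lines` is made up, lines are compared after stripping surrounding whitespace, the first occurrence wins when a line appears more than once, and unmatched lines get a `?` placeholder. None of these choices are confirmed by the preview.

```python
def match_lines(sents, text):
    """For each line in sents, yield 'N|line', where N is the 1-based
    number of the first identical line in text."""
    first_seen = {}
    for n, line in enumerate(text, 1):
        first_seen.setdefault(line.strip(), n)
    for line in sents:
        line = line.strip()
        # '?' marks lines without a match (an assumption of this sketch).
        yield '%s|%s' % (first_seen.get(line, '?'), line)


matched = list(match_lines(['b\n', 'a\n', 'z\n'], ['a\n', 'b\n', 'c\n']))
print(matched)
```

With real files, the two arguments would be file objects opened with an explicit UTF-8 encoding, e.g. `io.open(path, encoding='utf8')`.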
andreasvc / cartpi.py
Last active March 29, 2017 02:33
Get the cartesian product of an arbitrary number of iterables, including infinite sequences.
def cartpi(seq):
    """ A depth-first cartesian product for a sequence of iterables;
    i.e., all values of the last iterable are consumed before advancing the
    preceding ones. Like itertools.product(), but supports infinite sequences.

    >>> from itertools import islice, count
    >>> list(islice(cartpi([count(), count()]), 9))
    [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8)]
    """
    if seq:
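The function body is cut off in this preview. One way to obtain the behaviour the docstring describes is a recursive generator, sketched below under the name `cartpi_sketch` (hypothetical, and not necessarily how the gist continues). It assumes every iterable except the first can be iterated more than once; a one-shot iterator in a later position would be exhausted after the first pass.

```python
from itertools import islice, count


def cartpi_sketch(seq):
    """Lazily yield tuples of the cartesian product of the iterables in seq,
    varying the last iterable fastest."""
    if seq:
        for head in seq[0]:
            # Re-iterates seq[1:] for every head; nothing is buffered,
            # which is why infinite iterables are acceptable.
            for tail in cartpi_sketch(seq[1:]):
                yield (head,) + tail
    else:
        yield ()


first = list(islice(cartpi_sketch([count(), count()]), 3))
small = list(cartpi_sketch([range(2), range(2)]))
print(first, small)
```

Unlike `itertools.product()`, which first converts each input into a tuple, this never materialises an input, at the cost of requiring re-iterable inputs.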
andreasvc / bug.pyx
Last active December 16, 2015 11:18
cdef extern from "macros.h":
    # test whether the b'th bit of array a is set:
    unsigned long TESTBIT(unsigned long a[], int b)

cdef unsigned long foo[2]
foo[0] = 281474976710656UL
foo[1] = 0
# what the macro does:
print foo[0] & (1UL << 48)
andreasvc / pixel sorting.ipynb
Created February 17, 2013 17:15
Convert image to 1-dimensional sequence of RGB pixels, sort, and convert back.
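The notebook itself is not available here. The idea in the description can be sketched in pure Python on a small image represented as rows of `(r, g, b)` tuples. The function name `sort_pixels` and the lexicographic sort order are assumptions of this sketch, as is the requirement that all rows have equal width.

```python
def sort_pixels(image):
    """Flatten rows of (r, g, b) tuples into one sequence, sort it,
    and cut it back into rows of the original width."""
    height, width = len(image), len(image[0])
    flat = sorted(pixel for row in image for pixel in row)
    return [flat[y * width:(y + 1) * width] for y in range(height)]


image = [[(3, 0, 0), (1, 0, 0)],
         [(2, 0, 0), (0, 0, 0)]]
print(sort_pixels(image))
```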
andreasvc / draaideur.py
Created January 10, 2013 22:27
Given a dictionary, find words for which all rotations occur in the dictionary.
import sys, collections
def rotations(a):
    return {a[x:] + a[:x] for x in range(1, len(a))}
lexicon = collections.defaultdict(set)
for a in open(sys.argv[1]):
    # strip first, so that the length key does not depend on a trailing newline
    a = a.strip()
    lexicon[len(a)].add(a)
for length in sorted(lexicon):
    if length == 1:
        continue
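The preview ends inside the loop. A self-contained sketch of the whole task, working on a list of words instead of a file, might look like this. The name `rotation_closed` is hypothetical, and skipping words shorter than two letters is an assumption that extends the `length == 1` check above to empty lines.

```python
import collections


def rotations(word):
    """All rotations of word other than the word itself."""
    return {word[x:] + word[:x] for x in range(1, len(word))}


def rotation_closed(words):
    """Return the words for which every rotation is also in words."""
    lexicon = collections.defaultdict(set)
    for word in words:
        lexicon[len(word)].add(word)
    result = []
    for length in sorted(lexicon):
        if length < 2:  # empty and single-letter words are trivial
            continue
        for word in sorted(lexicon[length]):
            # Rotations keep the length, so one bucket is enough.
            if rotations(word) <= lexicon[length]:
                result.append(word)
    return result


found = rotation_closed(['ab', 'ba', 'abc', 'bca', 'cab', 'dog', 'a'])
print(found)
```

Grouping by length keeps each subset test restricted to words that could possibly be rotations of each other.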
andreasvc / t.lex
Created November 18, 2012 23:33
TSG transforms
The D 1 D@1-1 1 D@0-1 1
dog N 0.5 N@0-2 1
cat N@1-2 1 N 0.5
barks V 0.5 V@0-4 1
meows V 0.5 V@1-4 1
loudly RB 1 RB@0-5 1 RB@1-5 1