WORD ABBREVIATIONS LANGUAGE CODES
's-Graveland n.a. dut
's-Gravenhage n.a. dut
's-Gravenmoer n.a. dut
's-Heerenberg n.a. dut
's-Hertogenbosch n.a. dut
-agōgē -ag. gre
-aineisto -ain. fin
-Alföld -Alf. hun
-arvio n.a. fin
<parameters>
<stopper>
<word>a</word>
<word>about</word>
<word>above</word>
<word>according</word>
<word>across</word>
<word>after</word>
<word>afterwards</word>
@lgrz
lgrz / baz.txt
Created April 14, 2021 05:30
parsable csv version of the LTR features at https://www.microsoft.com/en-us/research/project/mslr/
feature-id,feature-description,stream,comments
1,covered query term number,body,
2,covered query term number,anchor,
3,covered query term number,title,
4,covered query term number,url,
5,covered query term number,whole document,
6,covered query term ratio,body,
7,covered query term ratio,anchor,
8,covered query term ratio,title,
9,covered query term ratio,url,
@lgrz
lgrz / dm.pl
Created April 3, 2021 23:14
Metzler's DM query generator - with ARGV
#!/usr/bin/perl
#
# Perl subroutine that generates Indri dependence model queries.
#
# Written by: Don Metzler ([email protected])
# Last updated: 06/27/2005
#
# Feel free to distribute, edit, modify, or mangle this code as you see fit. If you make any interesting
# changes please email me a copy.
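
For readers unfamiliar with Indri dependence model queries, here is a minimal Python sketch of a sequential dependence (sd) query. It is not a port of dm.pl. The function name `sdm_query`, the weights 0.85/0.10/0.05, and the unordered window size of 8 are assumptions based on commonly cited settings, not values taken from the script above.

```python
def sdm_query(text, w_term=0.85, w_ordered=0.10, w_unordered=0.05, window=8):
    """Build an Indri sequential dependence query string (hypothetical helper).

    Weights and window size are assumed defaults, not read from dm.pl.
    """
    terms = text.split()
    if not terms:
        return ""
    # Unigram component: every query term on its own.
    unigrams = "#combine(" + " ".join(terms) + ")"
    if len(terms) < 2:
        # A single term has no adjacent pairs, so only the unigram part applies.
        return unigrams
    # Adjacent term pairs feed the ordered and unordered window components.
    pairs = list(zip(terms, terms[1:]))
    ordered = " ".join(f"#1({a} {b})" for a, b in pairs)
    unordered = " ".join(f"#uw{window}({a} {b})" for a, b in pairs)
    return (
        f"#weight( {w_term} {unigrams} "
        f"{w_ordered} #combine({ordered}) "
        f"{w_unordered} #combine({unordered}) )"
    )


if __name__ == "__main__":
    print(sdm_query("new york city"))
```

The three parts are kept as separate `#combine` groups under one `#weight` so that each weight can be tuned independently.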
\begin{tabular}{r@{$.$}l}
$3$ &$14$ \\
$9$ &$80665$
\end{tabular}
<parameters>
<memory>128G</memory>
<index>myindex</index>
<storeDocs>true</storeDocs>
<stemmer><name>krovetz</name></stemmer>
<corpus>
<path>/path/to/corpus</path>
<class>warc</class>
<inlink>/path/to/links/sorted</inlink>
</corpus>
# build sdm template file
awk -F\; 'BEGIN {
print "<parameters>"
}
{
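# Build the command first: unparenthesized concatenation before "| getline" is ambiguous in POSIX awk.
cmd = "./dm.pl \"" $2 "\" sd 1WFI 1WOD 1WUW"
cmd | getline qry
# Close the pipe so long query files do not exhaust open file descriptors.
close(cmd)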
printf "<query><number>%s</number><text>%s</text></query>\n", $1, qry
}
END {
print "</parameters>"
https://blog.archive.org/developers/
:g/^/p| call system(getline('.'))