John Foley jjfiv

@jjfiv
jjfiv / IndexFromGalago.java
Created November 1, 2016 20:30
import from a Galago index to a Lucene index
import ciir.jfoley.chai.time.Debouncer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.jsoup.Jsoup;
@jjfiv
jjfiv / mk.py
Created January 3, 2017 15:41
my usual make generation voodoo
import socket
host = socket.gethostname()
onSydney=(host == 'sydney.cs.umass.edu')
if onSydney:
    print('CRFSUITE:=/mnt/nfs/work3/jfoley/bin/crfsuite-0.12/bin/crfsuite')
    print("PREFIX=qsub -b y -cwd -sync y -l mem_free=8G -l mem_token=8G -o $@.out -e $@.err ")
    print("JAVA:=/mnt/nfs/work3/jfoley/bin/jdk1.8.0_31/bin/java -ea -Xmx7G")
    print("SUFFIX:=") # log file created through qsub
else:
@jjfiv
jjfiv / AP.java
Created February 25, 2017 22:05
AP that doesn't sort or use other data structures. Just a List<Boolean> for relevance in the ranked list.
public static double computeAP(List<Boolean> isTruePositiveFromRanking, int numRelevant) {
  // if there are no relevant documents,
  // the average is artificially defined as zero, to mimic trec_eval
  // Really, the output is NaN, or the query should be ignored [point of debate]
  if (numRelevant == 0) return 0;
  double sumPrecision = 0;
  int recallPointCount = 0;
  for (int i = 0; i < isTruePositiveFromRanking.size(); i++) {
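The preview cuts off inside the loop. As a rough illustration of how such a computation usually finishes, here is a minimal Python sketch that assumes the standard definition of average precision: sum the precision at every rank holding a relevant document, then divide by the total number of relevant documents. The name `compute_ap` and its parameter names are illustrative and not taken from the gist.

```python
from typing import List


def compute_ap(is_relevant_at_rank: List[bool], num_relevant: int) -> float:
    """Average precision over a ranked list of relevance flags (a sketch)."""
    # Mimic trec_eval: a query with no relevant documents scores zero.
    if num_relevant == 0:
        return 0.0
    sum_precision = 0.0
    hits = 0  # relevant documents seen so far
    for i, relevant in enumerate(is_relevant_at_rank):
        if relevant:
            hits += 1
            # Precision at this rank: hits so far over documents seen so far.
            sum_precision += hits / (i + 1)
    # Divide by all relevant documents, so unretrieved ones contribute zero.
    return sum_precision / num_relevant


if __name__ == "__main__":
    # Relevant documents at ranks 1 and 3, with 2 relevant documents in total.
    print(compute_ap([True, False, True], 2))
```

Only a running counter and a running sum are needed, which matches the stated goal of avoiding sorting and extra data structures.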
def ternary(bool, pos, neg):
    if bool:
        return pos
    else:
        return neg
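For comparison, Python's built-in conditional expression covers the same case. The sketch below is illustrative only: the parameter is renamed to `cond` so it does not shadow the built-in `bool`, and `safe_ratio` is a hypothetical helper that shows the one practical difference. A function call evaluates both branches before choosing, while the conditional expression evaluates only the branch it selects.

```python
def ternary(cond, pos, neg):
    # Same result as the helper above, written as a conditional expression.
    return pos if cond else neg


def safe_ratio(a, b):
    # a / b is evaluated only when b is non-zero.
    return a / b if b != 0 else 0.0


if __name__ == "__main__":
    print(ternary(True, "yes", "no"))
    print(safe_ratio(1, 0))
    # ternary(False, 1 / 0, 0.0) would raise ZeroDivisionError, because
    # both arguments are evaluated before the function body runs.
```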
@jjfiv
jjfiv / pack_lstm.py
Last active June 19, 2018 18:36
Pack sequences within a batch for a pytorch LSTM.
import numpy as np
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence
def pack_lstm(items, lstm):
    N = len(items)
    reorder_args = np.argsort([len(it) for it in items])[::-1]
    origin_args = torch.from_numpy(np.argsort(reorder_args))
    ordered = [items[i] for i in reorder_args]
    packed_items = pack_padded_sequence(pad_sequence(ordered, batch_first=True), [len(od) for od in ordered], batch_first=True)
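The two `argsort` calls above are the core trick: the first yields the longest-first ordering that `pack_padded_sequence` expects unless `enforce_sorted=False` is passed, and the `argsort` of that permutation is its inverse. The preview ends before the inverse is used, so the sketch below assumes it serves to put results back in the original batch order. It uses plain Python lists instead of NumPy and PyTorch, and the function names are illustrative.

```python
def sort_by_length_desc(items):
    # Indices that order the items longest-first (ties keep original order).
    reorder = sorted(range(len(items)), key=lambda i: len(items[i]), reverse=True)
    # Inverse permutation: origin[i] is the sorted position of original item i.
    origin = [0] * len(reorder)
    for new_pos, old_pos in enumerate(reorder):
        origin[old_pos] = new_pos
    ordered = [items[i] for i in reorder]
    return ordered, reorder, origin


def restore_order(ordered, origin):
    # Undo the sort: pick each original item from its sorted position.
    return [ordered[pos] for pos in origin]


if __name__ == "__main__":
    batch = ["a", "abc", "ab"]
    ordered, reorder, origin = sort_by_length_desc(batch)
    print(ordered, reorder, origin)
    print(restore_order(ordered, origin))
```

Indexing the per-sequence outputs of the LSTM with the inverse permutation, in the same way `restore_order` does here, would return them in the order the caller supplied.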
import agent_ql as aq
# Diaz, F. "Condensed List Relevance Models." (ICTIR 2015)
def CLRM3(query, originalWeight=0.3):
    first_pass = aq.ql(aq.tokenize(query))
    RM = aq.term_probability_model()
    for doc in first_pass.search_now():
        RM += doc.to_term_probabilities() * doc.score
    return first_pass.results().re_rank( first_pass.mixture_model(RM, originalWeight) )
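The snippet above reads as pseudocode against an `agent_ql` module whose API is not shown. To make the arithmetic concrete, here is a minimal sketch of the two steps it names: a score-weighted sum of per-document term distributions, then a linear mixture with the original query model. Everything here is an assumption for illustration: documents are plain token lists, scores are already non-negative weights (log-probability scores would need exponentiating first), and none of the function names come from `agent_ql`.

```python
from collections import Counter, defaultdict


def term_probabilities(tokens):
    # Maximum-likelihood term distribution of one token list.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def relevance_model(scored_docs):
    # scored_docs: list of (tokens, score) pairs with non-negative scores.
    rm = defaultdict(float)
    for tokens, score in scored_docs:
        for term, p in term_probabilities(tokens).items():
            rm[term] += p * score
    norm = sum(rm.values())
    # Normalize so the expansion weights sum to one.
    return {term: w / norm for term, w in rm.items()} if norm > 0 else {}


def mixture_model(query_tokens, rm, original_weight=0.3):
    # Interpolate the original query model with the relevance model.
    query_model = term_probabilities(query_tokens)
    terms = set(query_model) | set(rm)
    return {
        term: original_weight * query_model.get(term, 0.0)
        + (1.0 - original_weight) * rm.get(term, 0.0)
        for term in terms
    }


if __name__ == "__main__":
    docs = [(["a", "b"], 1.0), (["a", "a"], 1.0)]
    rm = relevance_model(docs)
    print(mixture_model(["b"], rm, original_weight=0.3))
```

The resulting term weights would then drive the re-ranking of the first-pass results, which is the step the final line of the snippet performs.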
@jjfiv
jjfiv / GuessingGame.java
Last active September 6, 2018 18:14
P0 Solution
import java.util.Random;
import java.util.Scanner;
// We discussed academic honesty, so when you re-type this code, be sure to cite it in a comment!
public class GuessingGame {
/**
* A Java program will run code in a special ``main`` method.
* Note that Java has two types of comments: block (slash-star ... star-slash), and line ("slash-slash") comments.
* For now we ignore args, which is an array of strings that the user might have passed in.
@jjfiv
jjfiv / Echo.java
Last active February 11, 2019 22:30
Giving away complex Echo program for CSC262P1
package edu.smith.cs.csc262.coopsh.apps;
import edu.smith.cs.csc262.coopsh.ShellEnvironment;
import edu.smith.cs.csc262.coopsh.Task;
/**
* This is a full implementation of Echo.
* @author jfoley
*
*/
@jjfiv
jjfiv / ZipSplit.java
Created June 5, 2019 17:11
ZipSplit.java
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
public class ZipSplit {
@jjfiv
jjfiv / tantivy_stats.rs
Last active December 31, 2019 16:27
Collecting statistics from Tantivy's index structures.
use std::convert::TryInto;
use serde::{Deserialize, Serialize};
use tantivy::{Searcher, Term};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CountStats {
    pub collection_frequency: u64,
    pub document_frequency: u64,
    pub collection_length: u64,
    pub document_count: u64,
}