Renaud Richardet renaud

@renaud
renaud / DkPro Binary Cas evaluation.md
Last active December 28, 2015 13:09
DkPro Binary Cas evaluation
renaud / subscripts.java
Last active December 27, 2015 23:59
finding misextracted subscripts in pdfs, using PdfTextStream
@Test
public void testSubscripts() throws Exception {
    final Pattern SUBSCRIPTS = Pattern.compile("^[ \\d]{10,1000}$");
    File ROOT = new File(
            "/Volumes/scratch/richarde/pdfs/201307/");
    for (File pdf : ROOT.listFiles()) {
        if (pdf.getName().endsWith(".pdf")) {
            try {
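As a rough illustration of what the `SUBSCRIPTS` pattern is looking for, here is a minimal sketch in Python rather than the gist's Java, and without PdfTextStream. It assumes the extracted text is already available as a list of lines; `find_subscript_lines` and the sample lines are hypothetical names introduced only for this example.

```python
import re

# Same pattern as the Java snippet: a whole line made up only of spaces and
# digits, 10 to 1000 characters long. re.ASCII keeps \d limited to 0-9,
# which is also Java's default behaviour.
SUBSCRIPTS = re.compile(r"^[ \d]{10,1000}$", re.ASCII)


def find_subscript_lines(lines):
    """Return the lines that look like runs of misextracted subscripts."""
    return [line for line in lines if SUBSCRIPTS.match(line)]


if __name__ == "__main__":
    sample = [
        "The reaction yields H O and CO",  # ordinary text, not flagged
        "2     2    3   4",                # digits and spaces only, flagged
        "12345",                           # too short to be flagged
    ]
    for line in find_subscript_lines(sample):
        print(repr(line))
```

The lower bound of 10 characters keeps short numeric lines, such as lone page numbers, from being reported.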
renaud / git_aliases.sh
Created October 30, 2013 13:46
aliases to list files to add/delete from git
alias gita="git status | egrep 'modified: ' | sed $'s/#\tmodified:/git add/g'"
alias gitd="git status | egrep 'deleted: ' | sed $'s/#\tdeleted:/git rm/g'"
renaud / go_fetch.py
Created October 17, 2013 07:36
Retrieve synonyms of gene ontology (GO) terms
import urllib, sys
from xml.etree import cElementTree as ElementTree
def get_go_name(go_id):
    sys.stdout.write("GO"+go_id+"\t"),
    #get the GO entry as XML
    xml = urllib.urlopen("http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:"+go_id+"&format=oboxml")
    #open in cElementTree, for fast XML parsing
    for event, element in ElementTree.iterparse(xml):
        #need to make sure we are getting the name contained within the 'term' entry
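The last comment above, about only taking the name that sits inside a 'term' entry, can be sketched as follows. This is a self-contained Python 3 illustration using `xml.etree.ElementTree` (the `cElementTree` module used in the gist is Python 2 era). `SAMPLE_XML` and `term_names` are hypothetical: the actual layout of the QuickGO response is not shown in the preview, so the element names in the sample are assumptions.

```python
import io
from xml.etree import ElementTree

# Hypothetical sample document; the real QuickGO layout may differ.
SAMPLE_XML = b"""<obo>
  <header><name>ignored</name></header>
  <term>
    <id>GO:0000001</id>
    <name>example term</name>
  </term>
</obo>"""


def term_names(source):
    """Yield the text of each <name> that is a direct child of a <term>."""
    for event, element in ElementTree.iterparse(source, events=("end",)):
        # Only act once a whole <term> has been parsed, so that a <name>
        # elsewhere in the document (e.g. in the header) is never picked up.
        if element.tag == "term":
            name = element.find("name")  # searches direct children only
            if name is not None:
                yield name.text
            element.clear()  # free memory for terms already handled


if __name__ == "__main__":
    for name in term_names(io.BytesIO(SAMPLE_XML)):
        print(name)
```

Waiting for the `end` event of the enclosing element, instead of reacting to every `name` element, is one simple way to get the scoping the comment asks for.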
package topic
import spark.broadcast._
import spark.SparkContext
import spark.SparkContext._
import spark.RDD
import spark.storage.StorageLevel
import scala.util.Random
import scala.math.{ sqrt, log, pow, abs, exp, min, max }
import scala.collection.mutable.HashMap
renaud / dca2ldac.py
Last active December 25, 2015 09:39
Transforms topic-model input file, from DCA format (space-separated) to LDA-C format (colon-separated)
'''
Transforms topic-model input file,
from DCA format (space-separated)
to LDA-C format (colon-separated)
@author [email protected]
'''
import sys
dca_file = sys.argv[1]
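The rest of the script is not shown in the preview. As a hedged sketch of what such a conversion could look like: LDA-C expects one document per line in the form `M term:count term:count ...`, where `M` is the number of distinct terms. The input layout below is an assumption (one document per line, given as space-separated integer word ids), and `to_ldac_line` and `convert` are hypothetical helpers, not the gist's code.

```python
from collections import Counter


def to_ldac_line(word_ids):
    """Format one document as an LDA-C line: 'M id:count id:count ...'."""
    counts = Counter(word_ids)
    pairs = " ".join(f"{w}:{c}" for w, c in sorted(counts.items()))
    return f"{len(counts)} {pairs}"


def convert(lines):
    """Convert documents, one per input line, to LDA-C lines.

    Assumption: each input line is a space-separated list of integer
    word ids; the actual DCA layout may differ.
    """
    for line in lines:
        ids = [int(tok) for tok in line.split()]
        if ids:  # skip empty lines
            yield to_ldac_line(ids)


if __name__ == "__main__":
    for out in convert(["3 1 3 7", "5 5 5"]):
        print(out)
```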
renaud / migrate_to_uimaFIT_2.sh
Created September 27, 2013 09:57
shell commands to ease the migration to uimaFIT version 2
#!/bin/sh
############################################
# MAKE SURE TO BACKUP YOUR FILES FIRST
############################################
# see http://uima.apache.org/d/uimafit-2.0.0/tools.uimafit.book.html#d5e617
#Change of package names:
# dots are escaped so that only the literal package prefix 'org.uimafit' is rewritten
find . -name '*.java' -print | xargs perl -p -i -e 's/org\.uimafit/org.apache.uima.fit/g'
renaud / 1_ReferencesClassifier2.java
Last active December 22, 2015 19:39
Mallet MaxEnt classifier for paper references
package ch.epfl.bbp.uima.projects.references;
import static cc.mallet.pipe.iterator.FileIterator.LAST_DIRECTORY;
import static com.google.common.collect.Lists.newArrayList;
import static java.util.regex.Pattern.compile;
import static org.apache.commons.lang.StringUtils.join;
import static org.slf4j.LoggerFactory.getLogger;
import java.io.File;
import java.io.FileFilter;
renaud / Biostar45366.java
Last active December 20, 2015 01:49 — forked from lindenb/Biostar45366.java
OBO parser
package ch.epfl.bbp.nlp.obo;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
<!DOCTYPE html>
<html>
<head>
<title>Hellllo</title>
<style type="text/css">
html, body {
background-color:#000;