@bencomp
bencomp / author-keys_2012-01-31.csv
Created February 22, 2012 12:11
Identifiers found in Open Library Edition records, keys found in all records by record type
key records
_date 1
alternate_names 18975
authors 21
bio 9291
birth_date 1240612
body 1
by_statement 10
comment 12165
contributions 1
bencomp / author-entity_type-choices.tsv
Created February 23, 2012 21:02 — forked from anandology/author-keys_2012-01-31.csv
Open Library statistics and errors
org 306
person 82
author 34
Writer 17
Pseudonym 8
individual 6
Escritor 5
poetry 5
Book 4
Poet 4
bencomp / readme.txt
Created October 10, 2012 06:35
Ben's statistics for Open Library's dump of September 30, 2012
{
"confused": [
[
"/type/page",
"/works/OL8483608W/Jack-and-the-Beanstalk-(Another-Point-of-View)"
],
[
"/type/backreference",
"/type/author/books"
],
bencomp / return_unittitle_with_comma.xq
Created June 13, 2013 12:17
Return all <unittitle> elements in which the first word (sequence of non-space characters) ends with a comma.
let $content := <top><unittitle>Test, another word</unittitle></top>
return $content//unittitle[ends-with(tokenize(./text(), " ")[1], ",")]
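For comparison, the same filter can be sketched in Python with the standard library. This is an illustration only, not part of the gist: the function name `unittitles_starting_with_comma_word` is made up here, and `el.text` only covers the text before any child element, so it mirrors the XQuery above only for simple text-only `<unittitle>` elements.

```python
import xml.etree.ElementTree as ET


def unittitles_starting_with_comma_word(xml_text):
    """Return <unittitle> elements whose first space-separated word ends with a comma."""
    root = ET.fromstring(xml_text)
    hits = []
    for el in root.iter("unittitle"):
        text = el.text or ""
        # Same idea as tokenize(./text(), " ")[1]: split on a single space, take the first token.
        first_word = text.split(" ")[0]
        if first_word.endswith(","):
            hits.append(el)
    return hits


content = "<top><unittitle>Test, another word</unittitle><unittitle>No comma first</unittitle></top>"
for el in unittitles_starting_with_comma_word(content):
    print(el.text)
```

Splitting on a single space follows the original query; splitting on any whitespace would be a different (arguably more robust) rule.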
bencomp / log
Created September 18, 2013 15:38
After I added apache-jena-libs as a Maven dependency, the Eclipse console shows this when I run a JUnit test. Nothing else happens afterwards, except that the Java process goes to 100% CPU.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/ben/.m2/repository/ch/qos/logback/logback-classic/1.0.0/logback-classic-1.0.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/ben/.m2/repository/org/slf4j/slf4j-log4j12/1.6.4/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.selector.DefaultContextSelector]
bencomp / convert.sh
Last active January 4, 2016 21:59
XSLT conversion and test strategy
#!/bin/bash
# Generalised conversion and validation script for XSLT conversions:
# 1. Convert all XMLs in an input directory to new files in an output directory
# 2. Validate all output XMLs against an XML Schema
# 3. Summarise the validation results
# (4. Fix XSLT and re-run)
# Input XML files are in `./inputxml`
# Output XML files go in `./outputxml`
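The steps listed in the comments above can be sketched as follows. The body of the script is not shown here, so this is a rough Python outline under assumptions rather than the author's method: it assumes `xsltproc` and `xmllint` as the command-line tools, and the stylesheet and schema names (`convert.xsl`, `target.xsd`) as well as the function names are hypothetical.

```python
from pathlib import Path


def plan_commands(sources, output_dir, stylesheet, schema):
    """Build one convert command and one validate command per input file.

    Assumes xsltproc for step 1 and xmllint for step 2; the original script
    may use different tools.
    """
    commands = []
    for src in sources:
        src = Path(src)
        dst = Path(output_dir) / src.name
        # Step 1: transform the input file into the output directory.
        commands.append(["xsltproc", "-o", str(dst), str(stylesheet), str(src)])
        # Step 2: validate the converted file against the XML Schema.
        commands.append(["xmllint", "--noout", "--schema", str(schema), str(dst)])
    return commands


def summarise(results):
    """Step 3: condense a {filename: is_valid} mapping into a one-line summary."""
    failed = sorted(name for name, ok in results.items() if not ok)
    valid = len(results) - len(failed)
    summary = f"{valid} of {len(results)} output files valid"
    if failed:
        summary += "; failed: " + ", ".join(failed)
    return summary


# Hypothetical file names, for illustration only.
for cmd in plan_commands(["inputxml/a.xml"], "outputxml", "convert.xsl", "target.xsd"):
    print(" ".join(cmd))
print(summarise({"a.xml": True, "b.xml": False}))
```

Each planned command could be handed to `subprocess.run`, with the return code of the `xmllint` call feeding the `results` mapping; step 4 (fix the XSLT and re-run) stays manual.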
bencomp / glassfish
Created August 1, 2014 15:50
Grok patterns for Glassfish server.log and access.log
# Glassfish server.log format. May span multiple lines (e.g. Java stacktrace), so in logstash use with multiline codec/filter.
THREADNAME Thread-%{INT:threadnumberinname}
GLASSFISHTHREADS _ThreadID=%{INT:threadid};_ThreadName=%{THREADNAME};
GLASSFISHLOG \[#\|%{TIMESTAMP_ISO8601:timestamp}\|%{LOGLEVEL:loglevel}\|%{DATA:application}\|%{GREEDYDATA:component}\|%{GLASSFISHTHREADS:threadinfo}\|%{GREEDYDATA:message}\|#\]
# Glassfish access.log format
GLASSFISHACCESS "%{IPORHOST:clientip}" "%{USER:auth}" "%{HTTPDATE:timestamp}" "(?:%{WORD:verb} %{SESSIONREQUEST:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
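To show which fields GLASSFISHLOG is meant to capture, here is a rough equivalent using Python's `re` module instead of Grok. It is an approximation under assumptions: the timestamp and log level sub-patterns are looser than Grok's TIMESTAMP_ISO8601 and LOGLEVEL, the function name is made up, and the sample entry is invented for illustration rather than taken from a real server.log.

```python
import re

# Approximation of GLASSFISHLOG; group names follow the Grok field names above.
GLASSFISH_LOG = re.compile(
    r"\[#\|(?P<timestamp>[^|]+)"
    r"\|(?P<loglevel>[A-Za-z]+)"
    r"\|(?P<application>.*?)"
    r"\|(?P<component>.*)"
    r"\|(?P<threadinfo>_ThreadID=(?P<threadid>[+-]?\d+);"
    r"_ThreadName=Thread-(?P<threadnumberinname>[+-]?\d+);)"
    r"\|(?P<message>.*)\|#\]",
    re.DOTALL,  # let the message span several lines, e.g. a Java stack trace
)


def parse_server_log_entry(entry):
    """Return the captured fields as a dict, or None if the entry does not match."""
    match = GLASSFISH_LOG.match(entry)
    return match.groupdict() if match else None


# Invented sample entry with a two-line message.
sample = (
    "[#|2014-08-01T15:50:00.123+0200|INFO|glassfish|javax.enterprise.system.core"
    "|_ThreadID=34;_ThreadName=Thread-8;|Line one\nline two|#]"
)
fields = parse_server_log_entry(sample)
print(fields["loglevel"], fields["threadid"], repr(fields["message"]))
```

Because an entry can span multiple lines, the whole entry has to be assembled before matching, which is the role the multiline codec/filter plays in logstash.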
bencomp / vagrant.log
Created January 14, 2015 15:17
Log of vagrant up from the dataverse code, 2015-01-14
ben:dataverse ben$ vagrant up
OPERATING_SYSTEM environment variable not specified. Using centos by default.
To specify it in bash: export OPERATING_SYSTEM=debian
MAIL_SERVER environment variable not specified. Using localhost by default.
To specify it in bash: export MAIL_SERVER=localhost
Bringing machine 'standalone' up with 'virtualbox' provider...
Bringing machine 'solr' up with 'virtualbox' provider...
Bringing machine 'test' up with 'virtualbox' provider...
==> standalone: Importing base box 'puppet-vagrant-boxes.puppetlabs.com-centos-65-x64-virtualbox-puppet.box'...
==> standalone: Matching MAC address for NAT networking...
Running edu.harvard.iq.dataverse.util.xml.XmlPrinterTest
Warning: org.apache.xerces.parsers.SAXParser: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.
Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
Warning: org.apache.xerces.parsers.SAXParser: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.
Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
[Fatal Error] :1:1: Content is not allowed in prolog.
ERROR: 'Content is not allowed in prolog.'
mrt 31, 2016 10:28:25 AM edu.harvard.iq.dataverse.util.xml.XmlPrinter prettyPrintXml
INFO: Returning XML as-is due to problem pretty printing it: javax.xml.transform.TransformerException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
Tests run: 2, Failures: 1, Errors: 0
bencomp / compose.log
Created May 13, 2020 15:53
Trellis-LDP startup
$ docker-compose -f ~/serverconf/bencomp/files/docker-compose.trellis.yml up
files_trellisdb_1 is up-to-date
Starting files_trellis_1 ... done
Attaching to files_trellisdb_1, files_trellis_1
trellisdb_1 |
trellisdb_1 | PostgreSQL Database directory appears to contain a database; Skipping initialization
trellisdb_1 |
trellisdb_1 | 2020-05-13 14:29:53.133 UTC [1] LOG: starting PostgreSQL 12.2 (Debian 12.2-2.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
trellisdb_1 | 2020-05-13 14:29:53.133 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
trellisdb_1 | 2020-05-13 14:29:53.134 UTC [1] LOG: listening on IPv6 address "::", port 5432