@bnewbold
bnewbold / README.md
Created November 5, 2019 05:00
IA/COS Registration Download Example

This gist contains hacky Python scripts that pull registration content (as JSON), plus any files and wiki pages, into a directory structure. One then runs a BagIt script and mixes in metadata to get a BagPack.

NOTE: the bagit script I used wasn't BagPack-aware, so it didn't include the files under ./metadata/ in the manifests. Also, all the bagit metadata is just defaults; these examples only show the "shape" of the results.
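The manifest gap noted above can be illustrated with a minimal, standard-library-only Python sketch. It is not the `bagit` library and not the scripts used here. It assumes a BagPack-style layout in which payload files live under `data/` and extra metadata lives under `metadata/` at the bag root. Following BagIt (RFC 8493), payload files are listed in `manifest-sha256.txt`, and every other file, including everything under `metadata/`, is treated as a tag file and listed in `tagmanifest-sha256.txt`.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifests(bag_dir: str) -> None:
    """Write bagit.txt, the payload manifest, and a tag manifest.

    Payload = every file under data/. Every other file (bagit.txt,
    bag-info.txt, metadata/**, the payload manifest) is a tag file.
    """
    root = Path(bag_dir)
    (root / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    def rel(p: Path) -> str:
        # Manifest paths are relative to the bag root and use forward slashes
        return p.relative_to(root).as_posix()

    payload = sorted(p for p in (root / "data").rglob("*") if p.is_file())
    with (root / "manifest-sha256.txt").open("w", encoding="utf-8") as out:
        for p in payload:
            out.write(f"{sha256_file(p)}  {rel(p)}\n")

    def is_tag_file(p: Path) -> bool:
        if not p.is_file() or rel(p).split("/")[0] == "data":
            return False
        # Tag manifests do not list themselves or each other
        return not (p.parent == root and p.name.startswith("tagmanifest-"))

    tag_files = sorted(p for p in root.rglob("*") if is_tag_file(p))
    with (root / "tagmanifest-sha256.txt").open("w", encoding="utf-8") as out:
        for p in tag_files:
            out.write(f"{sha256_file(p)}  {rel(p)}\n")
```

Filenames containing newlines or `%` would need the percent-encoding that RFC 8493 specifies. This sketch does not handle them.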

See how the example items show up on archive.org in this collection: https://archive.org/details/cos-dev-sandbox

Thoughts

The description HTML should probably link to any child registrations (items), and should also link back to at least the collection page.
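The linking idea above can be shown with a minimal Python sketch of such a description. The child item identifiers are hypothetical. The sketch relies only on the standard `https://archive.org/details/<identifier>` URL pattern used for both items and collections such as `cos-dev-sandbox`.

```python
from html import escape
from typing import Sequence


def details_url(identifier: str) -> str:
    # Standard archive.org landing page for an item or a collection
    return f"https://archive.org/details/{identifier}"


def description_html(summary: str, child_ids: Sequence[str], collection_id: str) -> str:
    """Item description linking to child registrations and back to the
    parent collection page. All identifiers passed in are illustrative."""
    parts = [f"<p>{escape(summary)}</p>"]
    if child_ids:
        parts.append("<p>Child registrations:</p><ul>")
        for cid in child_ids:
            url = escape(details_url(cid), quote=True)
            parts.append(f'<li><a href="{url}">{escape(cid)}</a></li>')
        parts.append("</ul>")
    coll = escape(details_url(collection_id), quote=True)
    parts.append(f'<p>Part of collection: <a href="{coll}">{escape(collection_id)}</a></p>')
    return "\n".join(parts)
```

For example, `description_html("Registration ...", ["hypothetical-child-1"], "cos-dev-sandbox")` yields one list entry for the child plus a link to the collection page.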

bnewbold / README.md
Created October 15, 2019 14:44
ROR-ification Experiments (FORCE11 2019 ROR Workshop)

You can experiment with GROBID affiliation extraction (raw affiliation string to structured object) at, e.g., http://grobid.qa.fatcat.wiki/. Go to "TEI", then select "Process Affiliations".

The GROBID dataset here comes from millions of research papers found on the web by Wayback crawlers. The structured affiliations were gathered into a dataset available at https://archive.org/details/ia_research_affiliation_datasets.
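For scripted rather than web-form use, here is a minimal Python sketch. It calls a GROBID service's `processAffiliations` endpoint and flattens the returned TEI `<affiliation>` elements into plain dicts, a convenient shape for later matching against ROR. Several parts are assumptions: the base URL (8070 is GROBID's usual default port), the `affiliations` form field name, and the choice of extracted fields. Check the GROBID service documentation for your version.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List


def process_affiliations(raw: str, base_url: str = "http://localhost:8070") -> str:
    """POST a raw affiliation string to a GROBID service; return TEI XML text."""
    body = urllib.parse.urlencode({"affiliations": raw}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/api/processAffiliations", data=body)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def _local(tag: str) -> str:
    # Drop any "{namespace}" prefix so TEI with or without xmlns parses alike
    return tag.rsplit("}", 1)[-1]


def flatten_affiliations(tei_xml: str) -> List[Dict[str, str]]:
    """One dict per <affiliation>: orgName values keyed by their type
    attribute, plus settlement/region/country (and the country key)."""
    text = tei_xml.strip()
    if text.startswith("<?xml"):
        text = text.split("?>", 1)[1]  # drop the XML declaration before wrapping
    root = ET.fromstring(f"<wrapper>{text}</wrapper>")  # allow several top-level elements
    records = []
    for aff in root.iter():
        if _local(aff.tag) != "affiliation":
            continue
        record: Dict[str, str] = {}
        for el in aff.iter():
            name, value = _local(el.tag), (el.text or "").strip()
            if not value:
                continue
            if name == "orgName":
                # Keep the first orgName of each type (department, institution, ...)
                record.setdefault(el.get("type", "orgName"), value)
            elif name in ("settlement", "region", "country"):
                record[name] = value
                if name == "country" and el.get("key"):
                    record["country_code"] = el.get("key")
        records.append(record)
    return records
```

Usage would be along the lines of `flatten_affiliations(process_affiliations("Dept. of ..., Some University, ..."))`. The parser was only checked against hand-written TEI fragments, not real GROBID output.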

The bioRxiv dataset (of the top 1000 institutional affiliations, mostly

fixed_issnl  issnl      fatcat_ident                name
1530-1311    1550-1311  2nklacmgkjdfjib7kqi3hdh76a  International Symposium on Temporal Representation and Reasoning/TIME, Proceedings of the
2306-0441    2223-0441  rngvdeed65ffhmgfxti7s2z6by  Journal of Local and Global Health Science
1641-6554    1641-6565  yviicehmubf4bcxo23q43sbkzu  Kolposkopia
1526-7539    1526-7639  pnyefvclqfabjlnl4suox6pdte  International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Proceedings of the
0276-6574    0276-6547  hcrk2xeoknf7daxwurje2vg3n4  Computers in Cardiology Conference
0018-9413    0359-4237  ngthxcdwgzhovfgimtll6owdnm  IEEE Transactions on Geoscience Electronics
             2630-4301  drgmggxvfjbjjkakrjegaujkgy  Food Modelling Journal
1089-7771    1086-7771  sjjfcknh3zawndw6jdmvfrix7a  IEEE transactions on information technology in biomedicine
1093-1139    1039-1139  nnrvd2qmhzbebk2hsopvtchodq  Academic Physician and Scientist
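The standard ISSN check digit (ISO 3297: weights 8 down to 2 on the first seven digits, mod 11, with 10 written as `X`) gives a quick way to vet rows like these. Run over the table above, every `fixed_issnl` value passes and every `issnl` value fails. That includes the Food Modelling Journal's `2630-4301`, which has no fix listed. Reading `fixed_issnl` as a correction of a mistyped `issnl` is our interpretation. A minimal Python sketch of the check:

```python
import re

# Accepts "1234-5679" or "12345679"; the check character may be X/x
ISSN_RE = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def issn_check_digit(first_seven: str) -> str:
    """Weights 8..2 on the first seven digits; check = (11 - sum % 11) % 11."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def issn_is_valid(issn: str) -> bool:
    m = ISSN_RE.match(issn.strip())
    if not m:
        return False
    return issn_check_digit(m.group(1) + m.group(2)) == m.group(3).upper()
```

For instance, `issn_is_valid("1530-1311")` is `True` while `issn_is_valid("1550-1311")` is `False`. A valid check digit only rules out typos. It does not prove the ISSN-L belongs to that journal.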

RE: https://phabricator.wikimedia.org/T223528

The images of the National Library of Aruba are on the Internet Archive (https://archive.org/details/bibliotecanacionalaruba). Can you get their images from their site (currently 1,017) and put them (URLs), plus all the metadata, into OpenRefine or a CSV file, so I can prepare them for upload to Wikimedia Commons?
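One way to approach this request with only the Python standard library is sketched below. It is not necessarily the approach or tools used in the original answer. It assumes archive.org's `advancedsearch.php` endpoint with JSON output and the `https://archive.org/details/<identifier>` URL pattern. The selected metadata fields are an assumption. Per-image file URLs would additionally need each item's file listing, which this sketch does not fetch.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from typing import Dict, List

FIELDS = ["identifier", "title", "date", "creator", "description"]  # assumed field choice


def search_collection(collection: str, rows: int = 5000) -> List[Dict]:
    """Fetch item metadata for a collection via archive.org advanced search.
    rows is set well above the ~1,000 items mentioned, so one page suffices."""
    params = [("q", f"collection:{collection}"), ("rows", str(rows)),
              ("page", "1"), ("output", "json")]
    params += [("fl[]", f) for f in FIELDS]
    url = "https://archive.org/advancedsearch.php?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]


def docs_to_csv(docs: List[Dict]) -> str:
    """One CSV row per item: details URL plus the selected metadata fields.
    Multi-valued fields (returned as lists) are joined with '; '."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url"] + FIELDS)
    for doc in docs:
        row = [f"https://archive.org/details/{doc['identifier']}"]
        for field in FIELDS:
            value = doc.get(field, "")
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            row.append(value)
        writer.writerow(row)
    return buf.getvalue()
```

Usage would be `print(docs_to_csv(search_collection("bibliotecanacionalaruba")))`. The resulting CSV can be opened directly in OpenRefine.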

First, you need the following command-line tools installed:

bnewbold / dat-spec-thoughts.md
Last active October 31, 2017 03:24
Feedback on dat spec/paper

I've been implementing a dat client in Rust: https://github.com/bnewbold/geniza

It's been fun! The "whitepaper"/spec has been very helpful. Below are a few thoughts/comments on the paper, documentation, and protocols.

Informal Protocol Proposals

With my archival and inter-op hat on, I wish that the hyperdrive metadata register (specifically Node protobuf messages) included standard full-file hashes (e.g., SHA1 or BLAKE2b of the entire file, with no length prefix). These could be optional, and could presumably be calculated with little overhead when files are added to archives. They would make auditing, verification, and interoperability between distributed networks easier, though the storage and compute overhead would be non-zero.
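Here "full-file hash" means just the plain hash of the file's bytes, independent of how hyperdrive chunks or hashes content internally. A minimal Python sketch computing SHA1 and BLAKE2b over an entire file, with no length or type prefix, in a single streaming pass. The chunk size and BLAKE2b's default 64-byte digest are assumptions, not part of any dat spec.

```python
import hashlib
from typing import BinaryIO, Dict


def full_file_hashes(f: BinaryIO, chunk_size: int = 64 * 1024) -> Dict[str, str]:
    """Hex SHA1 and BLAKE2b digests of the whole stream, computed in one
    pass so large files never need to fit in memory."""
    sha1 = hashlib.sha1()
    b2 = hashlib.blake2b()  # default 64-byte (512-bit) digest
    for chunk in iter(lambda: f.read(chunk_size), b""):
        sha1.update(chunk)
        b2.update(chunk)
    return {"sha1": sha1.hexdigest(), "blake2b": b2.hexdigest()}
```

Usage would be `with open("example.bin", "rb") as f: print(full_file_hashes(f))`. Because nothing is prepended, the results match what `sha1sum` or `b2sum` report for the same file, and that match is what makes cross-network verification easy.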

It seems like the network protocol really should have a version field, perhaps in the initial (pre-encryption) Register message?

var hyperdrive = require('hyperdrive')
var Dat = require('dat-node')
var datHttp = require('dat-http')

// dat-http: storage backend that fetches the dat's files over HTTP
var storage = datHttp('https://static.bnewbold.net/tmp/dummy-dat/')

console.log('Starting...')
Dat(storage, function (err, dat) {
  if (err) throw err
  console.log('Writable?', dat.writable)
})
bnewbold / README.md
Last active October 16, 2017 11:52
Broken python3 gluish TSV unicode

Do the whole thing:

PYTHONPATH='.' luigi --module small Small --local-scheduler
bnewbold / pyrecwrap.jl
Last active June 16, 2016 20:14
Recursive version of PyCall.jl's pywrap(), for turning nested Python modules into nested Julia modules
#
# NOTE: This code is actually *not* used in PyX.jl right now, it's just here as
# an example.
#
# This file contains a recursive version of the pywrap() function from PyCall:
# it will generate nested Julia Modules for nested Python modules.
# This extends to recursive or infinitely looping modules. For example, the
# Python "os" module goes infinitely deep:
#
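The infinite depth described above is easy to see from Python itself. `os.path` is a module (e.g. `posixpath`) that itself does `import os`, so `os.path.os` is `os` again, and any recursive wrapper needs cycle detection. A minimal Python sketch follows. It is not PyCall's or PyX.jl's code. It walks module-valued attributes breadth-first and records each module object only once.

```python
import os
import types
from collections import deque
from typing import Dict


def walk_modules(root: types.ModuleType) -> Dict[str, types.ModuleType]:
    """Map dotted attribute paths to the modules reachable from `root`.

    Each module is recorded once, under a shortest path to it, so cycles
    such as os -> os.path -> os.path.os (which is os) terminate.
    """
    seen = {id(root)}
    found = {root.__name__: root}
    queue = deque([(root.__name__, root)])
    while queue:
        path, mod = queue.popleft()
        # vars() reads the module's __dict__ directly, so module-level
        # __getattr__ hooks are never triggered
        for name, value in sorted(vars(mod).items()):
            if isinstance(value, types.ModuleType) and id(value) not in seen:
                seen.add(id(value))
                child_path = f"{path}.{name}"
                found[child_path] = value
                queue.append((child_path, value))
    return found
```

Walking breadth-first with an explicit queue, rather than recursing, also keeps the traversal safe from Python's recursion limit on deep module graphs.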
TTY/Serial Scalability Protocol Transport
==========================================
This document describes a new transport for the Scalability Protocols (SP). The
Scalability Protocols (as implemented in the nanomsg library) specify
lightweight protocols for common message-passing patterns in distributed
systems. They are specifically designed to accommodate both new message types
and new message transports (as well as being completely agnostic to the content
or encoding of all messages).
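As a point of reference, here is a minimal Python sketch of the framing used by the existing SP TCP transport (per the SP TCP mapping draft). Each side first sends an 8-byte header: NUL, `S`, `P`, NUL, a 16-bit big-endian socket type, and 16 reserved zero bits. Every message is then prefixed by its length as a 64-bit big-endian integer. The text above does not say whether this TTY transport reuses that framing, so the sketch is only an assumption-labeled starting point.

```python
import struct
from typing import List, Tuple

NN_REQ = 48  # nanomsg's REQ socket type: NN_PROTO_REQREP (3) * 16 + 0


def protocol_header(socket_type: int) -> bytes:
    """NUL 'S' 'P' NUL, then the socket type and 16 reserved zero bits."""
    return b"\x00SP\x00" + struct.pack(">HH", socket_type, 0)


def parse_protocol_header(header: bytes) -> int:
    if len(header) != 8 or header[:4] != b"\x00SP\x00":
        raise ValueError("not an SP protocol header")
    socket_type, reserved = struct.unpack(">HH", header[4:])
    if reserved != 0:
        raise ValueError("reserved bits must be zero")
    return socket_type


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 64-bit big-endian integer."""
    return struct.pack(">Q", len(message)) + message


def unframe(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Split buffered bytes into complete messages plus any incomplete tail,
    e.g. bytes still arriving over a slow serial line."""
    messages = []
    offset = 0
    while len(stream) - offset >= 8:
        (size,) = struct.unpack_from(">Q", stream, offset)
        if len(stream) - offset - 8 < size:
            break  # wait for the rest of this message
        start = offset + 8
        messages.append(stream[start:start + size])
        offset = start + size
    return messages, stream[offset:]
```

Returning the incomplete tail lets a reader append newly received bytes and call `unframe` again. On a serial line, reads are likely to end mid-message.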
bnewbold / xrep_test.c
Created April 3, 2015 19:22
nanomsg xrep_test.c
/*
* nanomsg "XREP" RAW REQ/REP header handling test.
*
* Terminal #1:
* gcc xrep_test.c -o xrep_test -Wall -lnanomsg
* ./xrep_test
*
* Terminal #2:
* nn_req -l 5555 --ascii --data "ping"
*/
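This sketch gives context on what such a test inspects. It is a minimal Python parser for the REQ/REP "backtrace" as described in the SP request/reply specification: a stack of 32-bit big-endian words in which intermediate hop or peer IDs have the top bit clear, and the final word, the request ID, has the top bit set. It only parses bytes. It does not show how or where nanomsg's raw (`AF_SP_RAW`) sockets expose this header, and the ID values in the example are hypothetical.

```python
import struct
from typing import List, Tuple


def split_backtrace(msg: bytes) -> Tuple[List[int], bytes]:
    """Pop 32-bit big-endian words off the front of `msg` until one with
    the top bit set (the request ID); return (words, remaining body)."""
    words = []
    offset = 0
    while offset + 4 <= len(msg):
        (word,) = struct.unpack_from(">I", msg, offset)
        words.append(word)
        offset += 4
        if word & 0x80000000:
            return words, msg[offset:]
    raise ValueError("no request ID (word with top bit set) found")


def reply_for(words: List[int], body: bytes) -> bytes:
    """A raw REP side sends the same backtrace back, followed by its reply."""
    return b"".join(struct.pack(">I", w) for w in words) + body
```

For example, a message made of a hypothetical peer ID `0x00000001`, a request ID `0x80000007`, and the body `b"ping"` splits into `[1, 0x80000007]` and `b"ping"`.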