bnewbold
bnewbold / pyrecwrap.jl
Last active June 16, 2016 20:14
Recursive version of PyCall.jl's pywrap(), for turning nested Python modules into nested Julia modules
#
# NOTE: This code is actually *not* used in PyX.jl right now, it's just here as
# an example.
#
# This file contains a recursive version of the pywrap() function from PyCall:
# it will generate nested Julia Modules for nested Python modules.
# This extends to recursive or infinitely looping modules. For example, the
# Python "os" module goes infinitely deep:
#
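The looping structure described in the comment above can be seen from plain Python: `os.path` is a module that itself imports `os`, so `os.path.os.path...` never bottoms out. The following is a minimal sketch, not the gist's Julia code, of how a recursive wrapper might walk nested modules while visiting each one only once. The function name `walk_submodules` and the `max_depth` parameter are hypothetical, introduced only for illustration.

```python
import os
import types


def walk_submodules(module, max_depth=10):
    """Return dotted names of modules reachable through module attributes.

    Each module object is visited at most once, so cycles such as
    os -> os.path -> os cannot cause infinite recursion.
    """
    seen = {id(module)}
    names = []
    stack = [(module.__name__, module, 0)]
    while stack:
        name, mod, depth = stack.pop()
        names.append(name)
        if depth >= max_depth:
            continue
        for attr, value in vars(mod).items():
            # Only descend into attributes that are themselves modules
            # and that have not been visited yet.
            if isinstance(value, types.ModuleType) and id(value) not in seen:
                seen.add(id(value))
                stack.append((f"{name}.{attr}", value, depth + 1))
    return names


if __name__ == "__main__":
    # The cycle the comment refers to: os.path imports os again.
    print(os.path.os is os)
    print(walk_submodules(os)[:5])
```

Tracking visited modules by identity is one design choice; a depth limit alone would also terminate, but would wrap the same module many times.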
bnewbold / README.md
Last active October 16, 2017 11:52
Broken python3 gluish TSV unicode

Do the whole thing:

PYTHONPATH='.' luigi --module small Small --local-scheduler
var hyperdrive = require('hyperdrive')
var Dat = require('dat-node')
var datHttp = require('dat-http')
var storage = datHttp('https://static.bnewbold.net/tmp/dummy-dat/')
console.log('Starting...')
Dat(storage, function (err, dat) {
if (err) throw err
console.log('Writable?', dat.writable)
bnewbold / dat-spec-thoughts.md
Last active October 31, 2017 03:24
Feedback on dat spec/paper

I've been implementing a dat client in Rust: https://github.com/bnewbold/geniza

It's been fun! The "whitepaper"/spec has been very helpful. Below are a few thoughts/comments on the paper, documentation, and protocols.

Informal Protocol Proposals

With my archival and inter-op hat on, I wish that the hyperdrive metadata register (specifically Node protobuf messages) included standard full-file hashes (e.g., SHA1 or BLAKE2b of the entire file, with no length prefix). These could be optional, but could presumably be calculated when adding files to archives with little overhead. This could make auditing, verification, and interoperability between distributed networks easier. Storage and compute overhead would be non-zero.
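As a rough illustration of the "standard full-file hash" idea, here is a minimal Python sketch that computes SHA1 and BLAKE2b over the raw bytes of a file, with no length prefix, in a single streaming pass. It is not part of dat or hyperdrive; the function name `full_file_hashes` and the chunk size are assumptions made for the example.

```python
import hashlib


def full_file_hashes(path, chunk_size=1024 * 1024):
    """Hash the entire file contents (no length prefix) with SHA1 and BLAKE2b."""
    sha1 = hashlib.sha1()
    blake2b = hashlib.blake2b()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory;
        # both digests are updated from the same read.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            blake2b.update(chunk)
    return {"sha1": sha1.hexdigest(), "blake2b": blake2b.hexdigest()}
```

Because both digests are fed from one pass over the data, the extra cost when adding a file is mostly CPU time for the hash functions, which matches the "little overhead" expectation above.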

It seems like the network protocol really should have a version field... in the initial pre-encryption Register message?

RE: https://phabricator.wikimedia.org/T223528

The images of the National Library of Aruba are on the Internet Archive (https://archive.org/details/bibliotecanacionalaruba). Can you get their images from their site (currently 1,017) and put them (URLs) plus all the metadata into OpenRefine or a CSV file so I can prepare them for upload to Wikimedia Commons?

First, you need the following command line tools installed:

fixed_issnl issnl fatcat_ident name
1530-1311 1550-1311 2nklacmgkjdfjib7kqi3hdh76a International Symposium on Temporal Representation and Reasoning/TIME, Proceedings of the
2306-0441 2223-0441 rngvdeed65ffhmgfxti7s2z6by Journal of Local and Global Health Science
1641-6554 1641-6565 yviicehmubf4bcxo23q43sbkzu Kolposkopia
1526-7539 1526-7639 pnyefvclqfabjlnl4suox6pdte International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Proceedings of the
0276-6574 0276-6547 hcrk2xeoknf7daxwurje2vg3n4 Computers in Cardiology Conference
0018-9413 0359-4237 ngthxcdwgzhovfgimtll6owdnm IEEE Transactions on Geoscience Electronics
2630-4301 drgmggxvfjbjjkakrjegaujkgy Food Modelling Journal
1089-7771 1086-7771 sjjfcknh3zawndw6jdmvfrix7a IEEE transactions on information technology in biomedicine
1093-1139 1039-1139 nnrvd2qmhzbebk2hsopvtchodq Academic Physician and Scientist
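The table above appears to pair an `issnl` value with a corrected `fixed_issnl`; that reading is an assumption based on the column names. The standard ISSN check digit (weights 8 down to 2 over the first seven digits, modulo 11, with `X` standing for 10) is one way to tell the two apart. A minimal Python sketch, with hypothetical function names:

```python
def issn_check_digit(first_seven):
    """Compute the ISSN check character for the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_issn(issn):
    """Return True if the ISSN (with or without hyphen) has a correct check digit."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8 or not compact[:7].isdigit():
        return False
    return issn_check_digit(compact[:7]) == compact[7]


if __name__ == "__main__":
    print(is_valid_issn("1530-1311"))  # True  (fixed_issnl, first row)
    print(is_valid_issn("1550-1311"))  # False (issnl, first row)
```

For the first three rows, the `fixed_issnl` value passes this check and the `issnl` value fails it, which is consistent with the rows being typo corrections.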
bnewbold / README.md
Created October 15, 2019 14:44
ROR-ification Experiments (FORCE11 2019 ROR Workshop)

You can experiment with GROBID affiliation extraction (raw affiliation string to structured object) at, e.g., http://grobid.qa.fatcat.wiki/. Go to "TEI", then select "Process Affiliations".

GROBID dataset here is from millions of research papers found on the web by wayback crawlers. The structured affiliations were gathered into a dataset available at https://archive.org/details/ia_research_affiliation_datasets.

The bioRxiv dataset (of top 1000 institutional affiliations, mostly

bnewbold / README.md
Created November 5, 2019 05:00
IA/COS Registration Download Example

This gist contains hacky Python scripts that pull registration content (as JSON) and any files and wiki pages into a directory structure. One then runs a bagit script and mixes in metadata to get a BagPack.

NOTE: the bagit script I used wasn't BagPack-aware, so it didn't actually include the files under ./metadata/ in the manifests. Also all the bagit metadata is just defaults; these examples are just to show the "shape" of the results.
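One way to spot the gap described in the note is to compare the files on disk against what the manifests list. The sketch below is not one of the gist's scripts; it assumes the usual BagIt layout where `manifest-<alg>.txt` and `tagmanifest-<alg>.txt` sit in the bag root and each line is a checksum followed by a relative path. The function name `unlisted_files` is hypothetical.

```python
from pathlib import Path


def unlisted_files(bag_dir):
    """Return relative paths of files in the bag that no manifest mentions."""
    bag = Path(bag_dir)
    listed = set()
    manifests = list(bag.glob("manifest-*.txt")) + list(bag.glob("tagmanifest-*.txt"))
    for manifest in manifests:
        for line in manifest.read_text(encoding="utf-8").splitlines():
            # Each manifest line is "<checksum> <relative/path>".
            parts = line.split(None, 1)
            if len(parts) == 2:
                listed.add(parts[1].strip())
    missing = []
    for path in sorted(bag.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(bag).as_posix()
        # The manifest files themselves are not expected to list themselves.
        if rel.startswith(("manifest-", "tagmanifest-")):
            continue
        if rel not in listed:
            missing.append(rel)
    return missing
```

Run against a bag like the ones described here, files under `metadata/` would be expected to show up in the returned list if the bagit script did not cover them.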

Check how example items show up on archive.org in this collection: https://archive.org/details/cos-dev-sandbox

Thoughts

Description HTML is probably going to link to any child registrations (items). Should also link back to, at least, collection page.

#!/usr/bin/env python3
"""
Depends on:
- articlemetaapi
Refs:
- https://github.com/scieloorg/articlemetaapi/blob/master/articlemeta/client.py
- https://github.com/scieloorg/xylose/blob/master/xylose/scielodocument.py
"""
bnewbold / gist:e3c2007c580d5c55c96efc145efc7eca
Created March 21, 2021 00:40
2020-03-16 to 2021-03-15 great meals
Monday March 23 french onion soup and butter miso carrots
Tuesday March 24 pasta al pesto genovese e asparago
Thursday April 2 Coronation chickpeas and yogurt tomatoes
Friday April 3 queso fondido
Sunday April 5 Sun-dried tomato almond pesto pasta
Monday April 6 gua bao! (割包)
Wednesday April 8 okonomiyaki
Friday April 10 broccoli pesto pasta + burrata salad
Sunday April 12 veggie reuben
Thursday April 16 pipian pascal