Bill Dueber billdueber

require "benchmark/ips"
require "marc"
require "../lib/marc/jsonl_reader"
require "zinzout"
source_file = "/Users/dueberb/devel/mlibrary/data/search_full_bibs/all.jsonl.gz"
temp_file = "/tmp/500k.bat"
require "pathname"

Summarize the algorithm described between START and STOP

START

This is the process:

  1. Take the first 12 digits of the 13-digit ISBN

  2. Multiply each digit in turn, from left to right, by a weight that alternates between 1 and 3: the first digit is multiplied by 1, the second by 3, the third by 1, the fourth by 3, and so on through the eleventh (multiplied by 1) and the twelfth (multiplied by 3).

  3. Add the 12 products together.
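
Steps 1 to 3 above can be sketched in Ruby as follows. This is only an illustration of the weighted sum described so far: the method name `isbn13_weighted_sum` is a hypothetical helper, the sample ISBN is a commonly used example value, and the later steps that turn the sum into a check digit are not shown in this excerpt, so they are not implemented here.

```ruby
# Hypothetical helper covering steps 1-3 only: the weighted sum of the
# first 12 digits of an ISBN-13. Hyphens and spaces are ignored.
def isbn13_weighted_sum(isbn)
  # Step 1: keep only digits, then take the first 12 of them.
  digits = isbn.delete("^0-9").chars.first(12).map(&:to_i)
  raise ArgumentError, "need at least 12 digits" if digits.size < 12

  # Steps 2 and 3: weights alternate 1, 3, 1, 3, ... from the left;
  # multiply each digit by its weight and add up the products.
  digits.each_with_index.sum { |digit, index| digit * (index.even? ? 1 : 3) }
end

puts isbn13_weighted_sum("978-0-306-40615-7") # prints 93
```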

@billdueber
billdueber / README.md
Created December 6, 2022 03:14
Truffleruby JSON parsing slow on previously-gzipped data

Benchmark JSON parsing: gzipped vs. non-gzipped data

This is a very simple, self-contained (well, except for benchmark-ips) benchmark. Conceptually, it does the following:

  • Create an array-of-arrays, each element being a 20-element array of integers
  • Write it out to a file as JSON
  • Write it out to a gzipped file as JSON
  • Read them both back in as strings
  • Compare how long it takes to JSON.parse the never-gzipped (plain) string vs the previously-gzipped string
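
The steps in the list above might look roughly like the sketch below. It is not the gist's `slow_gzip_bench.rb`: it assumes the standard-library `Benchmark` module in place of benchmark-ips so that it stays self-contained, and the file names, iteration count, and random integer range are arbitrary choices made for illustration.

```ruby
require "json"
require "zlib"
require "tmpdir"
require "benchmark"

# An array-of-arrays: 10 base units, each a 20-element array of integers.
data = Array.new(10) { Array.new(20) { rand(1_000_000) } }
json = JSON.generate(data)

plain = nil
gzdata = nil
Dir.mktmpdir do |dir|
  plain_path = File.join(dir, "data.json")
  gz_path = File.join(dir, "data.json.gz")

  # Write the same JSON text to a plain file and to a gzipped file.
  File.write(plain_path, json)
  Zlib::GzipWriter.open(gz_path) { |gz| gz.write(json) }

  # Read both back in as strings.
  plain = File.read(plain_path)
  gzdata = Zlib::GzipReader.open(gz_path) { |gz| gz.read }
end

# Time JSON.parse on the never-gzipped string vs. the previously-gzipped one.
iterations = 2_000 # arbitrary
Benchmark.bm(8) do |bm|
  bm.report("plain")  { iterations.times { JSON.parse(plain) } }
  bm.report("gzdata") { iterations.times { JSON.parse(gzdata) } }
end
```

Both strings hold identical JSON text, so any timing difference comes from how the runtime represents a string produced by the gzip reader rather than from the data itself.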

For usage, run slow_gzip_bench.rb -h

billdueber / rsolr_streamer.rb
Last active January 13, 2022 17:18
Simple RSolr extension that uses a cursor-based stream to iterate over docs in a Solr core
# frozen_string_literal: true
# # Simple example -- get ids and titles of all items without an author
#
# rsolr = RSolr.connect(url: 'http://localhost:8025/solr/catalog')
# stream = rsolr.streamer(handler: 'select') do |s|
#   s.filter = 'NOT author:[* TO *]'
#   s.sort = 'id asc'
#   s.fields = ['id', 'title']
#   s.batch_size = 2_000
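
The cursor-based iteration that this kind of streamer relies on can be sketched without RSolr or a running Solr. This is a minimal sketch, not the gist's implementation: `cursor_stream` and `fake_fetch` are hypothetical names, and the lambda merely stands in for a Solr request. The only Solr behavior assumed is the cursor protocol itself: start with a cursor mark of `*`, pass back each response's `nextCursorMark`, and stop when the returned mark equals the one that was sent.

```ruby
# Hypothetical helper: lazily yields every doc by following cursor marks.
# fetch_page is any callable that takes a cursor mark and returns a hash
# with "docs" (an array) and "nextCursorMark" (a string).
def cursor_stream(fetch_page)
  Enumerator.new do |yielder|
    cursor = "*"
    loop do
      page = fetch_page.call(cursor)
      page["docs"].each { |doc| yielder << doc }
      next_cursor = page["nextCursorMark"]
      break if next_cursor == cursor # unchanged mark means no more results
      cursor = next_cursor
    end
  end
end

# In-memory stand-in for a Solr core, paged two docs at a time.
all_docs = (1..5).map { |i| { "id" => i.to_s, "title" => "Title #{i}" } }
batch_size = 2

fake_fetch = lambda do |cursor|
  start = cursor == "*" ? 0 : cursor.to_i
  docs = all_docs[start, batch_size] || []
  next_mark = docs.empty? ? cursor : (start + docs.size).to_s
  { "docs" => docs, "nextCursorMark" => next_mark }
end

ids = cursor_stream(fake_fetch).map { |doc| doc["id"] }
puts ids.inspect
```

Returning an `Enumerator` keeps the caller's code a plain `each`/`map` while pages are fetched only as they are consumed.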
billdueber / aael.xml
Created April 8, 2021 17:59
Alma document from AAEL
<?xml version="1.0" encoding="UTF-8"?>
<collection>
  <record>
    <leader>00000nas a2200000 i 4500</leader>
    <controlfield tag="001">990155606890206381</controlfield>
    <controlfield tag="005">20190626105359.0</controlfield>
    <controlfield tag="008">170919c20179999miuuu p 6 0 a0eng d</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
      <subfield code="a">(MiU)015560689MIU01</subfield>
    </datafield>
billdueber / slip_flow.md
Last active March 31, 2021 20:41
Basic path through slip

SLIP flow for normal (non-print-holdings or collection-builder) items

A basic run-through of how items move through SLIP.

DB Tables overview

  • slip_rights: (one row per item). A copy-ish of rights_current with additional information about when an item was last updated. Populated/updated from vufind solr.
  • slip_queue: (one row per item-to-update). A list of HTIDs along with slots to hold information about which (if any) process is
billdueber / results.txt
Last active December 1, 2022 20:44
Slowness of parsing a string read from a gzipped file
truffleruby 22.3.0, like ruby 3.0.3, GraalVM CE Native [x86_64-darwin]
Base unit is an array of 20 integers
JSON-decode a string encoding an array of 10 of those base units.
Calculating -------------------------------------
JSON 10 plain          6.817k (±14.0%) i/s -    133.364k in  19.995768s
JSON 10 gzdata         1.204k (±28.0%) i/s -     15.611k in  20.047997s
JSON 10 flattened      6.954k (± 9.6%) i/s -    138.006k in  20.090469s
billdueber / pipe_to.rb
Created May 3, 2019 18:37
Model (in some ways) the Elixir pipe in Ruby. This is an abomination.
class BasicObject
  # kwargs always goes in as at least an empty hash, so need to
  # special-case it for methods/functions without keyword args
  def safe_call(*args, **kwargs)
    kwargs.empty? ? self.call(*args) : self.call(*args, **kwargs)
  end
  # Call the provided callable (or symbol for a method on the current object)
  # with the caller as the first argument along with whatever else
  # was passed in as subsequent arguments
billdueber / zip_contents_summary.rb
Last active February 8, 2019 17:45
Summary of zipfile by mime type
require 'zip'
require 'mimemagic'
zipfilename = ARGV[0]
class MimeStats
  attr_accessor :type, :size, :csize, :count
  def initialize(type, size = 0, csize = 0)
    @type = type
    @size = size
billdueber / test_solr_analysis.rb
Last active January 22, 2019 16:37
How to test Solr analysis output against a live Solr
require 'simple_solr_client'
# https://github.com/billdueber/simple_solr_client
client = SimpleSolrClient::Client.new("http://localhost:9639/solr")
# What do we have?
client.cores
#=> ['med']
core = client.core('med')
# Can get a list of the field types if you like: