Bill Dueber (billdueber)

@billdueber
billdueber / results.txt
Last active December 1, 2022 20:44
Slowness of parsing a string read from a gzipped file
truffleruby 22.3.0, like ruby 3.0.3, GraalVM CE Native [x86_64-darwin]
Base unit is an array of 20 integers
JSON-decode a string encoding an array of 10 of those base units.
Calculating -------------------------------------
       JSON 10 plain      6.817k (±14.0%) i/s -    133.364k in  19.995768s
      JSON 10 gzdata      1.204k (±28.0%) i/s -     15.611k in  20.047997s
   JSON 10 flattened      6.954k (± 9.6%) i/s -    138.006k in  20.090469s
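
For reference, a minimal sketch of the data shape being decoded; the construction below is assumed, and only the shapes (a 20-integer base unit, ten of them per string) come from the description above. The three rows time the same JSON text arriving as a plain string, as a string read back through a gzip stream, and as a flattened copy of that gzipped-read string; the exact flattening step isn't shown in this preview.

require "json"

base_unit = (1..20).to_a                                # "base unit": an array of 20 integers
json_str  = JSON.generate(Array.new(10) { base_unit })  # a string encoding 10 base units
JSON.parse(json_str)                                    # the operation being timed above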
@billdueber
billdueber / slip_flow.md
Last active March 31, 2021 20:41
Basic path through slip

SLIP flow for normal (non-print-holdings or collection-builder) items

A basic run-through of how things move through SLIP.

DB Tables overview (a rough schema sketch follows the list)

  • slip_rights: (one row per item). A copy-ish of rights_current with additional information about when an item was last updated. Populated/updated from vufind solr.
  • slip_queue: (one row per item-to-update). A list of HTIDs along with slots to hold information about which (if any) process is
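
A rough schema sketch of those two tables, assuming Sequel purely for illustration; the column names are invented, and only "one row per item", the rights-attribute copy, the last-update time, and the HTID queue come from the notes above.

require "sequel"

DB = Sequel.sqlite   # in-memory stand-in; the real SLIP database isn't specified here

DB.create_table?(:slip_rights) do      # one row per item
  String   :htid, primary_key: true    # hypothetical column name for the item id
  String   :attr                       # rights attribute, copy-ish of rights_current
  DateTime :update_time                # when the item was last updated (per vufind solr)
end

DB.create_table?(:slip_queue) do       # one row per item-to-update
  String :htid                         # HTID of an item waiting to be processed
  String :claimed_by                   # hypothetical "slot" noting which process, if any, has it
end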
@billdueber
billdueber / aael.xml
Created April 8, 2021 17:59
Alma document from AAEL
<?xml version="1.0" encoding="UTF-8"?>
<collection>
  <record>
    <leader>00000nas a2200000 i 4500</leader>
    <controlfield tag="001">990155606890206381</controlfield>
    <controlfield tag="005">20190626105359.0</controlfield>
    <controlfield tag="008">170919c20179999miuuu p 6 0 a0eng d</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
      <subfield code="a">(MiU)015560689MIU01</subfield>
    </datafield>
@billdueber
billdueber / rsolr_streamer.rb
Last active January 13, 2022 17:18
Simple RSolr extension to use a cursor-based stream to iterate over docs in a solr core
# frozen_string_literal: true
# # Simple example -- get ids and titles of all items without an author
#
# rsolr = RSolr.connect(url: 'http://localhost:8025/solr/catalog')
# stream = rsolr.streamer(handler: 'select') do |s|
# s.filter = 'NOT author:[* TO *]'
# s.sort = 'id asc'
# s.fields = ['id', 'title']
# s.batch_size = 2_000
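
The preview cuts off before the extension's own code. For context, this is roughly the Solr cursorMark loop that a cursor-based streamer wraps, sketched here against plain RSolr; the URL, filter, fields, and batch size are just the values from the example above.

require "rsolr"

solr   = RSolr.connect(url: "http://localhost:8025/solr/catalog")
cursor = "*"
loop do
  resp = solr.get("select", params: {
    q: "*:*",
    fq: "NOT author:[* TO *]",
    fl: "id,title",
    sort: "id asc",          # cursor paging needs a stable sort ending on the uniqueKey
    rows: 2_000,
    cursorMark: cursor
  })
  resp["response"]["docs"].each { |doc| p doc }
  next_cursor = resp["nextCursorMark"]
  break if next_cursor == cursor   # the cursor stops advancing when the result set is exhausted
  cursor = next_cursor
end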
@billdueber
billdueber / README.md
Created December 6, 2022 03:14
Truffleruby JSON parsing slow on previously-gzipped data

Benchmark JSON parsing: gzipped vs. non-gzipped data

This is a very simple, self-contained (well, except for benchmark-ips) benchmark. Conceptually, it does the following:

  • Create an array-of-arrays, each element being a 20-element array of integers
  • Write it out to a file as JSON
  • Write it out to a gzipped file as JSON
  • Read them both back in as strings
  • Compare how long it takes to JSON.parse the never-gzipped (plain) string vs the previously-gzipped string

Usage via slow_gzip_bench.rb -h
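
A minimal, self-contained sketch of those steps; the file names and array size are placeholders, and the real script is slow_gzip_bench.rb.

require "json"
require "zlib"
require "benchmark/ips"

data = Array.new(1_000) { Array.new(20) { rand(1_000) } }   # array-of-arrays of integers

File.write("data.json", JSON.generate(data))                                  # plain JSON file
Zlib::GzipWriter.open("data.json.gz") { |gz| gz.write JSON.generate(data) }   # gzipped JSON file

plain_str = File.read("data.json")                                  # never-gzipped string
gz_str    = Zlib::GzipReader.open("data.json.gz") { |gz| gz.read }  # previously-gzipped string

Benchmark.ips do |x|
  x.report("parse plain")  { JSON.parse(plain_str) }
  x.report("parse gzdata") { JSON.parse(gz_str) }
  x.compare!
end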

Summarize the algorithm described between START and STOP

START

This is the process (a short Ruby sketch follows the steps):

  1. Take the first 12 digits of the 13-digit ISBN

  2. Multiply each digit in turn, from left to right, by a weight: the first digit by 1, the second by 3, the third by 1 again, the fourth by 3 again, and so on, with the eleventh multiplied by 1 and the twelfth by 3.

  3. Add all of the 12 answers.
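
A short Ruby sketch of exactly those three steps; the ISBN value is just an example, and the final check-digit arithmetic is not part of the excerpt above.

isbn13   = "9780306406157"                    # example 13-digit ISBN

digits   = isbn13[0, 12].chars.map(&:to_i)    # step 1: take the first 12 digits
weights  = [1, 3] * 6                         # step 2: multiply left to right by 1, 3, 1, 3, ...
products = digits.zip(weights).map { |d, w| d * w }
sum      = products.sum                       # step 3: add all 12 answers

# The excerpt stops here; the standard ISBN-13 completion (not stated above)
# would be: check_digit = (10 - sum % 10) % 10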

require "benchmark/ips"
require "marc"
require "../lib/marc/jsonl_reader"
require "zinzout"
source_file = "/Users/dueberb/devel/mlibrary/data/search_full_bibs/all.jsonl.gz"
temp_file = "/tmp/500k.bat"
require "pathname"