Bill Dueber (billdueber)

@billdueber
billdueber / results.txt
Last active December 1, 2022 20:44
Slowness of parsing a string read from a gzipped file
truffleruby 22.3.0, like ruby 3.0.3, GraalVM CE Native [x86_64-darwin]
Base unit is an array of 20 integers
JSON-decode a string encoding an array of 10 of those base units.
Calculating -------------------------------------
       JSON 10 plain      6.817k (±14.0%) i/s -    133.364k in  19.995768s
      JSON 10 gzdata      1.204k (±28.0%) i/s -     15.611k in  20.047997s
   JSON 10 flattened      6.954k (± 9.6%) i/s -    138.006k in  20.090469s
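
For reference, a minimal sketch of the data shape being decoded; the construction below is assumed, and only the shapes (a 20-integer base unit, ten of them per string) come from the description above. The three rows time the same JSON text arriving as a plain string, as a string read back through a gzip stream, and as a flattened copy of that gzipped-read string; the exact flattening step isn't shown in this preview.

require "json"

base_unit = (1..20).to_a                                # "base unit": an array of 20 integers
json_str  = JSON.generate(Array.new(10) { base_unit })  # a string encoding 10 base units
JSON.parse(json_str)                                    # the operation being timed above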
@billdueber
billdueber / slip_flow.md
Last active March 31, 2021 20:41
Basic path through slip

SLIP flow for normal (non-print-holdings or collection-builder) items

A basic run-through of how things move through SLIP.

DB Tables overview (a rough schema sketch follows the list)

  • slip_rights: (one row per item). A copy-ish of rights_current with additional information about when an item was last updated. Populated/updated from vufind solr.
  • slip_queue: (one row per item-to-update). A list of HTIDs along with slots to hold information about which (if any) process is
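
A rough schema sketch of those two tables, assuming Sequel purely for illustration; the column names are invented, and only "one row per item", the rights-attribute copy, the last-update time, and the HTID queue come from the notes above.

require "sequel"

DB = Sequel.sqlite   # in-memory stand-in; the real SLIP database isn't specified here

DB.create_table?(:slip_rights) do      # one row per item
  String   :htid, primary_key: true    # hypothetical column name for the item id
  String   :attr                       # rights attribute, copy-ish of rights_current
  DateTime :update_time                # when the item was last updated (per vufind solr)
end

DB.create_table?(:slip_queue) do       # one row per item-to-update
  String :htid                         # HTID of an item waiting to be processed
  String :claimed_by                   # hypothetical "slot" noting which process, if any, has it
end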
@billdueber
billdueber / aael.xml
Created April 8, 2021 17:59
Alma document from AAEL
<?xml version="1.0" encoding="UTF-8"?>
<collection>
  <record>
    <leader>00000nas a2200000 i 4500</leader>
    <controlfield tag="001">990155606890206381</controlfield>
    <controlfield tag="005">20190626105359.0</controlfield>
    <controlfield tag="008">170919c20179999miuuu p 6 0 a0eng d</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
      <subfield code="a">(MiU)015560689MIU01</subfield>
    </datafield>
@billdueber
billdueber / rsolr_streamer.rb
Last active January 13, 2022 17:18
Simple RSolr extension to use a cursor-based stream to iterate over docs in a solr core
# frozen_string_literal: true
# # Simple example -- get ids and titles of all items without an author
#
# rsolr = RSolr.connect(url: 'http://localhost:8025/solr/catalog')
# stream = rsolr.streamer(handler: 'select') do |s|
# s.filter = 'NOT author:[* TO *]'
# s.sort = 'id asc'
# s.fields = ['id', 'title']
# s.batch_size = 2_000
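
The preview cuts off before the extension's own code. For context, this is roughly the Solr cursorMark loop that a cursor-based streamer wraps, sketched here against plain RSolr; the URL, filter, fields, and batch size are just the values from the example above.

require "rsolr"

solr   = RSolr.connect(url: "http://localhost:8025/solr/catalog")
cursor = "*"
loop do
  resp = solr.get("select", params: {
    q: "*:*",
    fq: "NOT author:[* TO *]",
    fl: "id,title",
    sort: "id asc",          # cursor paging needs a stable sort ending on the uniqueKey
    rows: 2_000,
    cursorMark: cursor
  })
  resp["response"]["docs"].each { |doc| p doc }
  next_cursor = resp["nextCursorMark"]
  break if next_cursor == cursor   # the cursor stops advancing when the result set is exhausted
  cursor = next_cursor
end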
@billdueber
billdueber / README.md
Created December 6, 2022 03:14
Truffleruby JSON parsing slow on previously-gzipped data

Benchmark JSON parsing: gzipped vs. non-gzipped data

This is a very simple, self-contained (well, except for benchmark-ips) benchmark. Conceptually, it does the following:

  • Create an array-of-arrays, each element being a 20-element array of integers
  • Write it out to a file as JSON
  • Write it out to a gzipped file as JSON
  • Read them both back in as strings
  • Compare how long it takes to JSON.parse the never-gzipped (plain) string vs the previously-gzipped string

Usage via slow_gzip_bench.rb -h
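
A minimal, self-contained sketch of those steps; the file names and array size are placeholders, and the real script is slow_gzip_bench.rb.

require "json"
require "zlib"
require "benchmark/ips"

data = Array.new(1_000) { Array.new(20) { rand(1_000) } }   # array-of-arrays of integers

File.write("data.json", JSON.generate(data))                                  # plain JSON file
Zlib::GzipWriter.open("data.json.gz") { |gz| gz.write JSON.generate(data) }   # gzipped JSON file

plain_str = File.read("data.json")                                  # never-gzipped string
gz_str    = Zlib::GzipReader.open("data.json.gz") { |gz| gz.read }  # previously-gzipped string

Benchmark.ips do |x|
  x.report("parse plain")  { JSON.parse(plain_str) }
  x.report("parse gzdata") { JSON.parse(gz_str) }
  x.compare!
end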

Summarize the algorithm described between START and STOP

START

This is the process (a short Ruby sketch follows the steps):

  1. Take the first 12 digits of the 13-digit ISBN

  2. Multiply each digit in turn, from left to right, by a weight: the first digit by 1, the second by 3, the third by 1 again, the fourth by 3 again, and so on, with the eleventh multiplied by 1 and the twelfth by 3.

  3. Add all of the 12 answers.
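
A short Ruby sketch of exactly those three steps; the ISBN value is just an example, and the final check-digit arithmetic is not part of the excerpt above.

isbn13   = "9780306406157"                    # example 13-digit ISBN

digits   = isbn13[0, 12].chars.map(&:to_i)    # step 1: take the first 12 digits
weights  = [1, 3] * 6                         # step 2: multiply left to right by 1, 3, 1, 3, ...
products = digits.zip(weights).map { |d, w| d * w }
sum      = products.sum                       # step 3: add all 12 answers

# The excerpt stops here; the standard ISBN-13 completion (not stated above)
# would be: check_digit = (10 - sum % 10) % 10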

require "benchmark/ips"
require "marc"
require "../lib/marc/jsonl_reader"
require "zinzout"
source_file = "/Users/dueberb/devel/mlibrary/data/search_full_bibs/all.jsonl.gz"
temp_file = "/tmp/500k.bat"
require "pathname"