Bill Dueber billdueber

Summarize the algorithm described between START and STOP

START

This is the process:

Take the first 12 digits of the 13-digit ISBN
Multiply each digit in turn, from left to right, by a weight. The first digit is multiplied by 1, the second by 3, the third by 1 again, the fourth by 3 again, and so on to the eleventh, which is multiplied by 1, and the twelfth, which is multiplied by 3.
Add the 12 products together.
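
The steps listed so far can be sketched in Ruby as below. This is a minimal sketch covering only the weighted sum described above; the method name isbn13_weighted_sum and the sample ISBN are assumptions made for illustration, not part of the original description.

```ruby
# Weighted sum of the first 12 digits of a 13-digit ISBN.
# Hypothetical helper name; hyphens and other non-digits are ignored.
def isbn13_weighted_sum(isbn)
  digits = isbn.delete("^0-9")[0, 12].chars.map(&:to_i)
  raise ArgumentError, "need at least 12 digits" unless digits.size == 12

  # Positions 1, 3, 5, ... get weight 1; positions 2, 4, 6, ... get weight 3.
  digits.each_with_index.sum { |d, i| d * (i.even? ? 1 : 3) }
end

puts isbn13_weighted_sum("978-0-306-40615-7") # prints 93
```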

Benchmark JSON parsing: gzipped vs. non-gzipped data

This is a very simple, self-contained (well, except for benchmark-ips) benchmark. Conceptually, it does the following:

Create an array-of-arrays, each element being a 20-element array of integers
Write it out to a file as JSON
Write it out to a gzipped file as JSON
Read them both back in as strings
Compare how long it takes to JSON.parse the never-gzipped (plain) string vs the previously-gzipped string
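
The outline above might look roughly like the following. This is a sketch under stated assumptions, not the contents of slow_gzip_bench.rb: the helper names (build_data, round_trip_strings, time_parse) are hypothetical, and timing uses a plain monotonic clock from the standard library instead of benchmark-ips so the example has no gem dependencies.

```ruby
require "json"
require "zlib"
require "tmpdir"

# Build an array of `rows` base units, each a `width`-element array of integers.
def build_data(rows, width = 20)
  Array.new(rows) { Array.new(width) { rand(1_000_000) } }
end

# Write the data as JSON to a plain file and to a gzipped file,
# then read both back in as strings.
def round_trip_strings(data, dir)
  json = JSON.generate(data)
  plain_path = File.join(dir, "data.json")
  gz_path = File.join(dir, "data.json.gz")

  File.write(plain_path, json)
  Zlib::GzipWriter.open(gz_path) { |gz| gz.write(json) }

  plain = File.read(plain_path)
  gzdata = Zlib::GzipReader.open(gz_path) { |gz| gz.read }
  [plain, gzdata]
end

# Wall-clock seconds needed to JSON.parse `str` `iterations` times.
def time_parse(str, iterations = 1_000)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { JSON.parse(str) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

Dir.mktmpdir do |dir|
  plain, gzdata = round_trip_strings(build_data(10), dir)
  puts "identical content: #{plain == gzdata}"
  puts format("plain:  %.4fs", time_parse(plain))
  puts format("gzdata: %.4fs", time_parse(gzdata))
end
```

Because both strings hold the same characters, any timing difference would have to come from how the runtime represents the string that came out of the gzip reader, not from the JSON itself.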

For usage, run slow_gzip_bench.rb -h

A basic run-through of how things move through SLIP.

slip_rights: (one row per item). Roughly a copy of rights_current, with additional information about when an item was last updated. Populated and updated from VuFind Solr.
slip_queue: (one row per item-to-update). A list of HTIDs along with slots to hold information about which (if any) process is

	require "benchmark/ips"
	require "marc"
	require "../lib/marc/jsonl_reader"
	require "zinzout"

	source_file = "/Users/dueberb/devel/mlibrary/data/search_full_bibs/all.jsonl.gz"
	temp_file = "/tmp/500k.bat"
	require "pathname"

	# frozen_string_literal: true

	# # Simple example -- get ids and titles of all items without an author
	#
	# rsolr = RSolr.connect(url: 'http://localhost:8025/solr/catalog')
	# stream = rsolr.streamer(handler: 'select') do |s|
	#   s.filter = 'NOT author:[* TO *]'
	#   s.sort = 'id asc'
	#   s.fields = ['id', 'title']
	#   s.batch_size = 2_000

	<?xml version="1.0" encoding="UTF-8"?>
	<collection>
	<record>
	<leader>00000nas a2200000 i 4500</leader>
	<controlfield tag="001">990155606890206381</controlfield>
	<controlfield tag="005">20190626105359.0</controlfield>
	<controlfield tag="008">170919c20179999miuuu p 6 0 a0eng d</controlfield>
	<datafield tag="035" ind1=" " ind2=" ">
	<subfield code="a">(MiU)015560689MIU01</subfield>
	</datafield>

	truffleruby 22.3.0, like ruby 3.0.3, GraalVM CE Native [x86_64-darwin]
	Base unit is an array of 20 integers

	JSON-decode a string encoding an array of 10 of those base units.

	Calculating -------------------------------------
	JSON 10 plain 6.817k (±14.0%) i/s - 133.364k in 19.995768s
	JSON 10 gzdata 1.204k (±28.0%) i/s - 15.611k in 20.047997s
	JSON 10 flattened 6.954k (± 9.6%) i/s - 138.006k in 20.090469s

	class BasicObject
	  # kwargs always goes in as at least an empty hash, so need to
	  # special-case it for methods/functions without keyword args
	  def safe_call(*args, **kwargs)
	    kwargs.empty? ? self.call(*args) : self.call(*args, **kwargs)
	  end

	# Call the provided callable (or symbol for a method on the current object)
	# with the caller as the first argument along with whatever else
	# was passed in as subsequent arguments

	require 'zip'
	require 'mimemagic'

	zipfilename = ARGV[0]

	class MimeStats
	attr_accessor :type, :size, :csize, :count
	def initialize(type, size = 0, csize = 0)
	@type = type
	@size = size

	require 'simple_solr_client'
	# https://github.com/billdueber/simple_solr_client


	client = SimpleSolrClient::Client.new("http://localhost:9639/solr")
	# What do we have?
	client.cores
	#=> ['med']
	core = client.core('med')
	# Can get a list of the field types if you like: