raffaele messuti atomotic

atomotic / readme.md
Created January 11, 2025 08:42
isbndn-sqlite
body {
  max-width: 48em;
  margin: auto;
  line-height: 1.5;
  padding: 0.8em;
  word-wrap: break-word;
  display: flex;
  flex-direction: column;
  background: #f5f5f7;
  font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
atomotic / ia-iiif-viewer.js
Last active December 13, 2024 19:14
Tampermonkey script to replace Internet Archive bookreader with Mirador viewer and IIIF content
// ==UserScript==
// @name Archive.org IIIF Viewer
// @namespace http://tampermonkey.net/
// @version 0.1
// @description Adds IIIF viewer functionality to archive.org pages
// @author Raffaele Messuti
// @match https://archive.org/details/*
// @exclude https://archive.org/details/@*
// @grant none
// ==/UserScript==
atomotic / RAV0302299.json
Last active November 1, 2024 10:01
Example record from ICCU API
{
  "autore": "Ballard, J. G.",
  "baseprov": "I",
  "data_agg": "20241014",
  "dataa": 1997,
  "datada": 1997,
  "dataf": [
    "1997"
  ],
  "dig_cover": [
atomotic / docker-compose.yml
Created August 25, 2024 14:21
manifold docker compose
services:
  postgres:
    image: postgres:16
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: 'manifold_production'
      POSTGRES_HOST_AUTH_METHOD: 'trust'
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.22
atomotic / README.md
Last active February 10, 2024 13:43
load xml files into SQLite and transform to json

Install sqlpkg

Install extensions

sqlpkg install sqlite/fileio
sqlpkg install jakethaw/xmltojson

Start
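The workflow above relies on the `sqlite/fileio` and `jakethaw/xmltojson` extensions. As a rough illustration of the same idea (keep the raw XML in SQLite and store a JSON rendering next to it), here is a minimal sketch that uses only the Python standard library instead of those extensions. The `records` table, the `element_to_dict` helper, and the sample XML are assumptions made up for this example, not part of the original gist.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an XML element into a plain dict (attributes, text, children)."""
    node = {}
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        # Children with the same tag are collected in a list, in document order.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


def load_xml(conn, name, xml_text):
    """Store the raw XML together with its JSON rendering."""
    root = ET.fromstring(xml_text)
    as_json = json.dumps({root.tag: element_to_dict(root)})
    conn.execute(
        "INSERT INTO records (name, xml, json) VALUES (?, ?, ?)",
        (name, xml_text, as_json),
    )


# Hypothetical table layout, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT PRIMARY KEY, xml TEXT, json TEXT)")

# Made-up sample record; real input would be read from files on disk.
load_xml(conn, "sample.xml", "<record id='1'><title>Example title</title></record>")

row = conn.execute(
    "SELECT json FROM records WHERE name = ?", ("sample.xml",)
).fetchone()
print(row[0])
```

The JSON shape produced here (attributes under `@attributes`, text under `#text`) is one convention among many and will likely differ from what the `xmltojson` extension emits.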

➜ file 89595bd2-8076-4da0-8880-518c291e7904
89595bd2-8076-4da0-8880-518c291e7904: EPUB document
➜ tika -m -j 89595bd2-8076-4da0-8880-518c291e7904
Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-237: Illegal SAXException from org.apache.tika.parser.epub.EpubParser@3a320ade
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:310)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:1071)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:493)
sqlite> .schema itemAnnotations
CREATE TABLE IF NOT EXISTS "itemAnnotations" (
    itemID INTEGER PRIMARY KEY,
    parentItemID INT NOT NULL,
    type INTEGER NOT NULL,
    authorName TEXT,
    text TEXT,
    comment TEXT,
    color TEXT,
    pageLabel TEXT,
atomotic / epub-search.md
Created November 13, 2021 12:11
indexing epub content into solr

solr schema

  • 1 document per chapter, then collapse
  • multivalued fields: chapter_title and chapter_text, keeping order.
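The two schema options above could be sketched as plain Python dictionaries ready to be serialized and posted to Solr. All field names (`book_id`, `book_title`, `chapter_position`) and the sample chapters are assumptions for illustration; only `chapter_title` and `chapter_text` come from the notes. The collapse filter uses Solr's collapsing query parser, which expects a single-valued field.

```python
import json


def chapter_docs(book_id, title, chapters):
    """Option 1: one Solr document per chapter, numbered to keep reading order."""
    docs = []
    for position, (chapter_title, chapter_text) in enumerate(chapters, start=1):
        docs.append({
            "id": f"{book_id}-{position:04d}",
            "book_id": book_id,  # single-valued key used for collapsing
            "book_title": title,
            "chapter_position": position,
            "chapter_title": chapter_title,
            "chapter_text": chapter_text,
        })
    return docs


def book_doc(book_id, title, chapters):
    """Option 2: one document per book with parallel multivalued fields."""
    return {
        "id": book_id,
        "book_title": title,
        # Both lists share the same order, so index i refers to the same chapter.
        "chapter_title": [chapter_title for chapter_title, _ in chapters],
        "chapter_text": [chapter_text for _, chapter_text in chapters],
    }


def collapse_params(terms):
    """Query parameters that return one hit per book for option 1."""
    return {
        "q": f"chapter_text:({terms})",
        "fq": "{!collapse field=book_id}",
        "fl": "book_id,book_title,chapter_title",
    }


# Made-up sample data.
sample = [("One", "alpha text"), ("Two", "beta text")]
print(json.dumps(chapter_docs("book-1", "Sample book", sample), indent=2))
print(json.dumps(book_doc("book-1", "Sample book", sample), indent=2))
print(collapse_params("alpha"))
```

Option 1 makes it easy to report which chapter matched; option 2 keeps one record per book but needs the parallel lists to stay aligned.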

text extraction

how to extract structured text from epub
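One possible approach, sketched with the Python standard library only: an EPUB is a ZIP archive, `META-INF/container.xml` points to the package (OPF) file, and the OPF spine lists the content documents in reading order. The sketch below follows that chain and pulls a title and body text from each spine item. The `TextExtractor` class, the `tiny_epub` demo helper, and the choice of the first `h1`/`h2` as chapter title are assumptions for illustration; percent-encoded hrefs, non-UTF-8 files, and navigation documents are not handled.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
SKIP = {"head", "script", "style"}
HEADINGS = {"h1", "h2"}


class TextExtractor(HTMLParser):
    """Take the first h1/h2 as the title and the remaining text as the body."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.parts = []
        self._heading = []
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in HEADINGS and self.title is None:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in HEADINGS and self._in_heading:
            self.title = " ".join(self._heading)
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth or not text:
            return
        (self._heading if self._in_heading else self.parts).append(text)


def extract_chapters(epub_file):
    """Return (title, text) pairs in spine (reading) order."""
    chapters = []
    with zipfile.ZipFile(epub_file) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        opf_dir = posixpath.dirname(opf_path)
        package = ET.fromstring(zf.read(opf_path))
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }
        for itemref in package.findall("opf:spine/opf:itemref", NS):
            href = manifest[itemref.get("idref")]
            parser = TextExtractor()
            parser.feed(zf.read(posixpath.join(opf_dir, href)).decode("utf-8"))
            chapters.append((parser.title, " ".join(parser.parts)))
    return chapters


def tiny_epub():
    """Build a two-chapter EPUB in memory, purely for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr(
            "META-INF/container.xml",
            '<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" '
            'version="1.0"><rootfiles><rootfile full-path="OEBPS/content.opf" '
            'media-type="application/oebps-package+xml"/></rootfiles></container>',
        )
        zf.writestr(
            "OEBPS/content.opf",
            '<package xmlns="http://www.idpf.org/2007/opf" version="3.0"><manifest>'
            '<item id="c2" href="two.xhtml" media-type="application/xhtml+xml"/>'
            '<item id="c1" href="one.xhtml" media-type="application/xhtml+xml"/>'
            '</manifest><spine><itemref idref="c1"/><itemref idref="c2"/></spine>'
            "</package>",
        )
        for name, label in (("one", "One"), ("two", "Two")):
            zf.writestr(
                f"OEBPS/{name}.xhtml",
                "<html><head><title>ignored</title></head><body>"
                f"<h1>Chapter {label}</h1><p>Text of chapter {label}.</p>"
                "</body></html>",
            )
    buf.seek(0)
    return buf


for title, text in extract_chapters(tiny_epub()):
    print(title, "|", text)
```

Reading the spine rather than listing the archive matters because file names inside the ZIP do not have to reflect reading order; the demo manifest lists the second chapter first to show this.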

version: "3"
services:
  node-exporter:
    image: prom/node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"