@cboulanger
cboulanger / gist:9c408f808ae5f8d3d9854f14faef2e54
Created November 16, 2020 16:47
zotero-cli items --filter '{"q":"allison"}'
Command execution failed: StatusCodeError: 404 - {"type":"Buffer","data":[78,111,116,32,102,111,117,110,100]}
at new StatusCodeError (/Users/cboulanger/Code/zotero-cli/node_modules/request-promise-core/lib/errors.js:32:15)
at Request.plumbing.callback (/Users/cboulanger/Code/zotero-cli/node_modules/request-promise-core/lib/plumbing.js:104:33)
at Request.RP$callback [as _callback] (/Users/cboulanger/Code/zotero-cli/node_modules/request-promise-core/lib/plumbing.js:46:31)
at Request.self.callback (/Users/cboulanger/Code/zotero-cli/node_modules/request/request.js:185:22)
at Request.emit (events.js:311:20)
at Request.<anonymous> (/Users/cboulanger/Code/zotero-cli/node_modules/request/request.js:1154:10)
at Request.emit (events.js:311:20)
at IncomingMessage.<anonymous> (/Users/cboulanger/Code/zotero-cli/node_modules/request/request.js:1076:12)
at Object.onceWrapper (events.js:417:28)
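The 404 body in the error message above is a JSON-serialized Node.js Buffer rather than readable text. A minimal sketch of decoding it follows; the helper name `decodeBufferJson` is made up for illustration and is not part of zotero-cli.

```javascript
// The error body, copied verbatim from the StatusCodeError message above.
const body = {"type":"Buffer","data":[78,111,116,32,102,111,117,110,100]};

// Hypothetical helper: turn a JSON-serialized Buffer
// ({type: "Buffer", data: [...bytes]}) back into a string.
function decodeBufferJson(obj, encoding = "utf8") {
  if (!obj || obj.type !== "Buffer" || !Array.isArray(obj.data)) {
    throw new TypeError("not a JSON-serialized Buffer");
  }
  return Buffer.from(obj.data).toString(encoding);
}

console.log(decodeBufferJson(body)); // prints "Not found"
```

So the server answered with a plain "Not found" for the requested URL.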
cboulanger / demo.js
Created December 6, 2020 18:10
Autocomplete combo box example
qx.Class.define("custom.AutoCompleteComboBox",
{
extend: qx.ui.form.ComboBox,
properties: {
model: {
init: null,
nullable: true,
check: "qx.type.Array",
cboulanger / backup-zotero.js
Last active April 1, 2021 06:42
One-way (read-only) synchronization from zotero.org to a local key-value datastore (couchbase)
/**
* Naive prototype function that does a remote-to-local synchronization of user
* and group libraries that are accessible to the owner of the API token. The
* external `sandbox` object provides preconfigured API objects which access the
* Zotero server and the local store (a couchbase server in the case of this
* prototype), and a gauge widget for some nice visual feedback.
*
* This is only a partial implementation of
* https://www.zotero.org/support/dev/web_api/v3/syncing . In particular, the
* code is unaware of mid-sync version changes, since the sync is needed only
ErrorResponse: Upload stage 2: 400: Bad Request
//snip
response: Response {
size: 0,
timeout: 0,
[Symbol(Body internals)]: { body: [PassThrough], disturbed: false, error: null },
[Symbol(Response internals)]: {
url: 'https://zoterofilestorage.s3.us-east-1.amazonaws.com/',
status: 400,
statusText: 'Bad Request',
cboulanger / DatabaseMaintenanceController.php
Last active May 12, 2021 17:14
PHP script to convert MySQL tables & columns to InnoDB / utf8mb4 / utf8mb4_unicode_ci (Yii2/standalone)
<?php
namespace app\controllers;
use yii\console\ExitCode;
use Yii;
/**
* This is a Yii2 console controller class. It can also be used standalone to run the
* actionUpdateEncoding() method if you remove the Yii2-specific code.
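The preview above is cut off before the conversion logic, so the following is not the author's PHP code. It is a minimal JavaScript sketch of the standard MySQL statements such a conversion typically issues, assuming a list of table names is already known; `quoteIdentifier`, `conversionStatements` and the table names are hypothetical.

```javascript
// Hypothetical helper: quote a MySQL identifier, doubling embedded backticks.
function quoteIdentifier(name) {
  return "`" + String(name).replace(/`/g, "``") + "`";
}

// Build the ALTER TABLE statements that switch each table to InnoDB and
// convert its text columns to the given character set and collation.
function conversionStatements(tables, charset = "utf8mb4", collation = "utf8mb4_unicode_ci") {
  return tables.flatMap((table) => {
    const t = quoteIdentifier(table);
    return [
      `ALTER TABLE ${t} ENGINE=InnoDB;`,
      `ALTER TABLE ${t} CONVERT TO CHARACTER SET ${charset} COLLATE ${collation};`,
    ];
  });
}

// Example with made-up table names.
for (const sql of conversionStatements(["user", "reference"])) {
  console.log(sql);
}
```

Generating the statements as strings first makes it possible to review them before running anything against a live database.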
cboulanger / 10.1515_zfrs-1980-0101.xml
Created May 12, 2021 14:55
ABBYY Cloud OCR XML output
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="ABBYY FineReader Engine 12" languages="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
<documentData>
<paragraphStyles>
<paragraphStyle id="{93BCCE1C-1547-4388-AA39-146AEF764F40}" name="Body text|1" mainFontStyleId="{DCF0C577-94EF-48F5-929F-C9402CFAA588}" role="text" align="Left" startIndent="0" leftIndent="0" rightIndent="0" lineSpacing="1197" lineSpacingRatio="1.1000000238418579" fixedLineSpacing="0">
<fontStyle id="{DCF0C577-94EF-48F5-929F-C9402CFAA588}" baseFont="1" ff="Times New Roman" fs="9.5" backgroundColor="4278190079"/>
<fontStyle id="{BBC2CAB0-2ACF-4618-9DCB-1E0FA6775B73}" ff="Times New Roman" fs="10." backgroundColor="4278190079"/>
<fontStyle id="{C50
cboulanger / error.txt
Created May 13, 2021 17:00
Error running `ocr-transform abbyy page`
docker run --rm -it -v "$PWD":/data ubma/ocr-fileformat ocr-transform abbyy page 10.1515_zfrs-1980-0101.xml 10.1515_zfrs-1980-0101.page.xml
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
cboulanger / 10.1515_zfrs-1980-0101.xml.page.xml
Last active May 13, 2021 17:03
Result of `ocr-transform abbyy page 10.1515_zfrs-1980-0101.xml` (pretty-printed)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<PcGts xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd"
xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
<Metadata>
<Creator>ABBYY FineReader Engine 12</Creator>
<Created>1970-01-01T00:00:00</Created>
<LastChange>1970-01-01T00:00:00</LastChange>
<Comments/>
</Metadata>
cboulanger / create-corpus-from-best-ocr-result.sh
Last active May 19, 2021 17:26
Given several UTF-8 documents produced by OCR processing of the same source document, selects the one with the highest quality (i.e. the highest language recognition confidence)
#! /usr/bin/env bash
# see https://ryanfb.github.io/etc/2015/03/16/automatic_evaluation_of_ocr_quality.html
# using https://github.com/saffsd/langid.py
# Install langid with `pip install langid`, then add the scorelines.sh and ocrquality.rb scripts from the blog entry to the same directory
# The PDF source files, whose names start with a DOI (adapt this to your case)
FILE_SELECTOR=/path/to/source/dir/*.pdf
# The path to the directory to which the selected documents should be copied
TARGET=/path/to/target/dir
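The selection step described in the gist summary (keep the OCR result with the highest language recognition confidence) can be sketched as follows. This is an illustration in JavaScript, not the author's shell script; `selectBest`, the file names and the confidence values are all made up.

```javascript
// Hypothetical helper: return the candidate with the highest confidence,
// or null when there are no candidates. On a tie the earlier entry wins.
function selectBest(candidates) {
  if (candidates.length === 0) return null;
  return candidates.reduce((best, c) => (c.confidence > best.confidence ? c : best));
}

// Placeholder scores for three OCR results of the same source document.
const results = [
  { file: "doc.engine-a.txt", confidence: 0.71 },
  { file: "doc.engine-b.txt", confidence: 0.93 },
  { file: "doc.engine-c.txt", confidence: 0.88 },
];

console.log(selectBest(results).file);
```

In a real pipeline the confidence values would come from the language identification tool, and the winning file would then be copied to the target directory.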
import {default as fetch} from 'node-fetch';
const { pdf } = require("pdf-to-img");
import {tmpdir} from "os";
import {createWriteStream, createReadStream} from 'fs';
import * as fsp from 'fs/promises'
import * as archiver from 'archiver';
import {ArchiverError} from "archiver";
import * as path from "path";
import {Parser, Builder} from "xml2js";