@bertsky
bertsky / plot-cuda-util.sh
Last active September 17, 2024 22:28
Plot GPU compute and memory utilization curve
#!/usr/bin/env bash
fname="$1"
# if the log file does not exist yet, record it (stop by pressing ctrl+c)
if ! test -f "$fname"; then
    nvidia-smi -f "$fname" -l 1 --format=csv --query-gpu=timestamp,memory.total,memory.used,utilization.gpu,utilization.memory &
    bg=$!
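The log written by the `nvidia-smi` call above is plain CSV, so preparing it for plotting mostly means stripping the unit suffixes. A minimal Python sketch, assuming the usual `--format=csv` layout (one header row, then values such as `1024 MiB` and `35 %`); the function name `parse_gpu_log` and the sample rows are hypothetical and not taken from the gist:

```python
import csv
import io


def parse_gpu_log(text):
    """Parse nvidia-smi CSV output into a list of dicts with numeric values.

    Assumes the first row is a header and that every value carries a unit
    suffix (" MiB" or " %"), which is stripped here.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader, None)  # skip the header row (if any)
    samples = []
    for row in reader:
        if len(row) != 5:
            continue  # ignore blank or incomplete lines (e.g. after ctrl+c)
        timestamp, mem_total, mem_used, util_gpu, util_mem = row
        samples.append({
            "timestamp": timestamp,
            "memory_total_mib": int(mem_total.split()[0]),
            "memory_used_mib": int(mem_used.split()[0]),
            "utilization_gpu_pct": int(util_gpu.split()[0]),
            "utilization_memory_pct": int(util_mem.split()[0]),
        })
    return samples


# hypothetical sample in the assumed format
sample = """timestamp, memory.total [MiB], memory.used [MiB], utilization.gpu [%], utilization.memory [%]
2024/09/17 22:28:01.000, 24576 MiB, 1024 MiB, 35 %, 12 %
2024/09/17 22:28:02.000, 24576 MiB, 2048 MiB, 80 %, 40 %
"""
for s in parse_gpu_log(sample):
    print(s["timestamp"], s["memory_used_mib"], s["utilization_gpu_pct"])
```

The resulting list of dicts can be handed to any plotting tool; how the gist itself draws the curve is not visible in the preview.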
bertsky / ocrd_fix_imagefilename.py
Created June 11, 2024 14:47
DFG METS: overwrite PAGE-XML @imageFilename from image fileGrp
import os
import click
from ocrd_utils import MIMETYPE_PAGE
from ocrd_models.ocrd_mets import OcrdMets
from ocrd_models.ocrd_page import to_xml
from ocrd_modelfactory import page_from_file
@click.command()
@click.option('-m', '--mets-file', default="mets.xml", help="path to METS of workspace")
bertsky / tei2txt.sh
Last active April 3, 2024 18:46
wrapper around dta-tools tei2txt.pl covering dehyphenation
#!/bin/bash
nontext_opts=(
    xmlstarlet ed -N tei=http://www.tei-c.org/ns/1.0
    -d //tei:note
    -d //tei:fw
    -d //tei:table
    -d //tei:figure
    -d //tei:formula
    -d //tei:titlePage
bertsky / charfreq.py
Created April 2, 2024 11:39
Aggregate character histogram for the given text files
#!/usr/bin/env python3
import argparse
import os
import sys
import io
from functools import reduce
import json
import unicodedata
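The aggregation named in the description can be sketched as follows: count characters per text, then fold the per-text counts into one histogram. This is only an illustration under assumptions. The imports above suggest `reduce`, `json` and `unicodedata` are involved, but the NFC normalization, the use of `collections.Counter` and the helper names `char_histogram` and `aggregate` are choices made here, not confirmed by the preview:

```python
import json
import unicodedata
from collections import Counter
from functools import reduce


def char_histogram(text, normalization="NFC"):
    """Count the characters of one text after Unicode normalization."""
    return Counter(unicodedata.normalize(normalization, text))


def aggregate(texts):
    """Merge the per-text histograms into a single Counter."""
    return reduce(lambda total, hist: total + hist,
                  (char_histogram(t) for t in texts),
                  Counter())


# hypothetical input; the gist reads its texts from files instead
texts = ["abba", "cab"]
total = aggregate(texts)
print(json.dumps(dict(total.most_common()), ensure_ascii=False))
```

Normalizing before counting keeps precomposed and decomposed spellings of the same character from showing up as separate histogram entries.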
bertsky / mlmodel.py
Last active February 25, 2024 16:23
dump user metadata of a kraken model file or fix it
#!/usr/bin/env python3
# Dump user metadata of a kraken model file or fix it.
import click
import json
import os
if 'TF_CPP_MIN_LOG_LEVEL' not in os.environ:
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # error
bertsky / metha-dump.py
Last active January 10, 2024 16:20
dump METS files from an OAI harvest (metha-cat output after running metha-sync), with recursive METS downloads for multipart works
#!/usr/bin/env python3
import sys
from lxml import etree as ET
from ocrd_models.constants import NAMESPACES
NAMESPACES['oai'] = "http://www.openarchives.org/OAI/2.0/"
for curie in NAMESPACES:
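To illustrate how such a namespace map is typically used for pulling METS out of harvested OAI records, here is a rough sketch. It uses the standard library's `xml.etree.ElementTree` instead of `lxml`, and it assumes that the harvest is one XML document whose OAI `record` elements each hold a `mets:mets` inside `metadata`. The root element name `collection`, the sample record and the helper `iter_mets` are hypothetical, and the recursive download for multipart works is not covered:

```python
import xml.etree.ElementTree as ET

# "oai" is the URI from the snippet above; "mets" is the standard METS namespace
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "mets": "http://www.loc.gov/METS/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep readable prefixes when serializing


def iter_mets(harvest_xml):
    """Yield (OAI identifier, serialized METS) for each record in the harvest."""
    root = ET.fromstring(harvest_xml)
    for record in root.iter("{%s}record" % NS["oai"]):
        identifier = record.findtext("oai:header/oai:identifier",
                                     default="", namespaces=NS)
        mets = record.find("oai:metadata/mets:mets", NS)
        if mets is not None:  # skip records without a METS payload
            yield identifier, ET.tostring(mets, encoding="unicode")


# hypothetical harvest with a single record
sample = """<collection>
  <record xmlns="http://www.openarchives.org/OAI/2.0/">
    <header><identifier>oai:example:1</identifier></header>
    <metadata><mets:mets xmlns:mets="http://www.loc.gov/METS/"/></metadata>
  </record>
</collection>"""
for identifier, mets in iter_mets(sample):
    print(identifier, len(mets))
```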
bertsky / cudatest.sh
Last active June 10, 2023 23:51
OCR-D workflow for coverage tests, esp. CUDA support
set -e
# select first CUDA device (in case there are multiple, which may fail due to [a recent Tensorflow problem](https://github.com/qurator-spk/eynollah/issues/99))
export CUDA_VISIBLE_DEVICES=0
# check we are not running into [this bug](https://github.com/shapely/shapely/issues/1598)
python3 -c "from shapely.geometry import Polygon; import torch; torch.randn(10).cuda()"
# validate CUDA support is working in TF and Torch (not an exhaustive test)
python3 -c "import torch; print(torch.cuda.is_available())"
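The same availability check can also be written as a small Python helper that covers both frameworks and tolerates a missing installation. This is a sketch only: the preview shows the Torch check but not the TensorFlow one, so the `tf.config.list_physical_devices("GPU")` call is an assumption about how that side could be tested, and the function name `cuda_report` is hypothetical:

```python
import importlib


def cuda_report():
    """Return {framework: True/False/None}; None means it is not installed."""
    report = {}
    try:
        torch = importlib.import_module("torch")
        report["torch"] = bool(torch.cuda.is_available())
    except ImportError:
        report["torch"] = None
    try:
        tf = importlib.import_module("tensorflow")
        # assumption: a non-empty GPU device list counts as working CUDA support
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        report["tensorflow"] = None
    return report


print(cuda_report())
```

Like the shell one-liners, this is not an exhaustive test: it only reports whether each framework can see a GPU, not whether a real workload runs on it.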
bertsky / workflow.py
Last active December 1, 2020 00:41
proof of concept for an OCR-D workflow engine that uses strong (API instead of CLI) integration of processors and acts as a server
import os
import sys
import re
import click
import json
from time import time
import flask
from shutil import which  # distutils was removed in Python 3.12
import ocrd