@bertsky
bertsky / EVAL-CLIP-TESS-vs-OCRO_0005.html
Created December 5, 2019 15:33
dinglehopper output
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<style type="text/css">
.gt .diff {
color: green;
@bertsky
bertsky / preprocess-ocrd-gt.sh
Last active October 18, 2019 12:45
Commands to prepare pixel classifier training data from OCR-D GT
# Needs OCR-D/core#327 OCR-D/ocrd_olena#10 OCR-D/ocrd_segment#11 bertsky/ocrd_cis
# Runs a preprocessing and resegmentation workflow for GT annotation,
# then extracts page images along with JSON descriptions of region polygons and classes;
# finally, creates a flattened directory under $TARGET.
# Run: preprocess-ocrd-gt.sh [TARGET-DIRECTORY [METS-FILE]]
# (default is all METS files anywhere under CWD)
TARGET=${1:-../1000pages-crop-sauvola-denoise-deskew-repair}
WORKSPACES=${2:-$(find . -name mets.xml)}
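The two assignments above rely on the shell's `${N:-default}` expansion, which substitutes the fallback when the positional argument is unset or empty. A minimal sketch of that behaviour follows; `resolve_target` is a hypothetical helper introduced only for illustration and is not part of the original script.

```shell
# Illustrative only: resolve_target is a hypothetical helper, not part of the gist.
# It mirrors the TARGET assignment: use the first argument if given,
# otherwise fall back to the default directory named in the script.
resolve_target() {
    printf '%s\n' "${1:-../1000pages-crop-sauvola-denoise-deskew-repair}"
}

resolve_target            # no argument: prints the default directory
resolve_target /tmp/out   # explicit argument: prints /tmp/out
resolve_target ""         # empty argument: also falls back to the default
```

Because the `:-` form treats an empty string like an unset parameter, passing `""` as the target still selects the default; the plain `-` form (`${1-default}`) would keep the empty string instead.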