@jbaiter
Created October 2, 2017 12:50

archiscribe-corpus

This is the corpus repository for https://archiscribe.jbaiter.de.

The goal is to collect as much diverse OCR ground truth as possible for 19th-century German prints.

Currently the corpus contains 123 lines from 3 works published across 3 years. Detailed statistics are available below.

Statistics: Decades

| Decade | # lines |
| --- | --- |
| 1860 | 48 |
| 1880 | 50 |
| 1890 | 25 |
| Total | 123 |

Statistics: Years

| Year | # lines |
| --- | --- |
| 1868 | 48 |
| 1881 | 50 |
| 1894 | 25 |
| Total | 123 |
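
The decade totals listed earlier can be reproduced from these per-year counts by rounding each year down to its decade and summing. How the site actually computes its statistics is not stated here, so the following is only a minimal sketch; the names `lines_per_year` and `lines_per_decade` are hypothetical.

```python
from collections import Counter

# Per-year line counts, copied from the Years table.
lines_per_year = {1868: 48, 1881: 50, 1894: 25}


def lines_per_decade(per_year):
    """Sum line counts by decade (each year rounded down to a multiple of 10)."""
    totals = Counter()
    for year, count in per_year.items():
        totals[year // 10 * 10] += count
    return dict(sorted(totals.items()))


decades = lines_per_decade(lines_per_year)
total = sum(lines_per_year.values())
```

With the counts above, `decades` has one entry per decade and `total` matches the corpus size given in the introduction.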

Statistics: Works

| Title | Date | Archive.org | IIIF |
| --- | --- | --- | --- |
| Natur und Gemüth Ein Feld und Waldblüthenstrauß aus Tagen die nicht mehr sind, Gewunden von Friedrich Aulenbach | 1868 | bub_gb_HF46AAAAcAAJ | Manifest / Mirador |
| Geschichte der Deutschen bis zur höchsten Machtentfaltung des Römisch ... | 1881 | geschichtederde00bessgoog | Manifest / Mirador |
| Die forstlichen Verhaltnisse Preussens | 1894 | dieforstlichenv02hagegoog | Manifest / Mirador |