@kba
Created October 19, 2020 09:39

Fix imageFilename in PAGE

sed -i 's,imageFilename=",imageFilename="page/",' page/*.xml
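The substitution above prefixes every imageFilename value unconditionally, so running it a second time produces `page/page/...`. Below is a minimal sketch of a guarded variant. It assumes GNU sed (BSD/macOS sed needs `-i ''` instead of `-i`), and the helper name `prefix_image_filenames` is hypothetical. It only rewrites values that do not already contain a `/`:

```shell
#!/usr/bin/env bash
# Sketch: an idempotent variant of the imageFilename rewrite (assumes GNU sed -i).
# prefix_image_filenames is a hypothetical helper name, not part of OCR-D.
set -euo pipefail

# Prefix imageFilename values with "page/" in every *.xml file of DIR (default: page).
# Values that already contain a "/" do not match the pattern and are left alone,
# so running the function twice has the same effect as running it once.
prefix_image_filenames() {
  local dir=${1:-page}
  sed -i 's,imageFilename="\([^/"]*\)",imageFilename="page/\1",' "$dir"/*.xml
}
```

Run `prefix_image_filenames page` from the workspace directory in place of the one-liner. If your PAGE files should reference images under a different directory, adjust the prefix in the replacement part accordingly.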

Initialize mets.xml

ocrd workspace init

Add PAGE/JPEG

ocrd workspace bulk-add \
        --ignore \
        --regex '^.*/(?P<fileGrp>[^/]+)/altstrelitz_friedregister(?P<pageid>.*)\.(?P<ext>[^\.]*)$' \
        --file-id 'FILE_{{ fileGrp }}_{{ pageid }}' \
        --page-id 'PHYS_{{ pageid }}' \
        --file-grp "{{ fileGrp }}" \
        --url '{{ fileGrp }}/altstrelitz_friedregister{{ pageid }}.{{ ext }}' page/*.xml jpg/*.jpg
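The `--regex` captures three named groups (fileGrp, pageid, ext) from each file path, and the `{{ }}` placeholders in `--file-id`, `--page-id`, `--file-grp` and `--url` are filled from them. The Bash sketch below mimics that mapping so the resulting IDs can be checked before touching mets.xml. It is only an approximation of what bulk-add does. The leading `^.*/` implies the pattern is matched against a path with at least one directory above the fileGrp folder (presumably the resolved, absolute path), so the examples use a hypothetical `/data/ws` prefix. The helper name `preview` is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: emulate how the bulk-add --regex and {{ }} templates map one file path
# to METS values. "preview" is a hypothetical helper, not an ocrd command.
set -euo pipefail

# Same pattern as --regex, but Bash's =~ has no named groups, so they are positional:
#   1 = fileGrp, 2 = pageid, 3 = ext
re='^.*/([^/]+)/altstrelitz_friedregister(.*)\.([^.]*)$'

# Print file ID, page ID, file group and URL (tab-separated) for PATH.
preview() {
  local path=$1 grp pageid ext
  if [[ $path =~ $re ]]; then
    grp=${BASH_REMATCH[1]} pageid=${BASH_REMATCH[2]} ext=${BASH_REMATCH[3]}
    printf '%s\t%s\t%s\t%s\n' \
      "FILE_${grp}_${pageid}" \
      "PHYS_${pageid}" \
      "$grp" \
      "${grp}/altstrelitz_friedregister${pageid}.${ext}"
  else
    printf 'no match: %s\n' "$path" >&2
    return 1
  fi
}
```

For example, `preview /data/ws/page/altstrelitz_friedregister0001.xml` and `preview /data/ws/jpg/altstrelitz_friedregister0001.jpg` both yield the page ID `PHYS_0001`, which is how a PAGE file and its image end up on the same physical page. A bare relative path such as `page/altstrelitz_friedregister0001.xml` does not match at all, because `^.*/` requires a directory above the fileGrp directory.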

Add ALTO

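(This must be a separate step: bulk-add derives the mimetype from the file extension, and since PAGE and ALTO files both end in .xml, the ALTO files have to be added with an explicit --mimetype.)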

ocrd workspace bulk-add \
        --ignore \
        --regex '^.*/(?P<fileGrp>[^/]+)/altstrelitz_friedregister(?P<pageid>.*)\.(?P<ext>[^\.]*)$' \
        --file-id 'FILE_{{ fileGrp }}_{{ pageid }}' \
        --page-id 'PHYS_{{ pageid }}' \
        --file-grp "{{ fileGrp }}" \
        --mimetype "application/alto+xml" \
        --url '{{ fileGrp }}/altstrelitz_friedregister{{ pageid }}.{{ ext }}' alto/*.xml
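PAGE, JPEG and ALTO files are tied together only by the shared pageid in their file names. Before running the two bulk-add commands, it may therefore be worth checking that every page appears in all three directories. The sketch below assumes the `page/`, `jpg/` and `alto/` layout and the `altstrelitz_friedregister<pageid>.<ext>` naming implied by the globs above. `check_pages` and `pageids` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: report page IDs that are not present in all of page/, jpg/ and alto/.
# check_pages and pageids are hypothetical helpers; the directory layout and the
# altstrelitz_friedregister<pageid>.<ext> naming follow the globs used above.
set -euo pipefail

# Print the sorted, unique page IDs found in DIR for files with extension EXT.
pageids() {
  local dir=$1 ext=$2 f base
  for f in "$dir"/altstrelitz_friedregister*."$ext"; do
    [ -e "$f" ] || continue              # unmatched glob: no such files in DIR
    base=${f##*/}                        # drop the directory part
    base=${base#altstrelitz_friedregister}
    printf '%s\n' "${base%.*}"           # drop the extension, keep the page ID
  done | sort -u
}

# Compare the three file groups below ROOT (default: current directory).
# Prints one line per group with missing page IDs; returns 1 if anything is missing.
check_pages() {
  local root=${1:-.} tmp grp missing status=0
  tmp=$(mktemp -d)
  pageids "$root/page" xml > "$tmp/page"
  pageids "$root/jpg" jpg > "$tmp/jpg"
  pageids "$root/alto" xml > "$tmp/alto"
  sort -u "$tmp/page" "$tmp/jpg" "$tmp/alto" > "$tmp/all"
  for grp in page jpg alto; do
    # Page IDs present in some group but absent from this one.
    missing=$(comm -23 "$tmp/all" "$tmp/$grp")
    if [ -n "$missing" ]; then
      printf 'missing in %s/: %s\n' "$grp" "$(printf '%s\n' "$missing" | paste -sd ' ' -)"
      status=1
    fi
  done
  rm -rf "$tmp"
  return "$status"
}
```

Run `check_pages .` from the workspace directory. It prints nothing and exits with status 0 when all three groups cover the same pages.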