@ross-spencer
Last active April 11, 2019 09:08
BagIt layouts and DC metadata in Archivematica

BagIt layouts and DC metadata

It turns out that supplying DC metadata with a bag is not straightforward: the paths you need in metadata.csv depend very much on how your bag is laid out.

Layout one

Highlights

  • The metadata directory is not part of the bag payload; it sits at the top level of the bag, next to data/.
  • Each path in the filename column needs to start with data/.
  • E.g. data/filename.file.
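
One way to check the rule above before starting a transfer is to resolve every filename value against the top level of the bag. This is a minimal sketch and not part of Archivematica: the function name `missing_paths` and its arguments are made up for illustration, and it assumes that the filename column comes first and that its paths are relative to the top-level bag directory, as the example in this layout suggests.

```python
import csv
from pathlib import Path


def missing_paths(bag_root, csv_path):
    """Return filename values from the CSV that do not exist under bag_root."""
    bag_root = Path(bag_root)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            # Assumption: the first column is the filename column.
            if not (bag_root / row[0]).is_file():
                missing.append(row[0])
    return missing
```

Calling `missing_paths("md_outside", "md_outside/metadata/metadata.csv")` should return an empty list when every row points at a file in the bag; a row such as `bird.mp3` without the data/ prefix would be reported.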

Tree

md_outside/
├── bag-info.txt
├── bagit.txt
├── data
│   ├── beihai.tif
│   ├── bird.mp3
│   ├── ocr-image.png
│   ├── piiTestDataCreditCardNumbers.txt
│   └── View_from_lookout_over_Queenstown_towards_the_Remarkables_in_spring.jpg
├── manifest-sha256.txt
├── manifest-sha512.txt
├── metadata
│   └── metadata.csv
├── tagmanifest-sha256.txt
└── tagmanifest-sha512.txt

2 directories, 12 files

CSV

filename,dc.title,dc.creator,dc.subject,dc.subject,dc.subject,dc.subject,dc.subject,dc.description,dc.publisher,dc.contributor,dc.date,dc.type,dc.format,dc.identifier,dc.source,dc.language,dc.language,dc.relation,dc.coverage,dc.rights
data/bird.mp3,"14000 Caen, France - Bird in my garden",Nicolas Germain,field recording,phonography,soundscapes,sound art,radio aporee,"Bird singing in my garden, Caen, France, Zoom H6",Radio Aporee,,2017-05-27,,,aporee_36644_41997,Internet Archive,,,,"Caen, France",Public domain
data/beihai.tif,"Beihai, Guanxi, China, 1988",NASA/GSFC/METI/ERSDAC/JAROS and U.S./Japan ASTER Science Team,satellite imagery,China,Beihai,,,Beihai is a city in the south of Guangxi.,NASA Jet Propulsion Laboratory,,"February 29,2016",,,,NASA Jet Propulsion Laboratory,,,,"Beihai, China",Public domain
data/View_from_lookout_over_Queenstown_towards_the_Remarkables_in_spring.jpg,Morning view from lookout over Queenstown towards the Remarkables in spring,Pseudopanax at English Wikipedia,The Remarkables,Lake Wakatipu,,,,,Wikimedia Commons,,7 October 2014,,,,Wikimedia Commons,,,,Lake Wakatipu,Public domain
data/ocr-image.png,OCR image,Tesseract,,,,,,This image was retrieved from the Tesseract wiki (https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage) to test optical character recognition in Archivematica.,,,,,,,Tesseract project,en,de,,,
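
The header above repeats dc.subject and dc.language, which appears to be how the CSV carries more than one value for a field. If you want to inspect such a file with Python, note that `csv.DictReader` keeps only the last of any repeated column, so the sketch below reads rows positionally and collects repeated columns into lists. The function name is hypothetical and the sample is trimmed from the CSV above; this is not how Archivematica itself parses the file.

```python
import csv
import io
from collections import defaultdict


def read_repeated_columns(text):
    """Parse CSV text, gathering non-empty values of repeated headers into lists."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    records = []
    for row in reader:
        fields = defaultdict(list)
        for name, value in zip(header, row):
            if value:  # skip empty cells
                fields[name].append(value)
        records.append(dict(fields))
    return records


# Cut-down sample: fewer columns than the real file, same repeated headers.
sample = (
    "filename,dc.title,dc.subject,dc.subject,dc.language,dc.language\n"
    "data/ocr-image.png,OCR image,,,en,de\n"
    'data/beihai.tif,"Beihai, Guanxi, China, 1988",satellite imagery,China,,\n'
)
for record in read_repeated_columns(sample):
    print(record["filename"], record.get("dc.subject", []), record.get("dc.language", []))
```

Every value comes back as a list, including filename, so single-valued and repeated columns can be handled the same way.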

Layout two

Highlights

  • The metadata directory is part of the bag payload, at data/metadata/.
  • Each path in the filename column needs to start with data/objects/.
  • E.g. data/objects/filename.file.
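
The two layouts differ only in where the metadata directory sits, so the prefix can be worked out from the location of metadata.csv. A small sketch, assuming these are the only two layouts in play; `filename_prefix` is a hypothetical helper, not an Archivematica function.

```python
from pathlib import Path


def filename_prefix(bag_root):
    """Guess the prefix for the filename column from where metadata.csv lives."""
    bag_root = Path(bag_root)
    if (bag_root / "data" / "metadata" / "metadata.csv").is_file():
        return "data/objects/"  # layout two: metadata inside the payload
    if (bag_root / "metadata" / "metadata.csv").is_file():
        return "data/"  # layout one: metadata at the top level of the bag
    raise FileNotFoundError(f"no metadata.csv found under {bag_root}")
```

Raising an error for anything else is deliberate: a bag that matches neither layout is better flagged than guessed at.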

Tree

md_inside/
├── bag-info.txt
├── bagit.txt
├── data
│   ├── metadata
│   │   └── metadata.csv
│   └── objects
│       ├── beihai.tif
│       ├── bird.mp3
│       ├── ocr-image.png
│       ├── piiTestDataCreditCardNumbers.txt
│       └── View_from_lookout_over_Queenstown_towards_the_Remarkables_in_spring.jpg
├── manifest-sha256.txt
├── manifest-sha512.txt
├── tagmanifest-sha256.txt
└── tagmanifest-sha512.txt

3 directories, 12 files

CSV

filename,dc.title,dc.creator,dc.subject,dc.subject,dc.subject,dc.subject,dc.subject,dc.description,dc.publisher,dc.contributor,dc.date,dc.type,dc.format,dc.identifier,dc.source,dc.language,dc.language,dc.relation,dc.coverage,dc.rights
data/objects/bird.mp3,"14000 Caen, France - Bird in my garden",Nicolas Germain,field recording,phonography,soundscapes,sound art,radio aporee,"Bird singing in my garden, Caen, France, Zoom H6",Radio Aporee,,2017-05-27,,,aporee_36644_41997,Internet Archive,,,,"Caen, France",Public domain
data/objects/beihai.tif,"Beihai, Guanxi, China, 1988",NASA/GSFC/METI/ERSDAC/JAROS and U.S./Japan ASTER Science Team,satellite imagery,China,Beihai,,,Beihai is a city in the south of Guangxi.,NASA Jet Propulsion Laboratory,,"February 29,2016",,,,NASA Jet Propulsion Laboratory,,,,"Beihai, China",Public domain
data/objects/View_from_lookout_over_Queenstown_towards_the_Remarkables_in_spring.jpg,Morning view from lookout over Queenstown towards the Remarkables in spring,Pseudopanax at English Wikipedia,The Remarkables,Lake Wakatipu,,,,,Wikimedia Commons,,7 October 2014,,,,Wikimedia Commons,,,,Lake Wakatipu,Public domain
data/objects/ocr-image.png,OCR image,Tesseract,,,,,,This image was retrieved from the Tesseract wiki (https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage) to test optical character recognition in Archivematica.,,,,,,,Tesseract project,en,de,,,
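
Comparing the two CSV files, only the filename column changes. If you already have a metadata.csv written for layout one, a sketch like the following rewrites it for layout two. The function name and default prefixes are illustrative assumptions, and it assumes the filename column comes first, as it does in both files here.

```python
import csv
import io


def reprefix_filenames(text, old="data/", new="data/objects/"):
    """Return CSV text with the prefix of the first (filename) column swapped."""
    source = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(source))  # copy the header row unchanged
    for row in source:
        # Leave rows alone if they already carry the new prefix.
        if row and row[0].startswith(old) and not row[0].startswith(new):
            row[0] = new + row[0][len(old):]
        writer.writerow(row)
    return out.getvalue()
```

Going through the csv module rather than a plain string replace keeps quoted values such as "Caen, France" intact and only ever touches the first column.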