@shsdev
Last active October 10, 2024 08:01
CSIPSTR2 issue

An issue was reported by stakeholders concerning requirement CSIPSTR2:

CSIPSTR2: The Information Package root folder SHOULD be named with the ID or name of the Information Package, that is the value of the package METS.xml's root `<mets>` element's `@OBJID` attribute.

Requiring the folder name to match the attribute value exactly may cause file system interoperability issues, because identifiers often contain characters that some file systems reject. For example, the colon in a URN-style identifier is not allowed in file names on Windows.

The translation of a package's identifier into a file or folder name needs to conform to the constraints of commonly used file systems, such as NTFS or FAT32 on Windows and Ext4 or XFS on Linux.
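To make these constraints concrete, the following is a minimal sketch of a portability check for a candidate folder name. The function name `portability_problems` is hypothetical, and the rule set is an assumption covering only well-known Windows naming restrictions and the 255-byte name limit of Ext4 and XFS; it is not an exhaustive list.

```python
import re

# Characters rejected in Windows file names (applies to NTFS and FAT32 alike).
# "/" is also the path separator on Linux file systems such as Ext4 and XFS.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')

# Device names that Windows reserves, with or without an extension.
WINDOWS_RESERVED_NAMES = re.compile(
    r"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$", re.IGNORECASE
)


def portability_problems(name: str) -> list[str]:
    """Return a list of reasons why `name` may not be portable (empty if none found)."""
    problems = []
    # Reserved punctuation and ASCII control characters.
    bad = sorted({c for c in name if c in WINDOWS_RESERVED_CHARS or ord(c) < 0x20})
    if bad:
        problems.append(f"reserved characters: {bad!r}")
    if WINDOWS_RESERVED_NAMES.match(name):
        problems.append("reserved device name on Windows")
    # Windows silently strips or rejects trailing spaces and dots.
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    # Ext4 and XFS limit a single name to 255 bytes.
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


if __name__ == "__main__":
    for candidate in ["urn:uuid:1234", "urn+uuid+1234", "aux.txt"]:
        print(candidate, portability_problems(candidate))
```

A check like this only detects problems; it does not say how to map an identifier to a safe name, which is what the recommendation addresses.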

Our recommendation is to use section 3 of Kunze's Pairtree specification as the starting point:

https://www.ietf.org/archive/id/draft-kunze-pairtree-01.txt

As the Pairtree specification is an expired Internet-Draft (expired May 29, 2009), we suggest adopting its section 3, adapting it, and publishing the result as a new appendix to the CSIP named “Mapping Object Identifiers to File-System Safe Names for Interoperability”.
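For illustration, the identifier cleaning described in section 3 of the Pairtree draft can be sketched roughly as follows. This is a minimal sketch based on our reading of the draft, not a normative implementation: the function name `clean_identifier` is hypothetical, and an adapted appendix might settle on different rules.

```python
# Step 1 (as we read the draft): these visible ASCII characters are hex-encoded
# as "^hh", as is every byte outside the visible ASCII range 0x21..0x7e.
HEX_ENCODED = set('"*+,<=>?\\^|')

# Step 2: single-character substitutions for characters common in identifiers.
SINGLE_CHAR_MAP = {"/": "=", ":": "+", ".": ","}


def clean_identifier(identifier: str) -> str:
    """Map an object identifier to a file-system safe name (hypothetical helper)."""
    out = []
    # Work on UTF-8 bytes so that non-ASCII characters are encoded byte by byte.
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in HEX_ENCODED:
            out.append(f"^{byte:02x}")
        else:
            out.append(SINGLE_CHAR_MAP.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    print(clean_identifier("ark:/13030/xt12t3"))
```

Because "^" itself is hex-encoded and the substituted characters "=", "+" and "," are hex-encoded whenever they occur in the input, the mapping stays reversible, so the original `@OBJID` value could be recovered from the folder name.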

CSIPSTR2 would then reference the appendix:

CSIPSTR2: The Information Package root folder SHOULD be named using the ID or name of the Information Package, which is the value of the package `METS.xml`'s root `<mets>` element's `@OBJID` attribute. When creating the folder name, the 'Mapping Object Identifiers to File-System Safe Names for Interoperability' process SHOULD be applied to ensure compatibility with file system naming conventions.

Apart from this identifier-to-filename mapping, other possible file system issues can be dealt with at the same time. For example, requiring case-sensitive Information Package names may cause issues on file systems that are not case-sensitive, such as NTFS (case-preserving, but case-insensitive by default) or FAT32 (case-insensitive) on Windows.
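The case-sensitivity concern can be illustrated with a small sketch that groups package names which would collide on a case-insensitive file system. The function name `case_collisions` is hypothetical, and Python's `str.casefold` is used here only as an approximation of how a file system compares names; actual comparison rules differ between file systems.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by case (approximated with str.casefold)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Only groups with more than one member represent a potential collision.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(case_collisions(["IP-A1", "ip-a1", "IP-B2"]))
```

Two packages reported by such a check could coexist on Ext4 or XFS but would map to the same folder on a default NTFS or FAT32 volume.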
