mass_spec_formats.md

Summary of the problem, from the mz5 paper (written about .mzML, but just as true for .imzML):

Although based on excellent ontologies, relying on the extensible markup language (XML) for the straightforward implementation of mzData, mzXML, and mzML makes for a major efficiency bottleneck. XML was designed to be a human readable, textual data format with considerable inherent verbosity and redundancy. XML was not designed for efficient bulk data storage, and the general modus operandi requires reading complete files to construct the XML parse tree. The mzXML and mzML formats partly circumvent these limitations by using base-64 encoding and (optional) compression of the raw MS scan data in combination with an application-specific indexing system. Despite the improvements gained from these efforts, vendor formats in general outperform mzXML and mzML in terms of space requirements, as well as in read and write efficiency.
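
To make the base-64 point concrete, here is a minimal sketch of how a numeric array ends up as text inside an XML element: pack the values as little-endian 64-bit floats, optionally zlib-compress them, then base64-encode the bytes. The function names `encode_array` and `decode_array` are made up for this note and are not part of any mzML library.

```python
import base64
import struct
import zlib


def encode_array(values, compress=True):
    """Pack floats as little-endian 64-bit, optionally zlib-compress, then base64-encode."""
    raw = struct.pack(f"<{len(values)}d", *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, compressed=True):
    """Reverse encode_array: base64-decode, optionally decompress, unpack the floats."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


# Hypothetical m/z axis with 1000 points.
mz = [100.0 + 0.01 * i for i in range(1000)]
encoded = encode_array(mz, compress=False)
assert decode_array(encoded, compressed=False) == mz

# Base64 emits 4 characters per 3 input bytes, so the text is about a third larger.
print(len(mz) * 8, len(encoded))  # prints "8000 10668"
```

Even without the surrounding XML, the uncompressed text form is roughly 33% larger than the raw bytes, and a reader has to decode every array before it can use a single value.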

HDF5-based: mz5

Specification (kind of)
Paper
currently limited to collections of (m/z, abundance) and (time, abundance)

SQLite-based: mzDB

Designed for LC-MS data, but extending it to imaging MS data should be easy.

Using JSON for metadata was considered but rejected.
Instead, metadata can be stored as XML, although there are also dedicated metadata tables.
Centroided and profile data cannot both be stored in the same file.
Data is organized into chunks.
Range queries use the R*Tree index structure that ships with SQLite.
SQLite does all the indexing, although setting up the chunking and the multiple indices is not trivial.
Compression (MS-Numpress) is planned for the next version.
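
A rough sketch of the chunk-plus-R*Tree idea, using Python's built-in `sqlite3` module. The table and column names (`chunk`, `chunk_index`, `min_mz`, ...) are invented for illustration and are not the real mzDB schema. It also assumes the SQLite library behind your Python build was compiled with the R*Tree module, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per chunk, plus an R*Tree over its (m/z, time) bounding box.
conn.execute("CREATE TABLE chunk (id INTEGER PRIMARY KEY, data BLOB)")
conn.execute(
    "CREATE VIRTUAL TABLE chunk_index USING rtree(id, min_mz, max_mz, min_rt, max_rt)"
)


def add_chunk(chunk_id, mz_range, rt_range, payload):
    """Store a chunk's bytes and register its bounding box in the R*Tree."""
    conn.execute("INSERT INTO chunk VALUES (?, ?)", (chunk_id, payload))
    conn.execute(
        "INSERT INTO chunk_index VALUES (?, ?, ?, ?, ?)",
        (chunk_id, mz_range[0], mz_range[1], rt_range[0], rt_range[1]),
    )


def chunks_overlapping(mz_lo, mz_hi, rt_lo, rt_hi):
    """Return ids of chunks whose bounding box intersects the query window."""
    rows = conn.execute(
        "SELECT id FROM chunk_index "
        "WHERE max_mz >= ? AND min_mz <= ? AND max_rt >= ? AND min_rt <= ? "
        "ORDER BY id",
        (mz_lo, mz_hi, rt_lo, rt_hi),
    )
    return [row[0] for row in rows]


# A 2 x 2 grid of chunks over m/z 100-300 and retention time 0-120 s.
add_chunk(1, (100.0, 200.0), (0.0, 60.0), b"chunk-1")
add_chunk(2, (200.0, 300.0), (0.0, 60.0), b"chunk-2")
add_chunk(3, (100.0, 200.0), (60.0, 120.0), b"chunk-3")
add_chunk(4, (200.0, 300.0), (60.0, 120.0), b"chunk-4")

print(chunks_overlapping(150.0, 160.0, 10.0, 20.0))  # prints "[1]"
print(chunks_overlapping(150.0, 250.0, 70.0, 80.0))  # prints "[3, 4]"
```

Only the chunks returned by the index query need to be read and decoded, which is the point of combining chunking with a spatial index.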

OpenMSI data format (HDF5-based)

Designed only for imaging MS data, not for LC-MS
Supports only profile-mode data (binning is performed on centroided data)
By default stores two copies of data for fast access to both spectra and images.
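
The two points above (binning centroided data, then keeping two copies) can be illustrated with a small pure-Python sketch. This is not the OpenMSI layout or API. The helper `bin_spectrum`, the pixel coordinates, and the bin settings are all assumptions made for this note.

```python
def bin_spectrum(peaks, mz_min, bin_width, n_bins):
    """Sum centroided (m/z, intensity) peaks into fixed-width bins on a shared axis."""
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int((mz - mz_min) // bin_width)
        if 0 <= index < n_bins:  # silently drop peaks outside the axis
            binned[index] += intensity
    return binned


# Hypothetical data: one centroided spectrum per pixel, keyed by (x, y).
pixels = {
    (0, 0): [(100.2, 5.0), (100.7, 3.0), (102.5, 1.0)],
    (1, 0): [(101.1, 2.0)],
}
coords = sorted(pixels)

# Copy 1: spectrum-major, one row per pixel. Fast to read a whole spectrum.
by_spectrum = [bin_spectrum(pixels[c], 100.0, 1.0, 4) for c in coords]

# Copy 2: image-major, one row per m/z bin. Fast to read a whole ion image.
by_image = [list(column) for column in zip(*by_spectrum)]

print(by_spectrum[0])  # prints "[8.0, 0.0, 1.0, 0.0]"
print(by_image[1])     # prints "[0.0, 2.0]"
```

Binning puts every pixel on the same m/z axis, which is what makes the transposed copy possible. The price is doubled storage in exchange for contiguous reads in both access patterns.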

Some closed formats also store two copies of the data in profile mode: msiQuant and the SCiLS Lab .sl format. In msiQuant, centroided data is not binned but converted to profile mode via resolution estimation and Gaussian smoothing.
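
Since msiQuant is closed, its actual algorithm is unknown here. The following is only a generic sketch of turning centroids into a profile by placing a Gaussian at each peak. It assumes a single, known resolving power R = m/FWHM rather than estimating it from the data, and the function name `centroids_to_profile` is invented.

```python
import math


def centroids_to_profile(peaks, mz_axis, resolving_power):
    """Sum one Gaussian per centroid, with FWHM = m/z divided by the resolving power."""
    profile = [0.0] * len(mz_axis)
    for mz, intensity in peaks:
        fwhm = mz / resolving_power
        # Standard conversion between a Gaussian's FWHM and its sigma.
        sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
        for i, x in enumerate(mz_axis):
            profile[i] += intensity * math.exp(-0.5 * ((x - mz) / sigma) ** 2)
    return profile


# One peak at m/z 500 with R = 1000, so FWHM = 0.5.
axis = [499.75, 500.0, 500.25]
profile = centroids_to_profile([(500.0, 10.0)], axis, 1000.0)
# Half a FWHM away from the apex the signal falls to half of its height.
print([round(v, 3) for v in profile])  # prints "[5.0, 10.0, 5.0]"
```

Unlike binning, this keeps the peak position at its measured m/z, and the sampling axis can be chosen independently of the peak width.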
