@gmaclennan
Last active October 6, 2021 06:00
Ideas for an alternative to mbtiles for map tile storage

MBTiles (MapBox Tiles)

https://github.com/mapbox/mbtiles-spec

MBTiles is a specification for storing tiled map data in SQLite databases for immediate usage and for transfer.

MBTiles is a useful format for storing image and vector tilesets (used in "slippy-maps") on disk, and for transferring between devices for offline use.

MBTiles advantages

  1. A single file that does not need to be decompressed for use. We currently distribute tilesets in a folder structure, but it is easy for users to accidentally move files, or get the folder hierarchy messed up.

  2. It is a content-addressable store, which saves space with repeat tiles. A map tileset can have many repeat tiles (think of all that ocean) so this saves a lot of space.

  3. Better use of storage space. A map tileset can have many very small files (empty tiles or tiles with a single feature). Most filesystems allocate a minimum amount of disk space per file, however small the file is, and this adds up.

  4. The format is read-write, not needing any decompression.

MBTiles disadvantages

  1. It needs a (native) SQLite client to read/write data. We would like a pure-JS solution so that (a) we can make electron apps for offline maps without needing to compile SQLite and (b) we could load tilesets into websites for offline-first web apps. There is a JS port of SQLite but it requires loading the entire DB into memory.

Proposals / ideas

I have been wondering if we can use a very simple flat-file format, with a very basic index. Tilesets are normally written only once, and are primarily a read-only format. We do want JS clients, in particular electron apps, to be able to write tilesets to the format we use. I would love to be able to write a website that could download map tiles and save them directly to a file without needing to keep the whole thing in memory.

The index could be as simple as a list of start/length pairs giving the position of each tile in a second file, ordered by the quadkey of the tile. A quadkey identifies a tile with one base-4 digit (2 bits) per zoom level. Converting between the more common z/x/y tile format and quadkeys is pretty simple. To look up a tile you would calculate its quadkey, multiply its position in the index by the number of bytes used to store each start/length entry, read that entry from the index file, then read the tile from the second file.
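The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed spec: the 8-byte entry layout (uint32 start, uint32 length, little-endian), the rule that lower zoom levels come first in the index, and the names `quadkeyDigits`, `indexSlot` and `readTile` are all hypothetical. The index and tile data are plain in-memory buffers here; a real reader would fetch the same byte ranges from files.

```typescript
// Assumed layout (not from any spec): 8 bytes per index entry,
// uint32 start offset then uint32 length, little-endian; length 0 = no tile.
const ENTRY_BYTES = 8;

// Quadkey digits for a tile, lowest zoom level first.
// Each digit combines one bit of x with one bit of y.
function quadkeyDigits(z: number, x: number, y: number): number[] {
  const digits: number[] = [];
  for (let i = z; i > 0; i--) {
    const half = 2 ** (i - 1);
    const xBit = Math.floor(x / half) % 2;
    const yBit = Math.floor(y / half) % 2;
    digits.push(xBit + 2 * yBit);
  }
  return digits;
}

// Position of a tile in the index. Assumption: all tiles of lower zooms
// come first, followed by the tiles of this zoom in quadkey order.
// Plain arithmetic is used instead of bitwise operators, which are
// limited to 32 bits in JavaScript.
function indexSlot(z: number, x: number, y: number): number {
  const tilesBelow = (4 ** z - 1) / 3; // 1 + 4 + ... + 4^(z-1)
  const digits = quadkeyDigits(z, x, y);
  let position = 0;
  for (let i = 0; i < digits.length; i++) {
    position = position * 4 + digits[i];
  }
  return tilesBelow + position;
}

// Read one tile: find its index entry, then slice the tile data.
function readTile(
  index: DataView,
  tiles: Uint8Array,
  z: number,
  x: number,
  y: number
): Uint8Array | null {
  const offset = indexSlot(z, x, y) * ENTRY_BYTES;
  if (offset + ENTRY_BYTES > index.byteLength) return null; // beyond the index
  const start = index.getUint32(offset, true);
  const length = index.getUint32(offset + 4, true);
  return length === 0 ? null : tiles.subarray(start, start + length);
}
```

For example, `quadkeyDigits(3, 3, 5).join("")` gives `"213"`, and with the assumed ordering that tile sits at index slot 60.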

For writing we would need to maintain a second index of content hashes in order to write the index avoiding duplicates.
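A minimal sketch of that write-side deduplication, assuming an in-memory writer: a map from content hash to the start/length entry already written, so a repeated tile reuses the existing entry instead of being appended again. The names `DedupWriter`, `Entry` and `toyHash` are hypothetical. The hash function is passed in; `toyHash` is a 32-bit FNV-1a stand-in used only to keep the example self-contained, and a real writer would probably want a cryptographic hash such as SHA-256, since this sketch trusts equal hashes to mean equal content.

```typescript
interface Entry {
  start: number;
  length: number;
}

class DedupWriter {
  private hash: (bytes: Uint8Array) => string;
  private chunks: Uint8Array[] = [];
  private size = 0;
  // Second index: content hash -> where that content was already written.
  private byHash = new Map<string, Entry>();

  constructor(hash: (bytes: Uint8Array) => string) {
    this.hash = hash;
  }

  // Append a tile unless identical content was written before.
  // Returns the entry to store in the tile index either way.
  add(tile: Uint8Array): Entry {
    const key = this.hash(tile);
    const existing = this.byHash.get(key);
    if (existing !== undefined) return existing; // duplicate: reuse entry
    const entry: Entry = { start: this.size, length: tile.length };
    this.chunks.push(tile);
    this.size += tile.length;
    this.byHash.set(key, entry);
    return entry;
  }

  // Concatenate everything written so far (a real writer would stream
  // each chunk to a file instead of holding it in memory).
  bytes(): Uint8Array {
    const out = new Uint8Array(this.size);
    let offset = 0;
    for (let i = 0; i < this.chunks.length; i++) {
      out.set(this.chunks[i], offset);
      offset += this.chunks[i].length;
    }
    return out;
  }
}

// Toy 32-bit FNV-1a hash. NOT collision-safe: illustration only.
function toyHash(bytes: Uint8Array): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i];
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}
```

Adding the same ocean tile twice through `new DedupWriter(toyHash)` returns the same entry both times, and the data file only grows once.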

The index would have a lot of empty space, since most tilesets do not include the entire world, but with sparse files and gzip compression, this might not be an issue. The index could be stored gzipped, in a tarball with the single file of all tiles. Whether the index is small enough to load into memory depends on the max zoom of the tileset: zoom z has 4^z tiles, so at the usual upper limit of zoom 22 there are 2^22 (4,194,304) tiles along each axis and 4^22 = 2^44 (about 1.8 × 10^13) tiles in total. A dense index is therefore only small enough for memory at lower max zooms. Start position and length should not need many bits.
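To make the size estimate concrete, here is a small sketch of the arithmetic for a dense index that has one entry for every possible tile from zoom 0 up to a max zoom. The function names and the idea of a fixed `entryBytes` per tile are assumptions for illustration; the only facts used are that zoom z has 4^z tiles and that 1 + 4 + ... + 4^z = (4^(z+1) - 1) / 3.

```typescript
// Number of tiles in a single zoom level: 2^z columns times 2^z rows.
function tilesAtZoom(z: number): number {
  return 4 ** z;
}

// Number of tiles in all zoom levels from 0 to maxZoom inclusive
// (sum of a geometric series).
function tilesThroughZoom(maxZoom: number): number {
  return (4 ** (maxZoom + 1) - 1) / 3;
}

// Uncompressed size of a dense index with a fixed-size entry per tile.
// entryBytes is an assumed parameter, e.g. 8 for uint32 start + uint32 length.
function denseIndexBytes(maxZoom: number, entryBytes: number): number {
  return tilesThroughZoom(maxZoom) * entryBytes;
}
```

With 8-byte entries this comes to about 11 MB through zoom 10 and about 2.9 GB through zoom 14 before compression, so how far sparse files and gzip shrink the empty regions matters a lot for higher zooms.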

@AliFlux
AliFlux commented Oct 5, 2021

You're right. We need an mbtiles alternative that doesn't depend on sqlite. I have a couple of open-source projects that support mbtiles, and sqlite interop is always a headache on all sorts of platforms, from JS and C# to Go.

A simple binary file system may actually be much faster than sqlite as well.

@AliFlux
AliFlux commented Oct 5, 2021

ASAR looks interesting. It has random access support, checksums, and chunking. And it can scale up to 8PB in size!

@gmaclennan
Author

Thanks @AliFlux. We have been using ASAR for offline tiles, but are considering moving away from it. Further discussion is here: digidem/mapeo-server#53

@AliFlux
AliFlux commented Oct 6, 2021

You're right, random access is a super important thing. My company is making multi-terabyte mapping systems, and random access from multiple threads at the same time makes the system much more performant.
