Quantization removes information by reducing the precision of each coordinate, effectively snapping each point to a regular grid.
Simplification removes information by removing points applying a heuristic to make the least-perceptible change.
Mike Bostock's answer on SO
The total size of your geometry is controlled by two factors: the number of points and the number of digits (the precision) of each coordinate.
Say you have a large geometry with 1,000,000 points, where each two-dimensional point is represented as longitude in ±180° and latitude in ±90°:
[-90.07231180399987,29.501753271000098],[-90.06635619599979,29.499494248000133],…
Real numbers can have arbitrary precision (in JSON; in JavaScript they are
limited by the precision of IEEE 754)
and thus an infinite number of digits.
But in practice the above is pretty typical, so say
each coordinate has 18 digits. Including extra symbols ([
, ]
and ,
), each
point takes at most 1 + 18 + 1 + 18 + 1 = 39 bytes to encode in JSON, and the
entire geometry is about 39 * 1,000,000 ≈ 39MB.
Now say we convert these real numbers to integers: both longitude and latitude are reduced to integers x and y where 0 ≤ x ≤ 99 and 0 ≤ y ≤ 99. A simple mapping between real-number points ⟨λ,φ⟩ and integer coordinates ⟨x,y⟩ is:
x = floor((λ + 180) / 360 * 100); y = floor((φ + 90) / 180 * 100);
Since each coordinate now takes at most 2 digits to encode, each point takes at most 1 + 2 + 1 + 2 + 1 = 7 bytes to encode in JSON, and the entire geometry is about 7MB; we reduced the total size by 82%.
Of course, nothing comes for free: if you remove too much information, you will
no longer be able to display the geometry accurately. The rule of thumb is that
the size of your grid should be at least twice as big as the largest expected
display size for the entire map. For example, if you’re displaying a world map
in a 960×500 pixel space, then the default 10,000×10,000 (-q 1e4
) is a
reasonable choice.
So, quantization removes information by reducing the precision of each coordinate, effectively snapping each point to a regular grid. This reduces the size of the generated TopoJSON file because each coordinate is represented as an integer (such as between 0 and 9,999) with fewer digits.
In contrast, simplification removes information by removing points, applying a heuristic that tries to measure the visual salience of each point and removing the least-noticeable points. There are many different methods of simplification, but the Visvalingam method used by the TopoJSON reference implementation is described in my Line Simplification article so I won’t repeat myself here.
While quantization and simplification address these two different types of
information mostly independently, there’s an additional complication:
quantization is applied before the
topology is constructed, whereas
simplification is necessarily applied after to
preserve the topology. Since
quantization frequently introduces coincident points
([24,62],[24,62],[24,62]…
), and coincident points are removed, quantization
can also remove points.
The reason that quantization is applied before the topology is constructed is that geometric inputs are often not topologically valid. For example, if you takes a shapefile of Nevada counties and combine it with a shapefile of Nevada’s state border, the coordinates in one shapefile might not exactly match the coordinates in the other shapefile. By quantizing the coordinates before constructing the topology, you snap the coordinates to a regular grid and can get a cleaner topology with fewer arcs, hopefully correctly identifying all shared arcs. (Of course, if you over-quantize, then you can cause too many coincident points and get self-intersecting arcs, which causes other problems.)
In a future release, maybe 1.5.0, TopoJSON will allow you to control the
quantization before the topology is constructed independently from the
quantization of the output TopoJSON file. Thus, you could use a finer grid (or
no grid at all!) to compute the topology, then simplify, then use a coarser grid
appropriate for a low-resolution screen display. For now, these are tied
together, so I recommend using a finer grid (e.g., -q 1e6
) that produces a
clean topology, at the expense of a slightly larger file. Since TopoJSON also
uses delta-encoded coordinates, you rarely pay the full price for all the digits
anyway!
Since JSON numbers are base 10, you should use a grid size that’s a power of ten to make most efficient use of the encoding. Also, unless your input is known to be topologically valid you may want to use -q 1e5 or 1e6 to produce a cleaner topology. And lastly because your browser uses antialiasing when rendering, it can use subpixel positions and so it’s beneficial to use a grid size that’s (somewhat) finer than the pixel grid.