My thinking is summarized as:
There is old data already stored with the Raw ambiguity. Rather than requiring all old data to be rewritten, the solution is to keep that ambiguity and add new explicit types.
Any old code that encounters newly serialized data will break until its msgpack library is updated.
However, the old serialized data will not need to change or have its interpretation change, regardless of the assumption previously made about how to handle Raw.
Beyond that, we add new native types for Timestamp and PrivateExtension.
My suggestion is loosely defined as:
Raw: RawRepresentation (including new Raw8 Type)
String: StringType RawRepresentation
Binary: BinaryType RawRepresentation
Timestamp: TimestampType seconds_since_epoch_as_int_or_float
Timestamp: TimestampType [ seconds_since_epoch_as_int, nanoseconds_as_int ]
Timestamp: TimestampType [ seconds_since_epoch_as_int, nanoseconds_as_int, timezone ]
PrivateExtension: PrivateExtensionType Tag(Byte) ValueAsRegularMsgpackEncodedValue
([ ... ] means array)
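To make the three Timestamp forms concrete, here is a rough Python sketch of how a receiver might fold them into one shape after the payload has been decoded. The function name `normalize_timestamp` and the `(seconds, nanoseconds, timezone)` result are my own assumptions for illustration. The timezone element is passed through untouched, since its representation is not defined here.

```python
import math

def normalize_timestamp(value):
    """Return (seconds, nanoseconds, timezone) for any of the three forms.

    Hypothetical helper: 'value' is the already-decoded msgpack payload
    that follows the Timestamp type byte.
    """
    if isinstance(value, bool):
        raise TypeError("a boolean is not a timestamp")
    if isinstance(value, int):
        # Form 1: whole seconds since the epoch.
        return value, 0, None
    if isinstance(value, float):
        # Form 1: fractional seconds, split into seconds and nanoseconds.
        seconds = math.floor(value)
        nanoseconds = round((value - seconds) * 1_000_000_000)
        if nanoseconds == 1_000_000_000:  # rounding carried into the next second
            seconds, nanoseconds = seconds + 1, 0
        return seconds, nanoseconds, None
    if isinstance(value, (list, tuple)) and len(value) in (2, 3):
        # Forms 2 and 3: [seconds, nanoseconds] with an optional timezone.
        timezone = value[2] if len(value) == 3 else None
        return value[0], value[1], timezone
    raise TypeError("unsupported timestamp representation")
```

A float only has about 15 to 16 significant digits, so the array forms are the ones to use when nanosecond precision matters.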
We know how arrays, ints, floats, and Raw are currently represented in msgpack. All the new "types" just piggy-back on those, i.e.
ExplicitString is one byte (e.g. 0xd4) + representation of Raw
ExplicitBinary is one byte (e.g. 0xd5) + representation of Raw
Timestamp is one byte (e.g. 0xd6) + representation of an integer, a float, or an array containing 2 or 3 elements
PrivateExtension is one byte (e.g. 0xd7) + one tag byte (e.g. 0x01 representing Point) + representation of the value (e.g. a Point represented as an array of 2 integers in regular msgpack encoding)
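The piggy-backing described above could look roughly like this in Python. It is only a sketch: the type bytes 0xd4 to 0xd7 are the example values used above and not assigned ones, every function name is made up for illustration, and only unsigned ints, Raw, and small arrays are handled to keep it short.

```python
import struct

# Example type bytes from the list above; illustrative, not assigned values.
STRING_TYPE = 0xD4
BINARY_TYPE = 0xD5
TIMESTAMP_TYPE = 0xD6
PRIVATE_EXTENSION_TYPE = 0xD7

def pack_raw(data):
    """Existing msgpack Raw: fixraw, raw16 or raw32 depending on length."""
    n = len(data)
    if n < 32:
        return bytes([0xA0 | n]) + data
    if n < 1 << 16:
        return b"\xda" + struct.pack(">H", n) + data
    return b"\xdb" + struct.pack(">I", n) + data

def pack_uint(value):
    """Existing msgpack unsigned integers (positive fixint up to uint64)."""
    if not 0 <= value < 1 << 64:
        raise ValueError("sketch handles unsigned 64-bit values only")
    if value < 0x80:
        return bytes([value])
    if value < 1 << 8:
        return b"\xcc" + bytes([value])
    if value < 1 << 16:
        return b"\xcd" + struct.pack(">H", value)
    if value < 1 << 32:
        return b"\xce" + struct.pack(">I", value)
    return b"\xcf" + struct.pack(">Q", value)

def pack_array(encoded_items):
    """Existing msgpack fixarray; items must already be encoded."""
    if len(encoded_items) >= 16:
        raise ValueError("sketch handles fixarray only")
    return bytes([0x90 | len(encoded_items)]) + b"".join(encoded_items)

def pack_string(text):
    # New type byte followed by an ordinary Raw.
    return bytes([STRING_TYPE]) + pack_raw(text.encode("utf-8"))

def pack_binary(data):
    return bytes([BINARY_TYPE]) + pack_raw(data)

def pack_timestamp(seconds, nanoseconds=None):
    # Either a bare integer or a 2-element array, as in the proposal.
    if nanoseconds is None:
        body = pack_uint(seconds)
    else:
        body = pack_array([pack_uint(seconds), pack_uint(nanoseconds)])
    return bytes([TIMESTAMP_TYPE]) + body

def pack_private_extension(tag, encoded_value):
    # Type byte, one tag byte, then a regular msgpack-encoded value.
    return bytes([PRIVATE_EXTENSION_TYPE, tag]) + encoded_value
```

For example, a Point (3, 4) with tag 0x01 would be `pack_private_extension(0x01, pack_array([pack_uint(3), pack_uint(4)]))`, which yields the bytes d7 01 92 03 04.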
Updated serializers use the following boolean options to configure how they operate. The default can be legacy mode; this might be decided based on how long the library has been in use. If all of these options are false, the serializer is in legacy mode and (even if it has been updated) keeps on encoding msgpack as before.
UseStringType
UseBinaryType
UseTimestampType
UsePrivateExtensionType
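As a rough sketch of how these options might gate a serializer (Python, with the four flags renamed to snake_case; the class name `SerializerOptions`, the helper `pack_text_or_bytes`, and the 0xd4/0xd5 type bytes are assumptions for illustration only):

```python
import struct
from dataclasses import dataclass

@dataclass
class SerializerOptions:
    # All flags default to False, which is legacy mode.
    use_string_type: bool = False
    use_binary_type: bool = False
    use_timestamp_type: bool = False
    use_private_extension_type: bool = False

    @property
    def legacy_mode(self):
        return not (self.use_string_type or self.use_binary_type
                    or self.use_timestamp_type
                    or self.use_private_extension_type)

def pack_raw(data):
    """Existing msgpack Raw encoding (fixraw, raw16, raw32)."""
    n = len(data)
    if n < 32:
        return bytes([0xA0 | n]) + data
    if n < 1 << 16:
        return b"\xda" + struct.pack(">H", n) + data
    return b"\xdb" + struct.pack(">I", n) + data

def pack_text_or_bytes(value, options):
    """Emit the explicit type byte only when the matching option is set."""
    if isinstance(value, str):
        raw = pack_raw(value.encode("utf-8"))
        return b"\xd4" + raw if options.use_string_type else raw
    raw = pack_raw(bytes(value))
    return b"\xd5" + raw if options.use_binary_type else raw
```

With default options, `pack_text_or_bytes("hi", SerializerOptions())` produces the same bytes as today (a2 68 69), so existing receivers are unaffected until a sender opts in.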
It is summarized in sentences as:
In msgpack, the Raw type is currently ambiguous and used to represent both strings and binary. The new spec introduces support for explicit string and binary types to resolve that ambiguity. In addition, the new spec introduces a timestamp type to allow transmission of timestamps in an interoperable manner. To accommodate private extensions, an extension type is also introduced.
With these changes, backward compatibility is preserved with the caveat that deserializers/receivers update their libraries to decode new types. Serializers/senders do not have to update their libraries since the legacy msgpack format is still valid.
Updated libraries are listed below: .....
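To illustrate the compatibility caveat on the receiving side, here is a minimal Python sketch of an updated deserializer that accepts both legacy Raw and the explicit String/Binary types. The 0xd4 and 0xd5 type bytes are the example values from earlier, and the function names are hypothetical.

```python
import struct

def unpack_raw(buf, pos=0):
    """Decode an existing msgpack Raw at 'pos'; return (payload, next_pos)."""
    head = buf[pos]
    if 0xA0 <= head <= 0xBF:          # fixraw
        length, start = head & 0x1F, pos + 1
    elif head == 0xDA:                # raw16
        length, start = struct.unpack_from(">H", buf, pos + 1)[0], pos + 3
    elif head == 0xDB:                # raw32
        length, start = struct.unpack_from(">I", buf, pos + 1)[0], pos + 5
    else:
        raise ValueError("not a Raw value")
    return buf[start:start + length], start + length

def unpack_value(buf, pos=0):
    """Decode explicit String, explicit Binary, or legacy Raw."""
    head = buf[pos]
    if head == 0xD4:                  # explicit string: decode as UTF-8 text
        payload, end = unpack_raw(buf, pos + 1)
        return payload.decode("utf-8"), end
    if head == 0xD5:                  # explicit binary: keep as bytes
        return unpack_raw(buf, pos + 1)
    # Legacy Raw stays ambiguous; it is returned as bytes and the caller
    # applies whatever assumption it used before.
    return unpack_raw(buf, pos)
```

An old receiver would hit the unknown 0xd4/0xd5 byte and fail, which is the breakage described at the top. The updated one reads old and new data alike.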