MsgPack: Support timestamp as new spec-defined inter-operable extension type

A timestamp is a data type that represents an instant in time, and most programming languages support it natively. However, the msgpack spec does not define one. This lack of built-in timestamp support comes up repeatedly when people evaluate encodings.

The proposal is to include timestamp support in the new spec as an inter-operable extension type, defined by the spec along with Binary. It will have extension tag -2.

A timestamp is composed of 3 components:

secs: signed integer representing seconds since the Unix epoch
nsecs: unsigned integer representing fractional seconds as a nanosecond offset within secs, in the range 0 <= nsecs < 1e9
tz: signed integer representing the timezone offset in minutes east of UTC, plus a dst (daylight saving time) flag

Including the timezone allows a decoder to fully reconstruct a stored time and present the full information as defined in RFC 3339 (http://www.ietf.org/rfc/rfc3339.txt), e.g. 2006-01-02T15:04:05.999999999Z07:00, or in other time specifications such as RFC 822.
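As an illustration of why the offset matters for presentation, here is a minimal sketch that renders the three components as an RFC 3339 string. The function name `to_rfc3339` and its parameters are hypothetical and not part of the proposal; nanoseconds are formatted by hand because Python's `datetime` only carries microseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_rfc3339(secs: int, nsecs: int = 0, tz_minutes: int = 0) -> str:
    """Render (secs, nsecs, tz offset in minutes east of UTC) as RFC 3339 text."""
    zone = timezone(timedelta(minutes=tz_minutes))
    # Shift the UTC instant into the stored offset for display.
    local = (EPOCH + timedelta(seconds=secs)).astimezone(zone)
    text = local.strftime("%Y-%m-%dT%H:%M:%S")
    if nsecs:
        # Up to 9 fractional digits, with trailing zeros trimmed.
        text += "." + f"{nsecs:09d}".rstrip("0")
    if tz_minutes == 0:
        return text + "Z"
    sign = "+" if tz_minutes > 0 else "-"
    hours, minutes = divmod(abs(tz_minutes), 60)
    return f"{text}{sign}{hours:02d}:{minutes:02d}"
```

Without the tz component, a decoder could only ever produce the `Z` form of the string.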

A timestamp is encoded in a variable-length byte array, with length ranging from 1 to 15 (1 descriptor byte, up to 8 bytes for secs, up to 4 bytes for nsecs, and 2 bytes for tz).

The first byte is the descriptor, which defines which components are encoded and how many bytes are used to encode the secs and nsecs components. A component is not encoded in the byte array explicitly when it has its default value: secs or nsecs equal to 0, or tz equal to UTC.

The 8 descriptor bits are of the form `A B C DDD EE`:
    A:   Is secs component encoded? 1 = true
    B:   Is nsecs component encoded? 1 = true
    C:   Is tz component encoded? 1 = true
    DDD: Number of extra bytes for secs (range 0-7).
         If A = 0, secs is not encoded, and is assumed to be 0.
         If A = 1, secs is encoded in DDD+1 bytes: at least 1 byte is
         needed, and DDD gives the number of extra bytes beyond that 1.
         E.g. if DDD=0, then secs is represented in 1 byte.
              if DDD=2, then secs is represented in 3 bytes.
    EE:  Number of extra bytes for nsecs (range 0-3).
         If B = 0, nsecs is not encoded, and is assumed to be 0.
         If B = 1, nsecs is encoded in EE+1 bytes (similar to secs/DDD above).
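The descriptor layout can be sketched as a pair of bit-packing helpers. This is an illustration only: it assumes `A` is the most significant bit of the byte (reading `A B C DDD EE` left to right), and the function names are hypothetical.

```python
def pack_descriptor(has_secs: bool, has_nsecs: bool, has_tz: bool,
                    secs_len: int = 1, nsecs_len: int = 1) -> int:
    """Build the descriptor byte. secs_len is 1..8 bytes, nsecs_len is 1..4 bytes."""
    if not 1 <= secs_len <= 8:
        raise ValueError("secs_len must be in 1..8")
    if not 1 <= nsecs_len <= 4:
        raise ValueError("nsecs_len must be in 1..4")
    # DDD and EE store "extra bytes beyond the first"; they stay 0 if the
    # component is absent.
    ddd = secs_len - 1 if has_secs else 0
    ee = nsecs_len - 1 if has_nsecs else 0
    # Assumed bit positions: A=7, B=6, C=5, DDD=4..2, EE=1..0.
    return ((int(has_secs) << 7) | (int(has_nsecs) << 6) |
            (int(has_tz) << 5) | (ddd << 2) | ee)


def unpack_descriptor(desc: int):
    """Return (has_secs, has_nsecs, has_tz, secs_len, nsecs_len); lengths are 0 if absent."""
    has_secs = bool(desc & 0x80)
    has_nsecs = bool(desc & 0x40)
    has_tz = bool(desc & 0x20)
    secs_len = ((desc >> 2) & 0x07) + 1 if has_secs else 0
    nsecs_len = (desc & 0x03) + 1 if has_nsecs else 0
    return has_secs, has_nsecs, has_tz, secs_len, nsecs_len
```

Under that bit ordering, a timestamp carrying only a 3-byte secs value has descriptor `0x88` (A=1, DDD=2).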

Following the descriptor byte, subsequent bytes are:

secs component encoded in `DDD + 1` bytes (if A == 1)
nsecs component encoded in `EE + 1` bytes (if B == 1)
tz component encoded in 2 bytes (if C == 1)

The secs and nsecs components are integers encoded in big-endian byte order: secs as a signed two's-complement integer, and nsecs, which is never negative, as an unsigned integer.

The tz component is encoded as 2 bytes (16 bits). From most significant bit 15 to least significant bit 0:

Bit 15 = have\_dst: set to 1 if the dst flag is specified.
Bit 14 = dst\_on: set to 1 if dst is in effect at the time, or 0 if not.
Bits 13..0 = timezone offset in minutes. It is a signed integer in big-endian format.

The timezone offset has a range of -12:00 to +14:00 (i.e. -720 to +840 minutes).
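A minimal sketch of the 16-bit tz field follows. It assumes the 14-bit offset is stored in two's complement, which the text does not spell out, and it uses `None` to model "no dst information" (have\_dst = 0). The function names are hypothetical.

```python
from typing import Optional, Tuple


def encode_tz(offset_minutes: int, dst: Optional[bool] = None) -> bytes:
    """Pack offset (minutes east of UTC) and an optional dst flag into 2 bytes."""
    if not -720 <= offset_minutes <= 840:
        raise ValueError("offset must be in -720..840 minutes")
    word = offset_minutes & 0x3FFF      # assumed: 14-bit two's complement
    if dst is not None:
        word |= 0x8000                  # bit 15: have_dst
        if dst:
            word |= 0x4000              # bit 14: dst_on
    return word.to_bytes(2, "big")


def decode_tz(data: bytes) -> Tuple[int, Optional[bool]]:
    """Inverse of encode_tz: returns (offset_minutes, dst or None)."""
    if len(data) != 2:
        raise ValueError("tz field must be exactly 2 bytes")
    word = int.from_bytes(data, "big")
    offset = word & 0x3FFF
    if offset & 0x2000:                 # sign bit of the 14-bit field
        offset -= 0x4000
    dst = bool(word & 0x4000) if word & 0x8000 else None
    return offset, dst
```

For example, UTC-08:00 with dst in effect packs to the bytes `fe 20`, and UTC+05:30 with no dst information packs to `01 4a`.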

This model allows the compact encoding of all timestamps, without wasting bytes. For example:

Jan 1, 1970 (Unix epoch) is encoded as just 1 byte (the descriptor byte alone, with value 0).
Apr 1, 1970 with no nanosecond component is encoded as 4 bytes (1 desc, 3 secs).
Jan 1, 2050 with a 200 nanosecond offset is encoded as 7 bytes (1 desc, 5 secs, 1 nsecs).
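The byte counts above can be checked with a small encoder sketch. It covers only secs and nsecs (tz is left out, i.e. UTC), assumes `A` is the most significant descriptor bit, and writes nsecs as an unsigned value, which is what the 1-byte encoding of 200 nanoseconds implies. The function names are hypothetical.

```python
def _min_signed(value: int) -> bytes:
    """Shortest big-endian two's-complement encoding, 1 to 8 bytes."""
    for n in range(1, 9):
        try:
            return value.to_bytes(n, "big", signed=True)
        except OverflowError:
            continue
    raise ValueError("secs does not fit in 8 bytes")


def _min_unsigned(value: int) -> bytes:
    """Shortest big-endian unsigned encoding, at least 1 byte."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def encode_timestamp(secs: int = 0, nsecs: int = 0) -> bytes:
    """Encode a UTC timestamp: descriptor byte, then secs and nsecs if non-zero."""
    if not 0 <= nsecs < 1_000_000_000:
        raise ValueError("nsecs must be in 0..999999999")
    desc = 0
    body = b""
    if secs != 0:
        raw = _min_signed(secs)
        desc |= 0x80 | ((len(raw) - 1) << 2)   # A = 1, DDD = extra bytes
        body += raw
    if nsecs != 0:
        raw = _min_unsigned(nsecs)
        desc |= 0x40 | (len(raw) - 1)          # B = 1, EE = extra bytes
        body += raw
    return bytes([desc]) + body
```

With midnight UTC on each date, `encode_timestamp(0)` is 1 byte, `encode_timestamp(7776000)` (Apr 1, 1970) is 4 bytes, and `encode_timestamp(2524608000, 200)` (Jan 1, 2050) is 7 bytes. The 2050 value needs 5 secs bytes because it exceeds the signed 32-bit range.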

Many mainstream languages have native support for date/time representation with timezone offset support. For example:

Java (Calendar, which encapsulates Date and TimeZone)
Go (time.Time)
Python (datetime.datetime)
Ruby (Time)

However, for languages which do not have native support for the full representation, libraries should decode into a struct that encapsulates the time and timezone information when decoding without a schema. As an example, a JavaScript library could use a { date: Date(), tz: Number() }.
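To illustrate the struct approach, here is a sketch of a schema-less decoder that returns a plain record rather than a native date type. The `DecodedTimestamp` name and its fields are hypothetical; the sketch assumes `A` is the most significant descriptor bit, that nsecs is unsigned, and that the 14-bit tz offset is two's complement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodedTimestamp:
    secs: int = 0                 # seconds since the Unix epoch
    nsecs: int = 0                # nanosecond offset within secs
    tz_minutes: int = 0           # minutes east of UTC
    dst: Optional[bool] = None    # None when have_dst is 0


def decode_timestamp(data: bytes) -> DecodedTimestamp:
    """Decode descriptor + optional secs, nsecs, tz into a plain struct."""
    if not data:
        raise ValueError("empty timestamp")
    desc, pos = data[0], 1
    out = DecodedTimestamp()
    if desc & 0x80:                               # A: secs present
        n = ((desc >> 2) & 0x07) + 1
        out.secs = int.from_bytes(data[pos:pos + n], "big", signed=True)
        pos += n
    if desc & 0x40:                               # B: nsecs present
        n = (desc & 0x03) + 1
        out.nsecs = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    if desc & 0x20:                               # C: tz present
        word = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        offset = word & 0x3FFF
        out.tz_minutes = offset - 0x4000 if offset & 0x2000 else offset
        if word & 0x8000:                         # have_dst
            out.dst = bool(word & 0x4000)         # dst_on
    if pos != len(data):
        raise ValueError("length does not match descriptor")
    return out
```

Keeping the offset and dst flag as separate fields means no information is lost even when the host language has no timezone-aware date type.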

Please see msgpack/msgpack#128 for background discussion which led to filing this issue.

ugorji/msgpack-timestamp-type.md