version 0.0.1-draft-1
A standard base log format that enables log messages to be easily routed and filtered while allowing applications maximum flexibility in the data they record.
Goals

- record format
  - few required fields (just enough for coarse routing/filtering)
  - "easy" serialization and deserialization
  - easily shipped through Heka
  - support for arbitrary application data
- file (stream) format
  - newline delimited
  - "grepable" on disk

Non-goals

- serialized size
- processing speed over format flexibility
- comprehensive schemas for all applications
Open questions

- How closely should we try to stay to the Heka message format?
  - Heka "Fields" are less flexible than JSON.
  - Heka natively uses protobufs, which are not flexible.
- How big a concern is routing performance?
  - Is a strict field order important?
  - Should we expect a router to parse the entire message?
- Is it important for the message to use a single encoding?
  - For example, what if the envelope were protobuf and the payload JSON?
  - I lean toward yes; a single encoding would be ideal.
- How important is nested data in a log message?
  - I think it's very important.
  - Without nesting, ad hoc "flattening" is inevitable.
Record format

Log messages are objects (in JSON terminology) containing a small set of required fields. Additional fields belong to the `payload` field, which may be of any JSON-compatible type.

The purpose of the top-level fields is to uniquely identify each log message by application, location, time, and importance.
`version`
- string
- log format version (semver)

`severity`
- number
- syslog severity

`type`
- string
- designates the "type" of the `payload` field; optionally a JSON Schema URL

`logger`
- string
- application name

`hostname`
- string

`pid`
- number
- process ID of the logging process

`timestamp`
- number
- microseconds since UNIX epoch (UTC)

`payload`
- object

`_*`
- optional
- string
- additional metadata fields
These fields are strictly ordered from least to most specific. A process may choose to leave any value it doesn't require unparsed. For example, a router may only need to parse up to the `logger` name in order to send the message to the correct log aggregator. It can leave the rest of the message unparsed and pass the entire message along as its payload.
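With a JSON encoding a router must still run the line through a parser, but it only needs to *read* the coarse fields and can forward the original bytes untouched. A sketch; the routing table and aggregator names here are made up:

```javascript
'use strict';

// Sketch of a router that inspects only the coarse fields of each record.
// The routing table and aggregator names are hypothetical.
const routes = {
  'fxa-auth-server': 'auth-aggregator',
  '*': 'default-aggregator'
};

function route(line) {
  const msg = JSON.parse(line);                    // JSON forces a full parse,
  const dest = routes[msg.logger] || routes['*'];  // but only `logger` is read
  // Forward the unmodified line; the payload stays opaque to the router.
  return { dest: dest, body: line };
}

const out = route('{"version":"0.0.1-draft-1","severity":6,' +
  '"type":"object","logger":"fxa-auth-server","payload":{}}');
console.log(out.dest);  // → auth-aggregator
```

A length-prefixed encoding (e.g. TNetStrings) could avoid even the full parse; with JSON the saving is only that deeper fields are never inspected.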
The `type` value may be any string signaling additional context of the `payload` value for applications that emit structured log messages. It may be simply "object", or it may be "FMLError:v1.0.2", or a JSON Schema URL.
`_*` fields are for metadata about the `payload` that doesn't fit into the other fields or in the payload itself. Some examples: `_appVersion`, `_threadId`, `_awsRegion`.
Application-specific `payload` types may be hierarchical (this is even encouraged), but their schemas are beyond the scope of this document.
Example

{
  "version": "0.0.1-draft-1",
  "severity": 6,
  "type": "http://schemas.accounts.firefox.com/v1/request_summary.json",
  "logger": "fxa-auth-server",
  "hostname": "apollo",
  "timestamp": 1405915343869000,
  "payload": {
    "code": 400,
    "errno": 103,
    "rid": "1405915831047-56717-80",
    "path": "/v1/password/change/start",
    "lang": "en-US,en",
    "remoteAddressChain": ["127.0.0.1"],
    "t": 1,
    "email": "[email protected]"
  },
  "_appVersion": "0.18.0"
}
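One way to sanity-check records against the field list is a minimal validator sketch. The `problems` helper and the strictness assumptions (every top-level field except `_*` metadata is required; `_*` type checks omitted) are mine, not part of the spec:

```javascript
'use strict';

// Sketch: check the required top-level fields of a record.
// REQUIRED mirrors the field list above; this is not an official validator.
const REQUIRED = {
  version: 'string',
  severity: 'number',
  type: 'string',
  logger: 'string',
  hostname: 'string',
  pid: 'number',
  timestamp: 'number',
  payload: 'object'
};

function problems(msg) {
  const errs = [];
  for (const [field, kind] of Object.entries(REQUIRED)) {
    if (typeof msg[field] !== kind) errs.push(field);
  }
  // Any other top-level key must be a `_*` metadata field
  // (the string type check is omitted for brevity).
  for (const key of Object.keys(msg)) {
    if (!(key in REQUIRED) && !key.startsWith('_')) errs.push(key);
  }
  return errs;
}
```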
Serialization format: TO BE DETERMINED. Candidates:

- JSON
- TNetStrings
- MsgPack
Lines may be compressed. (???)
Serialized messages do not need to be directly human readable. A pretty-printer application may be used to display messages in a human-optimized format.
Raw log files should be easily searchable from a *nix shell, somehow.

Format considerations:

- arbitrarily nested objects
- parser skipping unknown fields (+1 for TNetStrings)
My 12factor-config package pretty much does all this already, except that `type` is a string representation of the `level` rather than a description of the additional data. The additional data is always an object. I suspect it's not worth re-using since this is slightly different, but there may be something we can steal from there. :)