
@dannycoates
Last active August 29, 2015 14:04
FxA Log Format

version 0.0.1-draft-1

Goals

A standard base log format that enables log messages to be easily routed and filtered, while allowing applications maximum flexibility in the data they are able to record.

  • record format
    • few required fields (just enough for coarse routing/filtering)
    • "easy" serialization and deserialization
    • easily shipped through Heka
    • support arbitrary application data
  • file (stream) format
    • newline delimited
    • "grepable" on disk

Non-Goals

  • minimizing serialized size
  • processing speed over format flexibility
  • comprehensive schemas for all applications

Open Questions

  • How close to the Heka message format should we try to stay?
    • Heka "Fields" are less flexible than JSON.
    • Heka natively uses protobufs, which are not flexible.
  • How big of a concern is routing performance?
    • Is a strict field order important?
    • Should we expect a router to parse the entire message?
  • Is it important for the message to use a single encoding?
    • For example, what if the envelope were protobuf and the payload were JSON?
    • I lean toward yes, a single encoding would be ideal.
  • How important is nested data in a log message?
    • I think it's very important.
    • Without nesting, ad hoc "flattening" is inevitable.

Record Format

Log messages are objects (in JSON terminology) containing a small list of required fields. Additional fields belong in the payload field, which may be of any JSON-compatible type.

Top Level Fields

The purpose of the top level fields is to uniquely identify each log message by application, location, time, and importance.

  • version
    • string
    • log format version (semver)
  • severity
    • number
  • type
    • string
    • designates the "type" of the payload field
    • optionally a JSON Schema URL
  • logger
    • string
    • application name
  • hostname
    • string
  • pid
    • number
    • Process ID of the logging process
  • timestamp
    • number
    • microseconds since UNIX epoch (UTC)
  • payload
    • object
  • _*
    • optional
    • string
    • additional metadata fields

These fields are strictly ordered from least to most specific. A process may choose to leave any value unparsed that it doesn't require. For example, a router may only need to parse up to the logger field in order to send the message to the correct log aggregator. It can leave the rest of the message unparsed and forward it verbatim.
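For illustration, a rough sketch of a router that exploits this ordering (assuming JSON serialization; routeLine, sinks, and the 256-byte scan window are invented for the example, not part of the spec):

const LOGGER_RE = /"logger"\s*:\s*"([^"]*)"/;

// sinks maps logger names to writable streams, plus a default.
function routeLine(line, sinks) {
  // The envelope fields come first, so a bounded scan of the line's
  // prefix is enough to find the logger value.
  const match = LOGGER_RE.exec(line.slice(0, 256));
  const logger = match ? match[1] : "unknown";
  const sink = sinks[logger] || sinks.default;
  sink.write(line + "\n"); // forward the raw line untouched
}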

The type value may be any string signaling additional context for the payload value, for applications that emit structured log messages. It may be simply "object", or it may be "FMLError:v1.0.2", or a JSON Schema URL.

_* fields are for metadata about the payload that doesn't fit into the other fields or the payload itself. Examples include _appVersion, _threadId, and _awsRegion.

Application Data Types

Application-specific payload types may be hierarchical (this is even encouraged), but their schemas are beyond the scope of this document.

Example

{
  "version":"0.0.1-draft-1",
  "severity":6,
  "type":"http://schemas.accounts.firefox.com/v1/request_summary.json",
  "logger":"fxa-auth-server",
  "hostname":"apollo",
  "timestamp": 1405915343869000,
  "payload": {
    "code":400,
    "errno":103,
    "rid":"1405915831047-56717-80",
    "path":"/v1/password/change/start",
    "lang":"en-US,en",
    "remoteAddressChain":["127.0.0.1"],
    "t":1,
    "email":"[email protected]"
  },
  "_appVersion": "0.18.0"
}

Serialization

TO BE DETERMINED

  • JSON
  • TNetStrings
  • MsgPack

Lines may be compressed. (???)

Human Readability

Serialized messages do not need to be directly human readable. A pretty printer application may be used to display messages in a human optimized format.
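For example, a minimal pretty-printer sketch (assuming newline-delimited JSON on stdin; the rendering itself is illustrative, not part of the format):

const readline = require("readline");

readline.createInterface({ input: process.stdin }).on("line", (line) => {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch (e) {
    console.log(line); // not a log record; pass through
    return;
  }
  const time = new Date(msg.timestamp / 1000).toISOString(); // µs -> ms
  console.log(`${time} [${msg.severity}] ${msg.logger}@${msg.hostname}`,
    JSON.stringify(msg.payload));
});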

Grepability

Raw log files should be easily searchable from a *nix shell, somehow. (Of the candidate serializations, newline-delimited JSON is the most directly grepable.)

Potential Problems

  • arbitrarily nested objects
    • parser skipping unknown fields (+1 for tnetstrings)

Prior Art

References

@chilts

chilts commented Jul 21, 2014

My 12factor-config package pretty much does all this already, except type is a string representation of the level rather than describing the additional data. The additional data is always an object. I suspect it's not worth re-using it since this is slightly different, but there may be something we can steal from there. :)

@fmarier

fmarier commented Jul 21, 2014

Do we want microseconds or milliseconds for the timestamps? Microseconds seem like overkill to me, and everything else, like JavaScript, seems to use milliseconds since the "beginning of time".

@fmarier

fmarier commented Jul 21, 2014

Should host just be the hostname, or should it be the fully-qualified hostname?

@dannycoates
Author

Do we want microseconds or milliseconds for the timestamps?

Ha, Heka uses NANOseconds! I know JavaScript only goes to milliseconds, but I thought we might want to leave room for other super-fast languages to use them if they want. There have been a couple of rare times in my career where microsecond precision would have helped me debug something. If you think it's a foot-gun we could go back to milliseconds. Or we could put the burden on the parser to figure out the precision automatically.
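A rough sketch of that auto-detection idea (toMilliseconds is just an illustration): inferring the unit from the magnitude is unambiguous for timestamps between roughly 1973 and the year 5000, which is plenty for logs.

function toMilliseconds(ts) {
  if (ts < 1e11) return ts * 1000; // seconds
  if (ts < 1e14) return ts;        // milliseconds
  if (ts < 1e17) return ts / 1e3;  // microseconds
  return ts / 1e6;                 // nanoseconds
}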

@seanmonstar

Ohh, logging. I like logging.

level

Various logging frameworks have different ideas on levels. Should we standardize on levels, or should each application just say what level numbers they're interested in when they want data?

name

Is this the name of the application? Can't that be inferred from where the log is coming from? Some frameworks share a single logger throughout the app, and others have the concept of a named hierarchy; for instance, I name a new logger in each module, and that uses the name field on the record, such as fxa.routes.verify or fxa.db.memory.

data

It's quite common when logging something to just log a message, and possibly a couple arguments. I'm not sure how to map that to a key-value data object... Should we just never ever do that kind of logging, and always log some object instead?

@dannycoates
Author

Should we standardize on levels, or should each application just say what level numbers they're interested in when they want data?

We should pick a scheme. We ought to be able to easily query for all logs at a certain level across all apps.

Is this the name of application?

Yes, it identifies the code that is running. @chilts had a good suggestion of adding the app version also.

Can't that be inferred by where the log is coming from?

Yes, I think all of these top level fields should be set by the logger, not by the caller. These top level fields are more like an envelope for routing. The data that we as programmers set with a logger API will go into data.
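A rough sketch of what that could look like, using the pre-rename field names from the examples below (makeLogger is just an illustration, not a proposed API):

const os = require("os");

function makeLogger(name) {
  return function log(level, type, data) {
    // The caller supplies only level, type, and data;
    // the logger fills in the rest of the envelope itself.
    process.stdout.write(JSON.stringify({
      version: "0.0.1-draft-1",
      level: level,
      name: name,
      type: type,
      host: os.hostname(),
      pid: process.pid,
      timestamp: Date.now() * 1000, // microseconds
      data: data
    }) + "\n");
  };
}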

It's quite common when logging something to just log a message, and possibly a couple arguments.

Indeed. I expect the logger library would output something like this:

(these are just rough ideas)

{
  "version":"0.0.1-draft-1",
  "level":3,
  "name":"fxa-auth-server",
  "type":"string",
  "host":"apollo",
  "pid":2532,
  "timestamp": 1405915343869000,
  "data": "I got here!"
}

for this:

logger.info("I got here!")

Or maybe:

{
  "version":"0.0.1-draft-1",
  "level":5,
  "name":"fxa-auth-server",
  "type":"array",
  "host":"apollo",
  "pid":2532,
  "timestamp": 1405915343869000,
  "data": ["the foo count was %s", 7]
}

for:

logger.warn("the foo count was %d", 7)

However, now the semantics of what 7 means are harder for a machine to determine (and query later). An object with a printf string for display may be better. I like the way your intel library handles print formatting because it allows us to pretty-print without losing semantics. Maybe something like:

{
  "version":"0.0.1-draft-1",
  "level":5,
  "name":"fxa-auth-server",
  "type":"object",
  "host":"apollo",
  "pid":2532,
  "timestamp": 1405915343869000,
  "data": {
    "printf":"the foo count was %(fooCount)d",
    "fooCount":7
  }
}

for:

logger.warn("the foo count was %(fooCount)d", {fooCount: 7})

Now we can query fooCount later and also pretty print it in a log stream.
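A rough sketch of what a consumer could do with that record (the render helper is invented for the example):

function render(data) {
  // Substitute %(key)d / %(key)s placeholders from the record's own fields.
  return data.printf.replace(/%\((\w+)\)[ds]/g, (_, key) => data[key]);
}

const data = { printf: "the foo count was %(fooCount)d", fooCount: 7 };
render(data);   // "the foo count was 7" -- display form
data.fooCount;  // 7, still a number     -- query form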

named hierarchy

I imagine this would fit more into type, but I'm not sure. Logging hierarchy is also commonly used to "curry" extra data into a log function for convenience, so one doesn't have to repeatedly add requestId (for example) on every log call. If not type, then perhaps anything that does this sort of thing will implicitly be an object type, and each level would add something to that object somehow 👋 ✨.

@dannycoates
Author

I wonder if having type limited to a single string is too restrictive? My thinking was that a log router that doesn't care to fully parse data could use type for routing decisions, but there is almost no value in a "string" or "object" type for this. Maybe an array of tags would be better. Tags could be set manually by the programmer or automatically by the logger. For example, if I call logger.info(foo), the logger could add a tag for typeof(foo), or I could manually tag a special debug line, logger.debugWithTag("alertme", foo), that the router could send directly to my email. Anyway, the logger API isn't too important at this point. The question is: is an array of strings more valuable for routing than a single string, and is the additional complexity worth it?
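To make the tags idea concrete, a rough sketch (tagged and the tag names are invented for the example):

function tagged(payload, extraTags) {
  // The logger adds a type tag automatically; the caller may add more.
  const tags = [Array.isArray(payload) ? "array" : typeof payload];
  return { tags: tags.concat(extraTags || []), payload: payload };
}

tagged({ code: 400 });       // { tags: ["object"], payload: { code: 400 } }
tagged("boom", ["alertme"]); // { tags: ["string", "alertme"], payload: "boom" }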

@dannycoates
Author

The question is, is an array of strings more valuable for routing than a single string, and is the additional complexity worth it?

Or do we not do either and force the router to parse data if it wants to make intelligent routing decisions?

@dannycoates
Author

I've renamed some of the fields to be closer to Heka's native message naming.

@seanmonstar

Yes, it identifies the code that is running. @chilts had a good suggestion of adding the app version also.

Can't that be inferred by where the log is coming from?

Yes, I think all of these top level fields should be set by the logger, not by the caller.

I meant that the actual log pipe (fxa-auth-server.log, or whatever) would be able to infer the app name.

However, now the semantics of what 7 means is harder for a machine to determine (and query later)

Yea, I struggled a lot in intel trying to figure out how to serialize strings like that. Currently, intel's Record.toJSON does json.args = args, and figures that a machine can put them back together if desired.

Or do we not do either and force the router to parse data if it wants to make intelligent routing decisions?

I'm not sure. That's what I ended up expecting in intel. I see that hapi uses tags for server and request logging. I catch "uncaught" errors from hapi by checking tags.error && tags.implementation. Though, coming from Python, what I built into intel was the idea of routing on logger.name and level. It felt like that information was sufficient for getting the right log messages wherever they needed to go.

Since we're talking about logging here, and I happen to work on a logging library, I'd be interested in adjusting intel to solve all these problems easily.

@seanmonstar

For the app name + version, how about app?

@dannycoates
Author

From the #heka IRC channel:

11:10 (dcoates) so, I'd really like to be able to encode a tree structure in a heka message, and I don't see a straightforward way to do that looking at message.proto. Is there a recommended way to do it?
11:12 (trink) dcoates: the only way to do it is to flatten the structure and put the hierarchy in the key name
11:13 (trink) dcoates: util.table_to_fields
11:14 (RaFromBRC) https://github.com/mozilla-services/lua_sandbox/blob/dev/modules/util.lua#L13
11:15 (dcoates) trink: yeah, thats what i was afraid of. It means I can't just use heka messages as a "native" format and have to kind of "double encode" my message in a sense.
11:16 (dcoates) I think it would be *really nice* to make Field recursive
11:20 (trink) dcoates: that still wouldn't do what you want (since there is no concept of a map)
11:24 (dcoates) well, you got my intent ;) I'm not trying to offer a solution
11:31 (dcoates) Is supporting encoding trees in messages without flattening something y'all would be willing to consider?
11:32 (trink) dcoates: the what is clear, the intent [why] is unclear (i.e. the need for a highly structured/nested log to be used in real time analysis)
11:36 (dcoates) trees are natural structures for a lot of things. I may want to trigger on something without having to push it up to the top scope. It'd also be nice to not have to transform my data for each system it goes through.
11:41 (chance) just wondering, what kinda data in this case might be in the tree format
11:42 (chance) maybe git commits? :D
11:42 (chance) you could serialize to json, and unserialize back
11:43 (chance) not in the message fields itself, but nonetheless its there.
11:44 (dcoates) chance: indeed, but now I've got 2 encodings in one message when I could have 1, and as a string payload its harder to use in heka filters.
11:45 (chance) i dont disagree. but supporting a tree like structure in pure heka messages would likely make it harder to use *everything* in heka, not even just filters.
11:46 (chance) but its hard to say.
11:46 (chance) at least protobuf supports it im reading, so its a possibility.
11:47 (chance) probably not on the top of things we'll be working on until we get other stuff done.
11:48 (RaFromBRC) dcoates: the double encoding issue is valid, and i know json sucks in a lot of ways, but in lua filters it'd actually be easier to deal w/ tree structured data in json format than it would embedded in the message directly
11:48 (dcoates) chance: im interested in what would be harder for heka internally so that I can understand what I should and shouldn't expect heka to do for me
11:49 (RaFromBRC) `cjson.decode(your_json)` will give you a lua table (like a javascript object) all ready for action
11:49 (chance) it's hard to say, because i haven't investigated. we use protobuf, protobuf supports this kind of data. however that isn't to say doing such a thing would be as simple as changing protobuf
11:50 (RaFromBRC) extracting data from a bunch of different message fields would be multiple `read_message(FieldName)` calls, and constructing the data structure by hand
11:54 (dcoates) RaFromBRC: awesome, that really helps!
11:55 (chance) yeah, its a good point that its probably easier for you as a user to just get the data from the payload for that specific rason
11:55 (chance) reason, even
11:55 (RaFromBRC) :)
11:55 RaFromBRC likes being helpful
11:56 (dcoates) chance: yes, that is what I was trying to understand, which would be better, all native protobuf, or something else in the payload. payload wins!
11:59 (chance) definitely. sometimes i find myself unsure as well, but im still new to heka :)
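For illustration, a rough JavaScript analogue of the flattening trink describes (Heka's util.table_to_fields): the hierarchy is pushed into dotted key names.

function flatten(obj, prefix, out) {
  out = out || {};
  for (const key of Object.keys(obj)) {
    const name = prefix ? prefix + "." + key : key;
    if (obj[key] !== null && typeof obj[key] === "object" && !Array.isArray(obj[key])) {
      flatten(obj[key], name, out); // recurse into nested objects
    } else {
      out[name] = obj[key];
    }
  }
  return out;
}

flatten({ rusage: { maxrss: 1024, utime: { sec: 1 } } });
// => { "rusage.maxrss": 1024, "rusage.utime.sec": 1 }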

@trink

trink commented Jul 22, 2014

Non-Goals - comprehensive schemas for all applications

This non-goal is already causing problems in Ops. If the output isn't documented it should be identifiable as such and not sent to Heka or any data warehouse. Everything else should be fully documented and versioned to prevent breaking downstream consumers. There have been several ad-hoc FxA logging changes over the past few months: removing memory statistics that we actually alert on, an error message that is now sometimes an object instead of a string, and sporadically missing required fields.

@dannycoates
Author

@trink I'm not saying having documented types is not a goal. I am saying it shouldn't be a prerequisite for getting any logs flowing. The purpose of this document is to get a format in place that will not fail to parse in Heka for any message going through. There is a type field that should be strict enough that we can write Heka filters to process messages if we want, but I think it is important to allow untyped dynamic payload data to at least not get dropped on the floor, and to end up somewhere a developer can at least look at it to decide how fucked up it is. Strict schemas for everything make it unnecessarily hard to discover what is important, especially for unexpected errors. Our current setup needs to improve; this is working towards that.

@rfk

rfk commented Jul 22, 2014

Suggestion: could the _* fields be cleaner if put into another sub-object, e.g. meta to go alongside payload? It's not clear what advantage we get from having them in the top-level fields.

@dannycoates
Author

could the _* fields be cleaner if put into another sub-object, e.g. meta to go alongside payload?

Yep, I took that convention from GELF, but a meta object might be nicer. If it were named fields it would map directly to the Heka message but be less clear in its purpose.
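For illustration, a record using a meta sub-object instead of _* fields might look like this (values borrowed from the examples above):

{
  "version":"0.0.1-draft-1",
  "severity":6,
  "type":"object",
  "logger":"fxa-auth-server",
  "hostname":"apollo",
  "pid":2532,
  "timestamp": 1405915343869000,
  "payload": { "msg":"I got here!" },
  "meta": { "appVersion":"0.18.0" }
}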
