Heroku's default logging format omits user agent and referrer strings - but these are useful to have! Especially if you want to be able to run your own analysis on what kind of browsers are using your service.
A default Heroku log line looks like this:
heroku/router: at=info method=GET path="/page-on-your-site/"
host=your-site.herokuapp.com
request_id=76fe0891-38cd-4293-a082-ac075d744197
fwd="10.10.10.10" dyno=web.1 connect=1ms service=961ms
status=200 bytes=1274
It isn't possible to change the format output by the Heroku router. But it IS possible to write your own lines to the heroku logs. You can do this from your own application, but it turns out there are some solid reasons to run your own nginx server inside your Heroku dyno (for buffering large requests). If you're doing that already, it's easy to configure that nginx instance to use whatever custom logging format you like.
If you use the Heroku nginx buildpack (or one of its forks), the buildpack will install a nginx configuration file in the following location:
config/nginx.conf.erb
This happens before your app is copied from your git checkout, which means you can replace the nginx config file with your own just by storing your own file in the same location within your application's repository.
This means you can customise the log_format line from that file. Here's the format I'm using, designed to use the semi-standard key="optionally quoted value" format used by other parts of the Heroku stack:
log_format custom 'measure#nginx.service=$request_time request="$request" '
'status_code=$status request_id=$http_x_request_id '
'remote_addr="$remote_addr" forwarded_for="$http_x_forwarded_for" '
'forwarded_proto="$http_x_forwarded_proto" via="$http_via" '
'body_bytes_sent=$body_bytes_sent referer="$http_referer" '
'user_agent="$http_user_agent"';
access_log logs/nginx/access.log custom;
Just droy in a copy of the https://github.com/ryandotsmith/nginx-buildpack/blob/master/config/nginx.conf.erb file and make the above alteration to get completely custom logging in your Heroku logs, which looks something like thys:
Feb 05 13:31:55 your-site app/web.1: measure#nginx.service=0.023
request="GET /blah/ HTTP/1.1" status_code=200 request_id=8ff12bfba-2136
-472b-8b73-e8b5bee4296d remote_addr="10.10.10.10" forwarded_for="1.1.1.1"
forwarded_proto="https" via="1.1 vegur" body_bytes_sent=919 referer="-"
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
You might be questioning how reliable this key="quoted value" format style of logging actually is. What if a double quote shows up in the content of the value, for example in the user-agent string? Would that break a log parser that expects to be able to read these key value pairs?
It turns out nginx has you covered: it automatically escapes double quotes (and some other characters) when it interpolates values in the log_format string. Here's an example of a request I generated using a faked user-agent:
requests.get('http://example.com/test/', headers={
'User-Agent': 'This user-agent has "double quotes" in it'
})
The resulting nginx logline looked like this:
Feb 05 13:25:22 my-heroku-app app/web.1: measure#nginx.service=0.009
request="GET /test/ HTTP/1.1" status_code=404 request_id=8433b02c-307c-
4d07-b2c6-8d5b37c3e82b remote_addr="10.0.0.0" forwarded_for="1.1.1.1"
forwarded_proto="http" via="1.1 vegur" body_bytes_sent=90 referer="-"
user_agent="This user-agent has \x22double quotes\x22 in it"
The double quotes were escaped as \x22. I also tested special characters in the path and they get the same treatment.
Here's a regular expression for parsing data in this key=optionally-quoted-value format:
Example output:
Note that some of the values will still have their quotes on them, so code should rstrip/lstrip double quotes from values before continuing.