This document will help you to introspect complex Python data.
We need some JSON goodness to start with.
- JSONPickle: a Python module that lets you dump complete Python objects and structures as JSON.

This writeup relies on the following tools because I fancy them, but you can replace them with alternatives (should there be any you fancy more).
- My choice of JSON parser is YAJL-Ruby.
  - The primary reason to love it is its ability to parse JSON streams, i.e. a concatenated series of JSON objects (e.g. `{}[]`; most JSON parsers barf on this because of "trailing characters" and can't take subsequent action on `{}` and `[]`).
  - The other reason to love it is the very lightweight and intuitive API. Much more likable than that of the similar Python binding!

Of course, in turn, you need Yajl and Ruby installed for this.
- My choice of JSON formatter is underscore-cli.
  - The primary reason to love it is that it formats the data with awareness of width. Most JSON formatters would format `{"a":1,"b":2}` either as

    ```json
    {"a": 1, "b": 2}
    ```

    or

    ```json
    {
      "a": 1,
      "b": 2
    }
    ```

    depending on whether indentation was asked for or not. _underscore-cli_ decides on the formatting based on the data and the screen size: it applies one-line formatting to small data and indented formatting to larger data, thus achieving optimal readability.
  - Other reasons to love it:
    - it features a functional API for manipulating and extracting information from JSON
    - it features colorized output

It's written for Node.js, so you'll need to have that installed.
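
The width-aware idea can be approximated in a few lines of Python. The sketch below is not underscore-cli's actual algorithm, just an illustration of the principle: try the one-line form first and fall back to indented output when it does not fit. The function name `format_json` and its `width`, `indent` and `level` parameters are assumptions made for this example.

```python
import json

def format_json(value, width=80, indent=2, level=0):
    """Return `value` as JSON: on one line if it fits, indented otherwise."""
    one_line = json.dumps(value)
    room = width - level * indent  # columns left at this nesting depth
    is_container = isinstance(value, (dict, list)) and len(value) > 0
    if len(one_line) <= room or not is_container:
        return one_line
    pad = ' ' * ((level + 1) * indent)
    if isinstance(value, dict):
        # Assumes string keys, as in decoded JSON
        items = [json.dumps(k) + ': ' + format_json(v, width, indent, level + 1)
                 for k, v in value.items()]
        opener, closer = '{', '}'
    else:
        items = [format_json(v, width, indent, level + 1) for v in value]
        opener, closer = '[', ']'
    body = ',\n'.join(pad + item for item in items)
    return opener + '\n' + body + '\n' + ' ' * (level * indent) + closer

small = {"a": 1, "b": 2}
print(format_json(small, width=80))  # fits: stays on one line
print(format_json(small, width=10))  # too wide: one key per line
```

The check is deliberately rough (it ignores the width taken by a key or a trailing comma), but it is enough to show why small values stay compact while large ones get broken up.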
Add the following code to your sitecustomize.py (or usercustomize.py):
import os
import time
from gzip import GzipFile
from threading import Lock, current_thread
from string import Template

import jsonpickle


class Jlog(object):

    def __init__(self, ftmp):
        self.pathtemp = Template(ftmp)  # log path template, e.g. ".../${pid}.json.gz"
        self.lock = Lock()              # serializes writes coming from several threads
        self.jreg = {}                  # open log files, keyed by path

    # Values for the path template; callables are evaluated on each lookup.
    params = {'pid': os.getpid}

    def getpath(self):
        pd = {}
        for k, v in self.params.items():
            if callable(v):
                v = v()
            pd[k] = v
        return self.pathtemp.substitute(pd)

    @property
    def jhandle(self):
        # Open the log file for the current path lazily, creating its directory.
        path = self.getpath()
        if not self.jreg.get(path):
            logdir = os.path.dirname(path)
            if not os.path.isdir(logdir):
                os.makedirs(logdir)
            self.jreg[path] = GzipFile(path, 'wb')
        return self.jreg[path]

    def jlog(self, *a, **kw):
        # Positional arguments are stored under the keys data00, data01, ...
        for i in range(len(a)):
            kw["data%02d" % i] = a[i]
        d = {'data': kw, 'time': time.time(), 'pid': os.getpid(),
             'thread': current_thread().name}
        with self.lock:
            # GzipFile wants bytes, so encode the JSON text before writing.
            self.jhandle.write(jsonpickle.encode(d).encode('utf-8'))
            self.jhandle.flush()
        # Print a summary to stdout: each value is replaced by the
        # length of its JSON encoding.
        ld = {}
        for k, v in d['data'].items():
            ld[k] = [len(jsonpickle.encode(v))]
        d['data'] = ld
        print('JLOG ' + jsonpickle.encode(d))


jlog = Jlog("/tmp/jlog/${pid}.json.gz").jlog
This sets up a canonical `Jlog` instance (called `jlog`) for any Python script you execute. Then in your Python code you can just add
from sitecustomize import jlog
...
jlog(<args>, <keywords>)
For example:
from threading import Thread
from sitecustomize import jlog


class Rectangle(object):
    def __init__(self, h, w):
        self.height = h
        self.width = w


def sayhirect(h, w):
    jlog("hello", "shape", shape=Rectangle(h, w))


t = Thread(target=sayhirect, args=(3, 4))
t.start()
sayhirect(5, 6)
This prints a message to stdout that gives a hint of what's going on, something like:
JLOG {"pid": 19869, "data": {"shape": [60], "data00": [7], "data01": [7]}, "thread": "MainThread", "time": 1378916569.103762}
JLOG {"pid": 19869, "data": {"shape": [60], "data00": [7], "data01": [7]}, "thread": "Thread-1", "time": 1378916569.100782}
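
The bracketed numbers in the `JLOG` lines are not the data itself: as the `jlog` method shows, each value is replaced by a one-element list holding the length of its JSON encoding. A quick check with the standard `json` module reproduces the figures. Using `json.dumps` as a stand-in for `jsonpickle.encode` is an assumption that holds for plain strings and dicts like these, not necessarily for arbitrary objects, and the helper name `summarize` is made up for this illustration.

```python
import json

def summarize(data):
    """Replace each value by [length of its JSON encoding]."""
    return {key: [len(json.dumps(value))] for key, value in data.items()}

# The Rectangle(3, 4) from the example, written out the way it
# appears in the log file
shape = {"py/object": "__main__.Rectangle", "width": 4, "height": 3}

summary = summarize({"data00": "hello", "data01": "shape", "shape": shape})
print(summary)
```

The string `"hello"` encodes to seven characters because the surrounding quotes are counted too.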
The actual log is written to /tmp/jlog/<pid>.json.gz, so in this particular example, to /tmp/jlog/19869.json.gz, as a gzipped JSON stream. The JSON stream looks like:
{"pid": 19869, "data": {"shape": {"py/object": "__main__.Rectangle", "width": 4, "height": 3}, "data00": "hello", "data01": "shape"}, "thread": "Thread-1", "time": 1378916569.100782}{"pid": 19869, "data": {"shape": {"py/object": "__main__.Rectangle", "width": 6, "height": 5}, "data00": "hello", "data01": "shape"}, "thread": "MainThread", "time": 1378916569.103762}
We can get it by zcat(1)-ing the file. But that's not the best way to view it.
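
To see why such a stream needs special treatment, here is a minimal Python sketch using only the standard library. `json.loads` rejects the concatenated input, while `JSONDecoder.raw_decode` can be called repeatedly to pull out one value at a time. The helper names `iter_json_stream` and `read_jlog` are invented for this example, and the demo runs on a throwaway file rather than on a real jlog dump.

```python
import gzip
import json
import os
import tempfile

def iter_json_stream(text):
    """Yield each top-level JSON value from a concatenated stream."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

def read_jlog(path):
    """Decompress a gzipped JSON stream and return its entries as a list."""
    with gzip.open(path, 'rt') as f:
        return list(iter_json_stream(f.read()))

# Demo data: two concatenated objects, as in the log above
stream_text = '{"thread": "Thread-1"}{"thread": "MainThread"}'
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.json.gz')
with gzip.open(demo_path, 'wt') as f:
    f.write(stream_text)

try:
    json.loads(stream_text)
    strict_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    strict_ok = False

entries = read_jlog(demo_path)
print(strict_ok, [e['thread'] for e in entries])
```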
So basically we want to feed the JSON dump to underscore-cli to get a royal view; alas, it can't handle JSON streams. A little snippet of Ruby with Yajl to the rescue!
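#!/usr/bin/env ruby

require 'yajl'

# Turn each command line argument into an Integer index or a Range of indices.
sel = $*.map { |i|
  case i
  when /\A(-?\d+)\.\.(\.?)(-?\d+)\Z/
    # "i..j" is inclusive, "i...j" (third dot captured in $2) is exclusive
    Range.new *([$1, $3].map { |j| Integer j } << ($2 == "."))
  else
    Integer i
  end
}

# Collect every object of the JSON stream arriving on stdin.
w = []
Yajl::Parser.new.parse(STDIN) { |o| w << o }
# Keep only the selected objects, if a selection was given.
w = w.values_at *sel unless sel.empty?
# Emit the result as a single JSON array.
Yajl::Encoder.encode w, STDOUT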
Save it as jwrap.rb, place it in your $PATH and make it executable (or not, but the following examples assume that). jwrap.rb wraps the elements of the JSON stream into a single JSON array and outputs that to stdout. Besides:
- If you pass integer arguments to it, it selects only the objects at the given indices; negative indices are accepted. Thus `jwrap.rb 0 1` selects the first two objects, while `jwrap.rb -1` selects the last object.
- You can also pass command line arguments in Ruby range syntax, where `i..j` represents an inclusive range (e.g. `3..6` consists of 3, 4, 5, 6), while `i...j` represents an exclusive range (e.g. `3...6` consists of 3, 4, 5); negative range bounds are accepted. Thus `jwrap.rb -10..-1` selects the last ten objects.
Beware that this code is optimized for simplicity, not efficiency: if you happen to `jlog` gigabytes of data, you'll have to come up with a smarter version. It will do for everyday introspection.
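
To make the selection rules concrete, here is a rough Python rendering of the argument handling. It reuses the regular expression from `jwrap.rb` but is only a sketch: the names `parse_selector` and `select` are invented for this example, and Ruby's `values_at` and Python's slicing treat out-of-range positions differently, so edge cases will not match exactly.

```python
import re

RANGE_RE = re.compile(r'\A(-?\d+)\.\.(\.?)(-?\d+)\Z')

def parse_selector(arg):
    """Turn '3', '3..6' or '3...6' into an int index or a slice."""
    m = RANGE_RE.match(arg)
    if not m:
        return int(arg)
    start, stop = int(m.group(1)), int(m.group(3))
    exclusive = m.group(2) == '.'
    if not exclusive:
        # Inclusive upper bound: go one past `stop`; an inclusive -1
        # means "up to the very end", which a slice spells as None.
        stop = None if stop == -1 else stop + 1
    return slice(start, stop)

def select(items, args):
    """Pick the elements of `items` named by the selector strings in `args`."""
    out = []
    for sel in map(parse_selector, args):
        if isinstance(sel, slice):
            out.extend(items[sel])
        else:
            out.append(items[sel])
    return out

entries = list(range(10))
print(select(entries, ['0', '1']))    # first two
print(select(entries, ['-1']))        # last one
print(select(entries, ['3..6']))      # inclusive range
print(select(entries, ['3...6']))     # exclusive range
print(select(entries, ['-3..-1']))    # last three
```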
Back to our example above, we can now do the following:
$ zcat /tmp/jlog/19869.json.gz | jwrap.rb | underscore print
which gives:
[
  {
    "pid": 19869,
    "data": {
      "shape": { "py/object": "__main__.Rectangle", "width": 4, "height": 3 },
      "data00": "hello",
      "data01": "shape"
    },
    "thread": "Thread-1",
    "time": 1378916569.100782
  },
  {
    "pid": 19869,
    "data": {
      "shape": { "py/object": "__main__.Rectangle", "width": 6, "height": 5 },
      "data00": "hello",
      "data01": "shape"
    },
    "thread": "MainThread",
    "time": 1378916569.103762
  }
]
or if we are interested only in the last dump entry:
$ zcat /tmp/jlog/19869.json.gz | jwrap.rb -1 | underscore print
which gives:
[
  {
    "pid": 19869,
    "data": {
      "shape": { "py/object": "__main__.Rectangle", "width": 6, "height": 5 },
      "data00": "hello",
      "data01": "shape"
    },
    "thread": "MainThread",
    "time": 1378916569.103762
  }
]