- What we'd like to do: debug Python software the customary way, i.e. break into pdb, the Python debugger, and investigate the live code.
- Why we can't do that: in the way we run the code, the standard input is not a tty, and pdb assumes interaction via a terminal. (Note: so the other option would be to force the code to run in a terminal. That's worth exploring, but here we take a different route.)
- What we will do: find alternative ways of debugging and introspection that do not rely on stdin. (Note: in our case, while stdin is problematic, we can easily see stdout. If that does not hold in your case but you'd still like to apply these techniques, replace the print statements with whatever logging mechanism is available to you.)
We use Python 2.7.
## Remote Pdb

A hack to get at a networked Pdb session, useful when stdin is not a tty.
Place the attached `rdb.py` (stolen from here, with some adjustments) somewhere on your `$PYTHONPATH`. Then you can do

```python
import rdb; rdb.set_trace()
```

just like with stock pdb. It will print the port on which the debug session is spawned, like `PDB listening on 6902` (if you don't see stdout, you can try to find the port with lsof(8) & co.). Then you can just run `telnet localhost 6902`.
Issues:
- no readline support (you can add it externally with rlwrap)
- no persistent session: if you set a breakpoint and press `c`, the connection drops and the follow-up break will spawn on stdin, not on the network

However, it allegedly supports multiple sessions (i.e., if the program hits `set_trace` multiple times, a new rdb server will spawn for each); I haven't tried that.
Another take on remoting Pdb is Rpdb (thanks to Prasanth Pai for the hint). I found it isn't perfect either; it has similar but slightly different issues. You can give it a try.
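The attached `rdb.py` isn't reproduced here, but the core of a remote Pdb is small. `pdb.Pdb` accepts `stdin` and `stdout` file objects, so a socket wrapped with `makefile()` can stand in for the terminal.

The sketch below is a hypothetical minimal version built on that idea, not the author's rdb.py. `RemotePdb` and `set_trace(host, port)` are illustrative names. It binds to 127.0.0.1 by default, serves a single client, and hangs up on `continue`, mirroring the behaviour listed under the issues above.

```python
import pdb
import socket
import sys


class RemotePdb(pdb.Pdb):
    """Pdb that talks over a TCP connection instead of stdin/stdout.

    Hypothetical minimal sketch (not the attached rdb.py): serves a single
    client and hangs up when the client resumes the program with `continue`.
    """

    def __init__(self, host="127.0.0.1", port=0):
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))  # port=0: let the OS pick a free port
        listener.listen(1)
        sys.stdout.write("PDB listening on %d\n" % listener.getsockname()[1])
        sys.stdout.flush()
        self.conn, _ = listener.accept()  # blocks until a client connects
        listener.close()
        self.handle = self.conn.makefile("rw")
        # With explicit stdin/stdout, Pdb reads commands from and writes
        # its output to the socket instead of the terminal.
        pdb.Pdb.__init__(self, stdin=self.handle, stdout=self.handle)

    def do_continue(self, arg):
        result = pdb.Pdb.do_continue(self, arg)
        if result:  # the program really resumed: drop the connection
            self.handle.close()
            self.conn.close()
        return result

    # Pdb binds these aliases to its own do_continue, so rebind them here.
    do_c = do_cont = do_continue


def set_trace(host="127.0.0.1", port=0):
    """Break into RemotePdb in the caller's frame, like pdb.set_trace()."""
    RemotePdb(host, port).set_trace(sys._getframe().f_back)
```

Calling `set_trace()` in the code under inspection prints `PDB listening on <port>`. Running `telnet localhost <port>` (optionally through `rlwrap`) then gives a `(Pdb)` prompt. Pass a fixed `port` if you can't see stdout.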
## Stack printing

Put the following snippet into your code:
```python
import threading, sys, traceback

def dumpstacks(signal=None, frame=None):
    """Print the stack of every running thread to stdout.

    The (signal, frame) parameters let it double as a signal handler.
    """
    # Map thread ids to thread names for readable headers.
    id2name = dict([(th.ident, th.name) for th in threading.enumerate()])
    code = []
    for threadId, stack in sys._current_frames().items():
        code.append("\n# Thread: %s(%d)" % (id2name.get(threadId, ""), threadId))
        for filename, lineno, name, line in traceback.extract_stack(stack):
            code.append('File: "%s", line %d, in %s' % (filename, lineno, name))
            if line:
                code.append("  %s" % (line.strip()))
    print("\n".join(code))
```
Then you can just call `dumpstacks()` to get the stack traces of all threads printed to stdout.
Additionally, if you put

```python
import signal
signal.signal(signal.SIGUSR1, dumpstacks)
```

somewhere in the main code path (i.e. in what gets called on program startup), you can get a stack trace at any point by sending `SIGUSR1` to your program.
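A signal handler receives the interrupted frame as its second argument. `dumpstacks()` ignores it and walks all threads instead. If the main thread is enough, a lighter handler can print that frame's stack directly.

The sketch below is POSIX only, and `print_interrupted_stack` is a hypothetical name. It also shows the signal round trip by having the process send `SIGUSR1` to itself; in real use you'd run `kill -USR1 <pid>` from a shell.

```python
import os
import signal
import sys
import time
import traceback


def print_interrupted_stack(signum, frame, out=None):
    """Signal handler: print the Python stack the signal interrupted.

    CPython runs Python-level signal handlers in the main thread, so
    `frame` is where the main thread was; other threads are not shown.
    """
    out = sys.stdout if out is None else out
    out.write("# Caught signal %d, interrupted stack:\n" % signum)
    traceback.print_stack(frame, file=out)
    out.flush()


if __name__ == "__main__":
    signal.signal(signal.SIGUSR1, print_interrupted_stack)
    # Normally you would run `kill -USR1 <pid>` from another shell; here
    # the process signals itself so that the example is self-contained.
    os.kill(os.getpid(), signal.SIGUSR1)
    time.sleep(0.1)  # give the interpreter a moment to run the pending handler
```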
The most convenient way to accomplish this is the `sitecustomize`/`usercustomize` feature of Python. It lets you specify code that is loaded into every Python program (unless you explicitly opt out with the interpreter's `-S` option), i.e. it's always in the main code path.

Just create a `sitecustomize.py` file with the above content in your Python site dir (installation and version dependent, something like `/usr/lib/python2.7/site-packages/`). Then `SIGUSR1` stack printing will always be enabled, and in code you can get a stack dump with

```python
from sitecustomize import dumpstacks; dumpstacks()
```
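When stdout isn't visible (see the note at the top), the same `sitecustomize.py` approach works with logging in place of `print`. The following is a sketch under assumptions. The log location (one `stackdump-<pid>.log` per process in the temp directory) and the logger name `stackdump` are arbitrary, hypothetical choices, not something Python prescribes.

```python
# sitecustomize.py (sketch): log SIGUSR1 stack dumps to a file
import logging
import os
import signal
import sys
import tempfile
import threading
import traceback

# Hypothetical log location: one file per process in the temp dir.
LOG_PATH = os.path.join(tempfile.gettempdir(), "stackdump-%d.log" % os.getpid())

_log = logging.getLogger("stackdump")
_log.setLevel(logging.INFO)
_log.propagate = False  # keep dumps out of the program's own logging setup
# delay=True: the file is only created once the first dump is written.
_log.addHandler(logging.FileHandler(LOG_PATH, delay=True))


def dumpstacks(signum=None, frame=None):
    """Log the stack of every thread; also usable as a signal handler."""
    names = dict((th.ident, th.name) for th in threading.enumerate())
    parts = []
    for ident, stack in sys._current_frames().items():
        parts.append("# Thread: %s(%d)\n" % (names.get(ident, ""), ident))
        parts.extend(traceback.format_stack(stack))
    _log.info("stack dump of pid %d\n%s", os.getpid(), "".join(parts))


if hasattr(signal, "SIGUSR1"):  # not available on Windows
    signal.signal(signal.SIGUSR1, dumpstacks)
```

After `kill -USR1 <pid>`, the dump lands in that process's log file. `from sitecustomize import dumpstacks; dumpstacks()` still works from code.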
## Gdb

I'll provide the instructions for debugging Python with Gdb in two flavors:
- Fedora (tested with 19)
- general instructions

On Fedora, support for this is nicely built in. In general, you have to compile a suitable Python yourself and make some additional adjustments.
### Debug symbols

- on Fedora (as root):

```
yum install yum-utils
debuginfo-install python
```

- in general: Python follows the standard autotools build procedure of `./configure && make && make install`. Perform the build with one change: replace the plain `make` invocation with `make OPT="-ggdb -O0"`. If you are performing the build through a package/build manager, make sure it does not strip the binaries (e.g. on Arch Linux, if you build using the python2 PKGBUILD, add `'!strip'` to the `options` array).

Note: on RHEL/CentOS, similarly to Fedora, a debuginfo package is available.
### Python support in Gdb

- on Fedora: it just works as is. Run the Python script under Gdb (either `gdb --args python <script>` or `gdb -p <pid-of-running-script>`) and you'll have access to the `py-*` commands, like `py-bt` to show a Python backtrace.
- in general:
  - Make a note of the location of the Python source tree.
  - Add the following to your `~/.gdbinit`:

    ```
    define py-load
    python import sys; sys.path.insert(0, "/Tools/gdb/"); import libpython
    end
    ```

  - Run your Python script under Gdb, as discussed above. When you drop to the Gdb prompt for the first time, type `py-load`, which will load the Python support routines. (Note: I tried to have them loaded automatically from `~/.gdbinit`, but then they did not work properly. Most likely they presuppose that the Python debug symbols are already available; if you load them only from the prompt, this condition is fulfilled by that time.)

Note: on RHEL/CentOS it seems that the `py-*` commands are not integrated into the build, so you have to follow the general instructions. You can get the Python source by fetching the SRPM (cf. yumdownloader(1)), or you can get `libpython.py` right from the source repository (direct download url).
### Legacy gdbinit macros

There is also an older mechanism that predates Python scripting support in Gdb: a collection of routines written directly in Gdb's command language to extract information from the Python VM's internal data structures. They provide the `py*` commands (i.e., prefixed with "py" but without a hyphen, like `pystack`). They are considered deprecated, but are of interest to us for two reasons:
- Their output contains less information. That can be advantageous if we want terse output that is easy on the eye.
- If we want to add some convenience commands of our own, they serve as a good reference.
The routines are included in the Python source repo as `Misc/gdbinit` (direct download url). To use them, download the file and either add its contents to `~/.gdbinit` or keep it separate and pull it in with `source <path-to-downloaded-file>`.
### Breaking and stepping

At this point we have basic introspection capabilities for the Python runtime, but we still can't do the things considered basic for a debugger, most notably breaking and stepping. That's what we want to achieve next.
Playing around, one can see that the C function that carries out the invocation of Python functions is called `PyEval_EvalFrameEx`. Looking into the Gdb Python routines, we can see how to extract the function name and file from the parameters of `PyEval_EvalFrameEx`. Thus we can put together the following command:
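```
# pybr "FUNCTION" ["FILE"]: break when Python enters FUNCTION (optionally in FILE).
# f is the PyFrameObject* argument of PyEval_EvalFrameEx (Python 2.x layout).
define pybr
  if $argc == 1
    break PyEval_EvalFrameEx if strcmp((char *)(*(PyStringObject*)f.f_code.co_name).ob_sval, $arg0) == 0
  end
  if $argc == 2
    break PyEval_EvalFrameEx if strcmp((char *)(*(PyStringObject*)f.f_code.co_name).ob_sval, $arg0) == 0 && \
      strcmp((char *)(*(PyStringObject*)f.f_code.co_filename).ob_sval, $arg1) == 0
  end
end
document pybr
Python break: pybr "FUNCTION" ["FILE"]
end
```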
(Needless to say, this is meant for inclusion in `~/.gdbinit` or some other Gdb command file that you `source`.)
So the first argument of `pybr` is the function to break at; the second, optional one is the name of the file that contains the function. Note that the arguments should be passed as strings and not as identifiers, for example `pybr "GET"` or `pybr "GET" "monkeyserver.py"`.

Another caveat is whether to use absolute or relative filenames: that may depend on how the program was invoked. You can discover the actual file naming convention by checking the output of `py-bt` or `pystack`.
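The `pybr` condition boils down to comparing two attributes of the frame being evaluated, `f_code.co_name` and `f_code.co_filename`, and Python frames expose the same attributes. This makes it possible to try the matching rule (and the filename caveat) without Gdb, using an in-process sketch based on `sys.settrace`.

This is a different mechanism from a Gdb breakpoint, and it only observes the thread that installs it. `make_call_watcher` and `on_hit` are hypothetical names. Comparing only the basename of `co_filename` deliberately deviates from `pybr`'s exact `strcmp`, to sidestep the absolute-vs-relative question.

```python
import os
import sys


def make_call_watcher(on_hit, func_name, filename=None):
    """Build a sys.settrace() hook mirroring pybr's condition.

    on_hit(frame) is called whenever a Python function named func_name is
    entered, optionally only if it lives in a file whose basename equals
    filename (pybr itself compares the full co_filename with strcmp).
    """
    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            if code.co_name == func_name and (
                filename is None
                or os.path.basename(code.co_filename) == filename
            ):
                on_hit(frame)
        return None  # no line-level tracing inside the entered frame
    return tracer
```

Install it with `sys.settrace(make_call_watcher(callback, "GET", "monkeyserver.py"))` and remove it with `sys.settrace(None)`. `threading.settrace()` does the same for threads started afterwards.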
Given that entering a Python function means hitting `PyEval_EvalFrameEx` in the C runtime, I suggest the following practice for stepping through Python code:
- when you want to start stepping, do `break PyEval_EvalFrameEx` (make a note of the index of this breakpoint)
- just hit `c` (`continue`) to step forward (note that this steps from one Python function call to the next, not line by line)
- if you want to let the Python program run freely, disable this breakpoint with `dis <index-of-breakpoint>` and then hit `c`
- if you want to step through Python again, enable the breakpoint with `en <index-of-breakpoint>`
In practice (if no other automatic breakpoint setting interferes), you can add

```
break PyEval_EvalFrameEx
disable 1
```

to your `~/.gdbinit` so that the `PyEval_EvalFrameEx` breakpoint gets index 1 and is disabled at startup. Then you can enable Python stepping with `en 1` and disable it with `dis 1`.