@drocco007
Last active August 29, 2015 14:02
Notes from my PyAtl talk, June 12, 2014

py❤gdb

http://img.izismile.com/img/img6/20130913/640/prepare_to_panic_now_640_02.jpg

tl;dr

Attach to a running Python process using gdb:

sudo gdb python <pid>

List the Python frames:

(gdb) py-bt

List Python local variables:

(gdb) py-locals
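
If you need to do this more than once, the same commands can be run non-interactively. Here is a minimal sketch that builds such a command line from Python. The helper names gdb_backtrace_command and dump_python_stack are made up for illustration; the sketch assumes gdb's standard -batch and -ex options and the py-bt and py-locals commands shown above.

```python
import subprocess


def gdb_backtrace_command(pid, use_sudo=True):
    """Build a gdb command line that attaches to `pid`, prints the Python
    backtrace and locals, then exits (hypothetical helper)."""
    cmd = ["sudo"] if use_sudo else []
    cmd += [
        "gdb",
        "-batch",            # exit once the -ex commands have run
        "-ex", "py-bt",      # Python-level backtrace
        "-ex", "py-locals",  # locals of the selected Python frame
        "python", str(int(pid)),
    ]
    return cmd


def dump_python_stack(pid):
    """Run the command and return gdb's output as text."""
    return subprocess.check_output(
        gdb_backtrace_command(pid), universal_newlines=True
    )
```

Calling gdb_backtrace_command(1234) only returns the argument list, so you can inspect it before running anything with sudo.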

Motivating Example

My motivating example was an unknown bug in one of our systems that caused a process's CPU usage to spike to 100% and stay there. Because the bug was triggered by a web request that never completed, no log message for the request was ever written, so we couldn't tell what was wrong.

Later we discovered that it was a two-part failure: how we were calling pyPdf, and what pyPdf does if you give it a buffer that does not, in fact, contain a PDF. Here is a small program that demonstrates the problem (pyPdf==1.3):

from StringIO import StringIO
from pyPdf import PdfFileReader

PdfFileReader(StringIO('I\'m Not a PDF. BOOM!'))

In the talk, I first sketched out the problem, then ran this program and used the commands above to attach to it and view its execution state:

(gdb) py-bt
#5 Frame 0x7f7c97173608, for file /home/dan/.envs/3a45b9a9208e0d61/local/lib/python2.7/site-packages/pyPdf/pdf.py, line 870, in readNextEndLine (self=<PdfFileReader(flattenedPages=None, resolvedObjects={}) at remote 0x7f7c9848fd10>, stream=<StringIO(softspace=0, buflist=[], pos=0, len=20, closed=False, buf="I'm Not a PDF. BOOM!") at remote 0x7f7c9717b7e8>, line='IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII...(truncated)
    line = x + line
#8 Frame 0x201cea0, for file /home/dan/.envs/3a45b9a9208e0d61/local/lib/python2.7/site-packages/pyPdf/pdf.py, line 705, in read (self=<PdfFileReader(flattenedPages=None, resolvedObjects={}) at remote 0x7f7c9848fd10>, stream=<StringIO(softspace=0, buflist=[], pos=0, len=20, closed=False, buf="I'm Not a PDF. BOOM!") at remote 0x7f7c9717b7e8>, line='')
    line = self.readNextEndLine(stream)
#11 Frame 0x7f7c97174210, for file /home/dan/.envs/3a45b9a9208e0d61/local/lib/python2.7/site-packages/pyPdf/pdf.py, line 374, in __init__ (self=<PdfFileReader(flattenedPages=None, resolvedObjects={}) at remote 0x7f7c9848fd10>, stream=<StringIO(softspace=0, buflist=[], pos=0, len=20, closed=False, buf="I'm Not a PDF. BOOM!") at remote 0x7f7c9717b7e8>)
    self.read(stream)
#22 Frame 0x7f7c984ef208, for file /home/dan/source/sandbox/boom.py, line 4, in <module> ()
    PdfFileReader(StringIO('I\'m Not a PDF. BOOM!'))
(gdb)
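
The backtrace suggests what is going on: frame #5 is sitting on line = x + line, the stream's pos is 0, and line is an ever-growing run of the buffer's first character. Here is a toy model of a loop that would behave this way. It is a sketch reconstructed from the backtrace, not pyPdf's actual code: the function name read_line_backwards, the step-back-by-two cursor arithmetic, and the clamping of the cursor at position 0 are all assumptions, and the loop is bounded so that the example terminates.

```python
def read_line_backwards(buf, start, max_steps):
    """Toy model of a backwards line reader (NOT pyPdf's real code).

    Reads one character, moves the cursor back past it, and prepends the
    character to `line` until it sees a newline. Moving before the start
    of the buffer is clamped to position 0 (an assumption).
    Returns (line, finished).
    """
    pos = start
    line = ""
    for _ in range(max_steps):      # bounded here; the real loop was not
        x = buf[pos:pos + 1]        # read one character at the cursor
        pos = max(0, pos + 1 - 2)   # the read advances 1, then step back 2
        if x in ("\n", "\r"):
            return line, True       # found a line ending: done
        line = x + line             # the statement shown in frame #5
    return line, False


text = "I'm Not a PDF. BOOM!"
line, finished = read_line_backwards(text, len(text) - 1, max_steps=30)
print(finished)
print(line)
```

With no newline anywhere in the buffer, the cursor walks back to position 0 and stays there, so every further step prepends another 'I'. That matches the value of line in the backtrace.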

Installation and Gotchas

For Ubuntu systems, gdb and the Python↔gdb tools need to be installed:

sudo apt-get install gdb python2.7-dbg

Older versions of Ubuntu (< 13.10?) need a patch to the Python tools (for details, see https://bugs.launchpad.net/ubuntu/+source/gdb/+bug/1241668):

wget -O - http://hg.python.org/cpython/raw-file/ef4636faf8bd/Tools/gdb/libpython.py | sudo tee /usr/lib/debug/usr/bin/python2.7-gdb.py

If you see messages like this when connecting to a process (note the ?? entries in the backtrace):

(gdb) py-bt
#11 (unable to read python frame information)
(gdb) bt
#0  0x00007fa3df3b5ef6 in ?? ()
#1  0x000000000048c9a8 in PyFloat_FromString (v=<optimized out>, pend=<optimized out>) at ../Objects/floatobject.c:223
#2  0x0000000000536a69 in getc_unlocked (__fp=0x5c00e82af6) at /usr/include/x86_64-linux-gnu/bits/stdio.h:65
#3  Py_UniversalNewlineFgets (buf=0x5dd4f <error: Cannot access memory at address 0x5dd4f>, n=<optimized out>, stream=0x5c00e82af6,
    fobj='IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII') at ../Objects/fileobject.c:2750
#4  0x00007fa3de6dd5f0 in ?? ()
#5  0x0000000000e400a0 in ?? ()
#6  0x0000000000f38108 in ?? ()
#7  0x0000000000f22600 in ?? ()
#8  0x0000000000000002 in ?? ()
#9  0x0000000000f2284f in ?? ()
#10 0x000000000052e672 in PyErr_Occurred () at ../Python/errors.c:80
#11 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:2308
#12 0x0000000000000000 in ?? ()
(gdb)

it probably means that the Python executable you are running is out of sync with the system executable, and so with the debugging information installed for it. This can happen, for example, if you have a virtualenv that was made prior to an update to the system Python.
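
One rough way to check this hypothesis is to compare the interpreter inside the virtualenv with the system one byte for byte. This sketch assumes the virtualenv holds its own copy of the interpreter binary (if it is a symlink to the system Python, the two will trivially match). The function names are made up for illustration.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries are not loaded all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def executables_match(venv_python, system_python):
    """True if both interpreter files have identical contents."""
    return file_digest(venv_python) == file_digest(system_python)
```

You would call it with the two interpreter paths, for example executables_match("/path/to/venv/bin/python", "/usr/bin/python2.7"). Both paths are placeholders for your own setup. A False result is consistent with a stale virtualenv, though it does not prove that this is why gdb cannot read the frames.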

Alternatives and References

Rick Copeland mentioned Pyrasite, which can do some crazy stuff with a running Python process: http://pyrasite.readthedocs.org/en/latest/

There's another class of solutions that require instrumenting your code beforehand, but that might provide a "friendlier" interface; see this SO thread for a discussion: http://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application
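
As a rough illustration of the instrument-beforehand approach, here is a minimal sketch that uses only the standard library's signal and traceback modules. The function names and the choice of SIGUSR1 are my assumptions, not something taken from the thread, and SIGUSR1 is only available on POSIX systems.

```python
import signal
import sys
import traceback


def make_stack_dump_handler(out=None):
    """Return a signal handler that writes the interrupted stack to `out`
    (defaults to sys.stderr at the time the signal arrives)."""
    def handler(signum, frame):
        stream = out if out is not None else sys.stderr
        stream.write("Stack at signal %d:\n" % signum)
        stream.write("".join(traceback.format_stack(frame)))
        stream.flush()
    return handler


def install_stack_dump_handler(signum=None, out=None):
    """Install the handler; must be called from the main thread."""
    if signum is None:
        signum = signal.SIGUSR1  # assumption: POSIX platform
    handler = make_stack_dump_handler(out)
    signal.signal(signum, handler)
    return handler
```

After calling install_stack_dump_handler() at startup, running kill -USR1 <pid> makes the process print its current Python stack and carry on. Python runs signal handlers between bytecode instructions, so this works for a pure-Python busy loop like the one in the motivating example, but it will not fire while the process is stuck inside a single long-running C call.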

Collected references:
