@tairov
Forked from tonyc/gist:1384523
Created April 15, 2013 06:20

Using strace and lsof to debug blocked processes

You can attach strace to a running process by pid to see which system calls it is making (-p attaches to the given pid, -f also follows its child processes and threads), e.g.:

strace -fp <pid>

You might see something like:

select(9, [3 5 8], [], [], {0, 999999})   = 0 (Timeout)

In this case, 3, 5 and 8 are the file descriptors select() is watching for readability (the two empty lists are the write and exception sets), and the first argument, 9, is the highest FD plus one.

{0, 999999} is the timeout, a timeval struct of {seconds, microseconds}: 0 seconds plus 999999 microseconds, so select() waits just under one second before timing out.

= 0 (Timeout) is the return value of select(), indicating that the timeout expired before any of the file descriptors became ready to read from.
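The pieces described above can be pulled out of the strace line mechanically. The following is a minimal sketch, not a general strace parser: it assumes the line looks exactly like the sample (plain numeric FDs in the read set, a {seconds, microseconds} timeout), and the variable names are made up for illustration. Other strace versions or options may format the line differently.

```shell
#!/bin/sh
# Sample line copied from the strace output above.
line='select(9, [3 5 8], [], [], {0, 999999})   = 0 (Timeout)'

# Read set: the first bracketed list, turned into a comma-separated string.
fds=$(printf '%s\n' "$line" |
  sed -n 's/.*select([0-9]*, \[\([0-9 ]*\)].*/\1/p' | tr ' ' ',')

# nfds: the first argument to select().
nfds=$(printf '%s\n' "$line" | sed -n 's/.*select(\([0-9]*\),.*/\1/p')

# Highest FD in the read set, to check the "highest FD + 1" rule.
max_fd=$(printf '%s\n' "$fds" | tr ',' '\n' | sort -n | tail -n 1)

# Timeout: {seconds, microseconds} converted to seconds.
timeout=$(printf '%s\n' "$line" |
  sed -n 's/.*{\([0-9]*\), \([0-9]*\)}.*/\1 \2/p' |
  awk '{ printf "%.6f", $1 + $2 / 1000000 }')

echo "fds=$fds nfds=$nfds max_fd=$max_fd timeout=${timeout}s"
# prints: fds=3,5,8 nfds=9 max_fd=8 timeout=0.999999s
```

The resulting fds string is already in the comma-separated form that lsof's -d option accepts.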

The next step is to find out what these file descriptors refer to.

As root, run:

lsof -p <pid> -ad <file_handles>

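This shows what each descriptor refers to, for example a socket on which the process is waiting for a response. -p selects the process, -d selects the file descriptors, and -a ANDs the two conditions together (without it, lsof ORs them). To list several file descriptors, separate them with commas: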

[root@ops-2-portal ~]# lsof -p 2947 -ad 3,5,8
COMMAND    PID  USER   FD   TYPE   DEVICE SIZE NODE NAME
mongrel_r 2947 deploy    3u  IPv4 57390385       TCP *:vcom-tunnel (LISTEN)
mongrel_r 2947 deploy    5u  IPv4 57390749       TCP ops-2-portal:42717 (LISTEN)
mongrel_r 2947 deploy    8u  IPv4 58983912       TCP ops-2-portal:35191->ops-2-websvc:7077 (ESTABLISHED)

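As you can see, select() was waiting for data on these file descriptors. FDs 3 and 5 are listening sockets, while FD 8 shows that this mongrel has an established TCP connection to ops-2-websvc:7077. Since select() timed out, no data has arrived on that connection, so the process is most likely blocked waiting for a response from ops-2-websvc.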
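When the FD list is longer, the interesting rows can be filtered out instead of read by eye. This is a rough sketch under a few assumptions: the helper name established_peers is hypothetical, and the column positions (FD in the fourth column, connection state last) match the sample output above but may differ in other lsof versions.

```shell
#!/bin/sh
# Print "FD NAME" for every ESTABLISHED TCP connection in lsof output on stdin.
# Assumes FD is the 4th column and the state is the last field, as in the sample.
established_peers() {
  awk '$NF == "(ESTABLISHED)" { print $4, $(NF - 1) }'
}

# Sample rows copied from the lsof output above; on a live system, pipe in
# the output of `lsof -p <pid> -ad <file_handles>` instead.
sample='COMMAND    PID  USER   FD   TYPE   DEVICE SIZE NODE NAME
mongrel_r 2947 deploy    3u  IPv4 57390385       TCP *:vcom-tunnel (LISTEN)
mongrel_r 2947 deploy    5u  IPv4 57390749       TCP ops-2-portal:42717 (LISTEN)
mongrel_r 2947 deploy    8u  IPv4 58983912       TCP ops-2-portal:35191->ops-2-websvc:7077 (ESTABLISHED)'

peers=$(printf '%s\n' "$sample" | established_peers)
echo "$peers"
# prints: 8u ops-2-portal:35191->ops-2-websvc:7077
```

Each line of output names a descriptor and the remote endpoint the process may be waiting on.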
