@jorgemarsal
Last active August 29, 2015 14:16

Introduction

Debugging workers and executors is hard because Distributed R starts them automatically, so we cannot launch them under gdb ourselves. One workaround is to make each program sleep for a few seconds at startup. This gives us time to attach a debugger before the program does anything.

Implementation

One option is to create two files: /tmp/r_worker_startup_sleep_secs for workers and /tmp/r_executor_startup_sleep_secs for executors. The first thing each worker or executor does is check whether its file exists. If it does, the process sleeps for the number of seconds specified in the file (values outside 1 to 119 are ignored):

$ cat /tmp/r_executor_startup_sleep_secs 
30

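#include <cstdio>    // fopen, fread, fclose
#include <cstdlib>   // atoi
#include <unistd.h>  // sleep, getpid

// First thing at worker startup: sleep for N secs if that file exists
FILE* f = fopen("/tmp/r_worker_startup_sleep_secs", "r");
if (f) {
  char buffer[16] = {0};  // zero-filled, so buffer is always NUL-terminated for atoi
  fread(buffer, 1, sizeof(buffer) - 1, f);
  fclose(f);              // release the FILE* before sleeping
  int secs = atoi(buffer);
  // ignore invalid values and cap the delay below 2 minutes
  if (secs > 0 && secs < 120) {
    LOG_INFO("Sleeping for %d secs, pid: %d\n", secs, getpid());
    sleep(secs);
  }
}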
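The atoi call above cannot tell "30" apart from "30abc", and it silently returns 0 for an empty or non-numeric file. A stricter parser is sketched here as a hypothetical helper (parse_sleep_secs and read_sleep_secs are illustrative names, not Distributed R functions): it uses strtol's end pointer to reject trailing garbage while still accepting the trailing newline that echo writes.

```cpp
#include <cerrno>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: strictly parses the contents of a startup-sleep file.
// Returns 0 for anything that is not a whole number in 1..119.
int parse_sleep_secs(const std::string& text) {
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text.c_str(), &end, 10);
  if (end == text.c_str() || errno == ERANGE) return 0;  // no digits, or overflow
  // Allow trailing whitespace, e.g. the newline written by `echo 30 > file`
  while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r') ++end;
  if (*end != '\0') return 0;                // trailing garbage such as "30abc"
  if (value <= 0 || value >= 120) return 0;  // same range as the snippet above
  return static_cast<int>(value);
}

// Hypothetical helper: reads the whole file and parses it.
// The ifstream closes the file automatically when it goes out of scope.
int read_sleep_secs(const std::string& path) {
  std::ifstream in(path);
  if (!in) return 0;  // no file: start normally
  const std::string text((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
  return parse_sleep_secs(text);
}
```

With these helpers the worker's startup check would reduce to reading the delay with read_sleep_secs("/tmp/r_worker_startup_sleep_secs") and calling sleep() only when the result is positive.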

When we start Distributed R, we can see the following line in the worker's log:

2015-Mar-06 10:22:25.360285 [INFO] Sleeping for 30 secs, pid: 111020

At that point we can attach to the process using gdb. The worker has many threads running:

$ sudo gdb attach 111020
(gdb) info threads
  Id   Target Id         Frame 
  19   Thread 0x7f8b8dffb700 (LWP 111052) "R-worker-bin" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
  18   Thread 0x7f8b8e7fc700 (LWP 111051) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  17   Thread 0x7f8b8effd700 (LWP 111050) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  16   Thread 0x7f8b8f7fe700 (LWP 111049) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  15   Thread 0x7f8b8ffff700 (LWP 111048) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  14   Thread 0x7f8bacff9700 (LWP 111047) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  13   Thread 0x7f8bad7fa700 (LWP 111046) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  12   Thread 0x7f8badffb700 (LWP 111045) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  11   Thread 0x7f8bae7fc700 (LWP 111044) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  10   Thread 0x7f8baeffd700 (LWP 111043) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  9    Thread 0x7f8baf7fe700 (LWP 111042) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  8    Thread 0x7f8baffff700 (LWP 111041) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  7    Thread 0x7f8bbc8fc700 (LWP 111040) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  6    Thread 0x7f8bbd0fd700 (LWP 111039) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  5    Thread 0x7f8bbd8fe700 (LWP 111038) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  4    Thread 0x7f8bbe0ff700 (LWP 111037) "R-worker-bin" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  3    Thread 0x7f8bbe900700 (LWP 111035) "R-worker-bin" 0x00007f8bc09f26a3 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x7f8bbf101700 (LWP 111034) "R-worker-bin" 0x00007f8bc09f26a3 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x7f8bc32c07c0 (LWP 111020) "R-worker-bin" 0x00007f8bc09e4cbd in poll ()
    at ../sysdeps/unix/syscall-template.S:81
(gdb) continue

While the process is stopped (before continue, or after interrupting it with Ctrl-C), we can select different threads, set breakpoints, and so on:

(gdb) thread 3
(gdb) b function
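With a fixed delay, the process typically keeps sleeping for the rest of the delay even after gdb has attached and we have continued. On Linux, an alternative is to poll the TracerPid field of /proc/self/status, which is nonzero while a tracer such as gdb is attached, and stop waiting as soon as it changes. This is a hypothetical variant, not Distributed R code (debugger_attached and wait_for_debugger are illustrative names), capped by a timeout so that an unattended run does not hang.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper (Linux-specific): true if a tracer such as gdb is
// ptrace-attached to this process. /proc/self/status contains a line like
// "TracerPid:\t0", where 0 means no tracer.
bool debugger_attached() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("TracerPid:", 0) == 0) {     // line starts with "TracerPid:"
      return std::stoi(line.substr(10)) != 0;   // stoi skips the leading tab
    }
  }
  return false;  // field not found (e.g. /proc unavailable)
}

// Hypothetical helper: polls every 100 ms until a debugger attaches or
// max_secs elapse. Returns true if a debugger was detected.
bool wait_for_debugger(int max_secs) {
  const auto deadline =
      std::chrono::steady_clock::now() + std::chrono::seconds(max_secs);
  while (std::chrono::steady_clock::now() < deadline) {
    if (debugger_attached()) return true;
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  return debugger_attached();
}
```

Used in place of sleep(secs), wait_for_debugger(secs) returns shortly after we attach gdb, set breakpoints and continue, instead of waiting out the full delay.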

The procedure is the same for the executor. In this case the file is /tmp/r_executor_startup_sleep_secs:

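#include <cstdio>    // fopen, fread, fclose
#include <cstdlib>   // atoi
#include <unistd.h>  // sleep, getpid

// First thing at executor startup: sleep for N secs if that file exists
FILE* f = fopen("/tmp/r_executor_startup_sleep_secs", "r");
if (f) {
  char buffer[16] = {0};  // zero-filled, so buffer is always NUL-terminated for atoi
  fread(buffer, 1, sizeof(buffer) - 1, f);
  fclose(f);              // release the FILE* before sleeping
  int secs = atoi(buffer);
  // ignore invalid values and cap the delay below 2 minutes
  if (secs > 0 && secs < 120) {
    LOG_INFO("Sleeping for %d secs, pid: %d\n", secs, getpid());
    sleep(secs);
  }
}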

We can see this line in the executor's log:

2015-Mar-06 10:22:55.392888 [INFO] Sleeping for 30 secs, pid: 111036

And we can attach to the process using gdb:

$ sudo gdb attach 111036
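Because both processes print their PID in the startup log line, the attach step can be scripted. This is a minimal sketch, assuming the log format shown above; pid_from_log_line and attach_command are hypothetical helpers. It extracts the PID and builds a gdb command using gdb's -p option, which attaches to a running process and is an alternative to the "gdb attach PID" form used above.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: extracts the PID from a startup log line such as
// "... [INFO] Sleeping for 30 secs, pid: 111020".
// Returns std::nullopt if the line has no "pid: <digits>" part.
std::optional<long> pid_from_log_line(const std::string& line) {
  const std::string marker = "pid: ";
  const std::string::size_type pos = line.rfind(marker);
  if (pos == std::string::npos) return std::nullopt;
  const std::string::size_type start = pos + marker.size();
  std::string::size_type end = start;
  while (end < line.size() && line[end] >= '0' && line[end] <= '9') ++end;
  if (end == start) return std::nullopt;  // "pid: " not followed by digits
  return std::stol(line.substr(start, end - start));
}

// Hypothetical helper: builds the shell command that attaches gdb to that PID.
std::optional<std::string> attach_command(const std::string& line) {
  const std::optional<long> pid = pid_from_log_line(line);
  if (!pid) return std::nullopt;
  return "sudo gdb -p " + std::to_string(*pid);
}
```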

Debugging the master

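Debugging the master is easier because it runs inside the interactive R session, which we start ourselves, so no startup delay is needed. We get the session's PID with Sys.getpid() and attach gdb to it: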

R> Sys.getpid()
[1] 108575

$ sudo gdb attach 108575

(gdb) info threads
  Id   Target Id         Frame 
  8    Thread 0x7f7be5ffb700 (LWP 111473) "R" 0x00007f7bf4b3a6a3 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:81
  7    Thread 0x7f7be67fc700 (LWP 111474) "R" 0x00007f7bf4b3a6a3 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:81
  6    Thread 0x7f7bede6d700 (LWP 111570) "R" 0x00007f7bf4b3a6a3 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:81
  5    Thread 0x7f7bed66c700 (LWP 111571) "R" 0x00007f7bf4b3a6a3 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x7f7be57fa700 (LWP 111572) "R" 0x00007f7bf4b2ccbd in poll ()
    at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7f7be7fff700 (LWP 111655) "R" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7f7be77fe700 (LWP 111656) "R" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
* 1    Thread 0x7f7bf58307c0 (LWP 108575) "R" 0x00007f7bf4b31933 in select ()
    at ../sysdeps/unix/syscall-template.S:81