jessereynolds/2878994, created June 6, 2012 00:05
collectd exec plugin fork freeze
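#!/bin/bash
# collectd's exec plugin passes COLLECTD_HOSTNAME and COLLECTD_INTERVAL in the environment
HOSTNAME="${COLLECTD_HOSTNAME:-localhost}"
INTERVAL="${COLLECTD_INTERVAL:-10}"
# every interval, print one fixed gauge value on stdout as a PUTVAL command
while sleep "$INTERVAL"; do
  VALUE=1.23
  echo "PUTVAL \"$HOSTNAME/exec-magic/gauge-magic_level\" interval=$INTERVAL N:$VALUE"
done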
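The script writes one line per interval using collectd's plain-text PUTVAL command. The identifier `$HOSTNAME/exec-magic/gauge-magic_level` follows the `host/plugin[-instance]/type[-instance]` layout: plugin `exec` with instance `magic`, type `gauge` with instance `magic_level`. `N:` asks collectd to timestamp the value with the current time. A minimal sketch that factors the line into a helper; `putval` is a hypothetical name, not part of collectd:

```bash
#!/bin/bash
# putval: hypothetical helper that formats one PUTVAL line for collectd's
# plain-text protocol. Arguments: host, plugin[-instance], type[-instance],
# interval in seconds, value. "N" means "use the current time".
putval() {
    local host="$1" plugin="$2" type="$3" interval="$4" value="$5"
    printf 'PUTVAL "%s/%s/%s" interval=%s N:%s\n' \
        "$host" "$plugin" "$type" "$interval" "$value"
}
```

Inside the loop, `putval "$HOSTNAME" exec-magic gauge-magic_level "$INTERVAL" "$VALUE"` prints the same line as the original `echo`; using `printf` avoids the escaped quotes.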
root@jesse-precise-desktop:~# bin/test_collectd_exec.sh
Wed Jun 6 09:39:24 CST 2012: starting collectd
Starting statistics collection and monitoring daemon: collectd,,.
Wed Jun 6 09:39:24 CST 2012: exec started?
2461 pts/2 S+ 0:00 \_ /bin/bash bin/test_collectd_exec.sh
2470 pts/2 S+ 0:00 \_ grep collect
2466 ? Ss 0:00 /usr/sbin/collectdmon -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
2468 ? SLl 0:00 \_ collectd -C /etc/collectd/collectd.conf -f
Wed Jun 6 09:39:26 CST 2012: stopping collectd
Stopping statistics collection and monitoring daemon: collectd.
collectd is stopped.
collectd is stopped.
collectd is stopped.
collectd is stopped.
Wed Jun 6 09:39:30 CST 2012: killing any remaining processes
collectd: no process found
Wed Jun 6 09:39:31 CST 2012: starting collectd
Starting statistics collection and monitoring daemon: collectd,,.
Wed Jun 6 09:39:31 CST 2012: exec started?
2461 pts/2 S+ 0:00 \_ /bin/bash bin/test_collectd_exec.sh
2520 pts/2 S+ 0:00 \_ grep collect
2516 ? Ss 0:00 /usr/sbin/collectdmon -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
2518 ? RLl 0:00 \_ collectd -C /etc/collectd/collectd.conf -f
Wed Jun 6 09:39:33 CST 2012: stopping collectd
Stopping statistics collection and monitoring daemon: collectd.
collectd (2531) is running.
collectd (2531) is running.
^C
root@jesse-precise-desktop:~# ps afx | grep collectd
2531 ? S 0:00 collectd -C /etc/collectd/collectd.conf -f
root@jesse-precise-desktop:~# strace -p 2531
Process 2531 attached - interrupt to quit
futex(0x7fa983a9edb0, FUTEX_WAIT_PRIVATE, 2, NULL^C <unfinished ...>
Process 2531 detached
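In the failing iteration, the survivor (2531) is not the collectd process (2518) listed a moment earlier, and strace shows it blocked in `futex(..., FUTEX_WAIT_PRIVATE, ...)` with a NULL timeout, i.e. waiting indefinitely on a userspace lock. Given the gist's title, one plausible (unconfirmed) reading is a child forked by the exec plugin: `fork()` in a multithreaded process copies only the calling thread, so a lock held by another thread at that instant stays locked forever in the child. collectd's ping plugin, for instance, does its pinging in a separate thread. A minimal sketch for recording survivors on each failing run by reading `/proc` directly; `report_procs` is a hypothetical helper:

```bash
#!/bin/bash
# report_procs: hypothetical helper, not part of collectd.
# Print each process whose command name (/proc/PID/comm) equals $1
# (default: collectd) with its parent PID, scheduler state and kernel
# wait channel. Reads /proc directly, so it needs neither ps nor pgrep.
report_procs() {
    local name="${1:-collectd}" dir pid comm stat state ppid wchan
    for dir in /proc/[0-9]*; do
        pid="${dir#/proc/}"
        # A process can exit between the glob and the read; skip it then.
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        [[ "$comm" == "$name" ]] || continue
        stat=$(cat "$dir/stat" 2>/dev/null) || continue
        # Field 2 of stat is "(comm)" and may contain spaces, so parse only
        # what follows the last ") ": the state, then the parent PID.
        read -r state ppid _ <<< "${stat##*\) }"
        # Kernel function the task is sleeping in (may read "0" if unavailable).
        wchan=$(cat "$dir/wchan" 2>/dev/null) || wchan="-"
        printf '%s ppid=%s state=%s wchan=%s\n' "$pid" "$ppid" "$state" "${wchan:--}"
    done
    return 0
}
```

Calling `report_procs collectd` before the `killall -9` step would show whether a survivor has been reparented (its `ppid` no longer points at collectd) and where it sleeps in the kernel; as root, `cat /proc/PID/stack` gives the full kernel stack.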
#!/bin/bash
# This starts and stops collectd, then checks whether any collectd processes remain.
while true ; do
  echo "$(date): starting collectd"
  /etc/init.d/collectd start
  echo "$(date): exec started?"
  ps afx | grep collect
  sleep 2
  echo "$(date): stopping collectd"
  /etc/init.d/collectd stop
  # poll the status a few times; a clean stop reports "collectd is stopped."
  /etc/init.d/collectd status
  sleep 1
  /etc/init.d/collectd status
  sleep 1
  /etc/init.d/collectd status
  sleep 1
  /etc/init.d/collectd status
  echo "$(date): killing any remaining processes"
  # force-kill anything that survived the stop before the next iteration
  killall -9 collectd
  sleep 1
done
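The loop polls the init script a fixed four times and then force-kills whatever is left, which also destroys the stuck process before it can be inspected. A minimal sketch of a variant check, assuming hypothetical helpers `name_running` and `wait_for_exit` (not part of collectd or its init script) that poll `/proc` for a given command name:

```bash
#!/bin/bash
# Hypothetical helpers for the start/stop loop; not part of collectd.

# Succeed if any process's command name (/proc/PID/comm) equals $1.
name_running() {
    local name="$1" f c
    for f in /proc/[0-9]*/comm; do
        c=$(cat "$f" 2>/dev/null) || continue
        [[ "$c" == "$name" ]] && return 0
    done
    return 1
}

# Poll once per second: return 0 as soon as no process named $1 remains,
# or 1 if one is still present after $2 seconds (default 4).
wait_for_exit() {
    local name="$1" timeout="${2:-4}" waited=0
    while name_running "$name"; do
        if (( waited >= timeout )); then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the loop, replacing the four `status` calls with `wait_for_exit collectd 4 || { echo "$(date): collectd survived stop"; break; }` would stop at the first failing iteration and leave the stuck process in place for `strace -p`.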
On a Slackware-ish Linux running collectd 4.10.7, I get this error roughly one time in ten, but it seems to go away when I comment out the ping plugin:
[/var/log/collectd.log] 2012-06-06 16:06:24 UTC exec plugin: exec_read_one: Waiting for `/usr/bin/diskmonitor' to exit.
[/var/log/collectd.log] 2012-06-06 16:06:24 UTC exec plugin: Child 7067 exited with status 15.
[/var/log/collectd.log] 2012-06-06 16:06:24 UTC exec plugin: Sent SIGTERM to 0
[/var/log/syslog] [2012-06-06 16:06:24 UTC] [INFO] daemon 127.0.0.1 collectd[7055]: exec plugin: Sent SIGTERM to 0
With collectd 5 this happens much more often, and again, commenting out the ping plugin makes the problem go away.
I've raised a bug report here: collectd/collectd#89
I've managed to reproduce this on a Lucid VM (Ubuntu 10.04.3, 32-bit) with collectd 4.8.2, but only once in about fifty attempts:
Wed Jun 6 22:24:14 CST 2012: starting collectd
Starting statistics collection and monitoring daemon: collectd.
Wed Jun 6 22:24:15 CST 2012: exec started?
1518 pts/0 S+ 0:00 \_ /bin/bash ./test_collectd_exec.sh
2065 pts/0 S+ 0:00 \_ grep collect
2052 ? Ss 0:00 /usr/sbin/collectdmon -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
2054 ? Sl 0:00 \_ collectd -C /etc/collectd/collectd.conf -f
2062 ? S 0:00 \_ collectd -C /etc/collectd/collectd.conf -f
Wed Jun 6 22:24:17 CST 2012: stopping collectd
Stopping statistics collection and monitoring daemon: collectd.
collectd (2062) is running.
collectd (2062) is running.
collectd (2062) is running.
collectd (2062) is running.
Wed Jun 6 22:24:21 CST 2012: killing any remaining processes
Linux lucid32 2.6.32-33-generic #70-Ubuntu SMP Thu Jul 7 21:09:46 UTC 2011 i686 GNU/Linux
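Here the listing already shows a second process, 2062, with the same collectd command line, and it is the one that survives the stop. A child created by `fork()` keeps its parent's command name until `execve()` replaces it, so a collectd child of collectd has been forked but has not (yet) started the exec plugin's script. A minimal sketch that lists such processes; `forked_not_execed` is a hypothetical helper, and once the parent exits the orphan is reparented (typically to PID 1), so the check only works while the parent is still running, as it was when 2062 was listed:

```bash
#!/bin/bash
# forked_not_execed: hypothetical helper, not part of collectd.
# List processes named $1 (default: collectd) whose parent has the same
# command name, i.e. children that were forked but have not called execve().
forked_not_execed() {
    local name="${1:-collectd}" dir pid stat ppid pcomm
    for dir in /proc/[0-9]*; do
        pid="${dir#/proc/}"
        [[ "$(cat "$dir/comm" 2>/dev/null)" == "$name" ]] || continue
        stat=$(cat "$dir/stat" 2>/dev/null) || continue
        # After the last ") " in stat the fields are: state, ppid, ...
        read -r _ ppid _ <<< "${stat##*\) }"
        pcomm=$(cat "/proc/$ppid/comm" 2>/dev/null) || continue
        if [[ "$pcomm" == "$name" ]]; then
            echo "$pid (parent $ppid)"
        fi
    done
    return 0
}
```

Running `forked_not_execed collectd` at the "exec started?" step would flag runs like this one while the parent is still alive.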