heheheh
y'all are gonna love this
I figured out why logentries keeps sending log files
it's a doozy
so I strace
d multilog
for /service/auth_api-1/log/current
root@authapi-stage01:~# strace -p 4463
Process 4463 attached - interrupt to quit
read(0, "[I 160229 22:32:38 web:1946] 200"..., 1024) = 68
rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
write(6, "[I 160229 22:32:38 web:1946] 200"..., 68) = 68
rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
read(0, "[I 160229 22:36:38 web:1946] 200"..., 1024) = 68
rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
write(6, "[I 160229 22:36:38 web:1946] 200"..., 68) = 68
rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
read(0,
it writes once every 4 minutes
I also edited /usr/share/logentries/logentries/le.py
on authapi-stage01
to add some debug output
2016-02-29 22:36:38.868867: detected rename
2016-02-29 22:36:38.870385: (og pos: 0, set_pos: 406673) found lines: 923
2016-02-29 22:36:38.874427: (og pos: 65024, set_pos: -1) found lines: 932
2016-02-29 22:36:38.878591: (og pos: 130031, set_pos: -1) found lines: 936
2016-02-29 22:36:38.883452: (og pos: 194989, set_pos: -1) found lines: 956
2016-02-29 22:36:38.887654: (og pos: 259997, set_pos: -1) found lines: 954
2016-02-29 22:36:38.893517: (og pos: 325005, set_pos: -1) found lines: 946
2016-02-29 22:36:38.908130: (og pos: 389997, set_pos: -1) found lines: 89
...
2016-02-29 22:38:39.128115: detected rename
2016-02-29 22:38:39.129537: (og pos: 0, set_pos: 396002) found lines: 869
2016-02-29 22:38:39.134631: (og pos: 65024, set_pos: -1) found lines: 951
2016-02-29 22:38:39.140581: (og pos: 129990, set_pos: -1) found lines: 936
2016-02-29 22:38:39.156899: (og pos: 194954, set_pos: -1) found lines: 956
2016-02-29 22:38:39.172140: (og pos: 259911, set_pos: -1) found lines: 955
2016-02-29 22:38:39.196812: (og pos: 324919, set_pos: -1) found lines: 948
2016-02-29 22:38:39.210538: (og pos: 389883, set_pos: -1) found lines: 246
every 2 minutes it detects a rename
it turns out, because we're telling le
to follow /service/auth_api*/log/current
it's actually tailing two files (because we're running 2 instances of auth_api
)
the health checks come in every 4min at a 2min offset to each file
I confirmed this by watching stat
output:
Every 0.2s: stat /service/auth_api-1/log/current Mon Feb 29 22:49:53 2016
File: `/service/auth_api-1/log/current'
Size: 396206 Blocks: 784 IO Block: 4096 regular file
Device: ca01h/51713d Inode: 1051563 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2016-02-29 22:36:38.863352684 -0500
Modify: 2016-02-29 22:48:38.281523028 -0500
Change: 2016-02-29 22:48:38.281523028 -0500
Birth: -
mtime
wouldn't change, but it would still detect a rename
this is the reason: https://github.com/logentries/le/blob/master/src/le.py#L1484-L1500
it globs all the files, sorts by mtime
(finding the _other_ file which had been modified) and then restarts from the beginning
Transcript of @mreiferson's findings when digging into logentries agent:
https://buzzfeed.slack.com/archives/platform-infra-squad/p1456803962000917