Every sysadmin has gotten the call before...
^ logs full again, the fix didn't work :\
I'm obviously not immune to servers with full disks. So I log in and run df.
Sure enough... disk is full. This has happened numerous times on this server.
The daemon running on it produces an obscene amount of logs and the disk only
has 25G. So each time we just log in and delete the logs.
Filesystem Size Used Avail Use% Mounted on
rootfs 25G 25G 0 100% /
When I went to delete the log files, there were barely any there. Only like 20M. Well then...
Next step is to find the folder that has the offending large files. Most likely some other logs or a stray uploaded file.
du -h --max-depth=1 /
...
...
11G /
Wait... only 11G taken up? What's going on? I ran it again. Same thing. Time to ask Google.
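One way to quantify the mismatch is to put df's and du's numbers side by side. This is a sketch I'd use today, not a command from the original incident, and it assumes a GNU coreutils df new enough to support --output:

```shell
#!/bin/sh
# Compare the filesystem's used-byte count with what du can actually reach.
# A large gap points at deleted-but-open files or files hidden under mounts.
mnt=/
used_by_df=$(df -B1 --output=used "$mnt" | tail -n 1 | tr -d ' ')
used_by_du=$(du -sxB1 "$mnt" 2>/dev/null | cut -f1)
echo "df reports: $used_by_df bytes"
echo "du finds:   $used_by_du bytes"
echo "unaccounted: $((used_by_df - used_by_du)) bytes"
```

du only counts files it can reach through the directory tree, while df asks the filesystem itself, which is exactly why the two disagreed here.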
My search led me to a serverfault post about mounting on non-empty
directories. mount didn't show anything unusual mounted, so I looked
further down the post and saw another suggestion: sometimes processes hold
onto deleted logs. Kinda like a log purgatory. The files are unlinked from
the filesystem, but the kernel can't free their blocks while some process
still has them open. One quick run of lsof | grep "/var" | grep deleted
showed that the logs we had deleted before were still being held open. I
bounced the process that was running under supervisor with supervisorctl
restart <process>, checked df again, and:
Filesystem Size Used Avail Use% Mounted on
rootfs 25G 11G 13G 45% /
udev 10M 0 10M 0% /dev
tmpfs 100M 164K 100M 1% /run
/dev/disk/by-uuid/3a57ffc4-0a9b-4d7a-8bbd-af849b5384a5 25G 11G 13G 45% /
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 610M 0 610M 0% /run/shm
/dev/xvda1 236M 21M 203M 10% /boot
Sweet Victory.
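The whole failure mode is easy to reproduce on any Linux box. This is a throwaway demo of my own, using a scratch directory and a background shell as a stand-in for the daemon, not the actual server setup:

```shell
#!/bin/sh
# Minimal reproduction of "deleted but still open" in a scratch dir.
dir=$(mktemp -d)
log="$dir/app.log"
dd if=/dev/zero of="$log" bs=1M count=10 2>/dev/null

# Stand-in for the daemon: a background process holding the log open on fd 3.
( exec 3<"$log"; sleep 30 ) &
holder=$!
sleep 1

rm "$log"                                  # "delete" the log
target=$(readlink "/proc/$holder/fd/3")    # Linux: see what the fd points at
echo "$target"                             # → .../app.log (deleted)

kill "$holder" 2>/dev/null                 # bouncing the holder frees the blocks
rm -rf "$dir"
```

That "(deleted)" suffix on the /proc symlink is the same thing lsof was reporting: the name is gone, but the blocks aren't.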
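Restarting worked here, but when bouncing the daemon isn't an option, Linux offers a gentler way out: the deleted file is still reachable through /proc/<pid>/fd/<n>, and truncating it through that path releases the space immediately. A sketch, again using the current shell as a stand-in for the daemon:

```shell
#!/bin/sh
# Demo: free a deleted-but-open file's space by truncating it via procfs,
# using this shell ($$) as the stand-in daemon.
dir=$(mktemp -d)
log="$dir/big.log"
dd if=/dev/zero of="$log" bs=1M count=10 2>/dev/null

exec 3<>"$log"        # hold the log open on fd 3
rm "$log"             # unlink it; blocks stay allocated

: > "/proc/$$/fd/3"   # truncate through procfs; space is released right away
size=$(stat -Lc %s "/proc/$$/fd/3")
echo "size after truncate: $size"          # → size after truncate: 0

exec 3<&-             # close the fd
rmdir "$dir"
```

Better still is not getting here at all: truncate logs instead of deleting them (: > app.log), or let logrotate handle it with a post-rotate signal so the daemon reopens its log file.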