@ktilcu
Last active August 15, 2017 20:52
kyle-Experience: Mysteriously used disk space

Every sysadmin has gotten the call before...

^ logs full again, the fix didn't work :\

I'm obviously not immune to servers with full disks. So I log in and run df, and sure enough... the disk is full. This has happened numerous times on this server: the daemon running on it produces an obscene amount of logs, and the disk only has 25G, so each time we just log in and delete the logs.

Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                   25G   25G     0 100% /

When I went to delete the log files, there were barely any there. Only about 20M. Well then...

The next step is to find the folder holding the offending large files. Most likely some other logs or a stray uploaded file.

du -h --max-depth=1 /
...
...
11G	/

Wait... only 11G taken up? What's going on? I ran it again. Same thing. Time to ask Google.


My search led me to a Server Fault post where they talk about mounting on non-empty directories. mount didn't show anything unusual mounted, so looking further down in the post I saw another suggestion: sometimes processes hold onto deleted log files. Kinda like a log purgatory. They aren't actually removed from the filesystem because a process still has the file open. One quick run of lsof | grep "/var" | grep deleted showed that indeed all those logs we had deleted before weren't actually freeing any space. I bounced the process that was running under supervisor with supervisorctl restart <process>, checked df again, and:

Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                   25G   11G   13G  45% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   100M  164K  100M   1% /run
/dev/disk/by-uuid/3a57ffc4-0a9b-4d7a-8bbd-af849b5384a5   25G   11G   13G  45% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   610M     0  610M   0% /run/shm
/dev/xvda1                                              236M   21M  203M  10% /boot

Sweet Victory.
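The diagnosis above can be sketched as a couple of commands, assuming lsof is installed (the sum_deleted helper is mine, not from the post; verify the column order on your own system):

```shell
# List files that are unlinked (deleted) but still held open by a
# process. lsof's +L1 flag restricts output to files whose link
# count is below 1 -- exactly the "log purgatory" case.

sum_deleted() {
  # With +L1 output, SIZE/OFF is the 7th column; skip the header
  # row and total the bytes pinned by these ghost files.
  awk 'NR > 1 { bytes += $7 } END { printf "%.0f MB pinned\n", bytes / 1048576 }'
}

lsof +L1 2>/dev/null | sum_deleted
```

On a box like the one above, the total should roughly match the gap between what df reports and what du can find.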

@tmillner

Cool find. Would there be any way to prevent such an event from manifesting in the first place? Or is this serverfault an inevitable random-chance event?
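One way to prevent it: rotate the logs with copytruncate so the daemon never ends up writing to a deleted file. A minimal logrotate sketch, assuming the daemon logs to /var/log/mydaemon.log (a hypothetical path, not from the post):

```
/var/log/mydaemon.log {
    daily
    rotate 7
    compress
    missingok
    # copytruncate copies the log aside and truncates the original in
    # place, so the daemon keeps writing to the same inode and nothing
    # is ever deleted-but-open.
    copytruncate
}
```

Without copytruncate, a postrotate script would need to tell the process to reopen its log file (or restart it via supervisorctl, as in the post).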
