

@hopeseekr
Created February 3, 2018 04:01
Putting Docker on its own pseudo filesystem

Docker on BTRFS is very buggy and can leave you with a fully unusable system. It can butcher the underlying BTRFS filesystem so that it uses far more disk space than it needs, and it can get into a state where it cannot even delete images. Recovering may require drastic actions, up to and including reformatting the entire affected BTRFS root filesystem.

According to the official Docker documentation:

btrfs requires a dedicated block storage device such as a physical disk. This block device must be formatted for Btrfs and mounted into /var/lib/docker/.

In my experience, you will still run into issues even if you use a dedicated partition. It seems to require a standalone hard drive, which is a luxury many computers simply cannot afford.

See "Docker gradually exhausts disk space on BTRFS" (#27653) for details of exactly what I have run into. See also "docker does not remove btrfs subvolumes when destroying container".

A pseudo filesystem is a filesystem stored inside an otherwise ordinary file, which the OS mounts through a loop device. This guide shows you how to set one up and use it exclusively for Docker images and containers. This will NOT cripple your BTRFS filesystem, and the file can still be captured in normal BTRFS subvolume snapshots.

Steps to migrate /var/lib/docker from a subdirectory to a dedicated pseudo filesystem.

System Preparation:

  1. BACK UP ANY IMPORTANT self-made Docker images! This guide will destroy all of your existing images and containers: docker save image/name -o image_name.docker; bzip2 image_name.docker
  2. Open a terminal and run sudo watch -n10 df /var/lib/docker. Pay attention to the available space. BTRFS frees the space of deleted subvolumes asynchronously, in the background, so it is important to know when certain processes have really finished, or whether they are happening at all. On a BTRFS filesystem that Docker has corrupted, files are often never actually removed from the underlying filesystem. If this happens, refer to the Drastic Actions section.
  3. Make a BTRFS volume snapshot!! We are messing with your core filesystem, so a snapshot is essential. If everything goes to Hell, refer to the Drastic Actions section for how to restore the snapshot and quickly get back to work:

    sudo mkdir /snaps
    sudo btrfs subvolume snapshot / /snaps/root-$(date '+%Y-%m-%d')-pre
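The preparation steps can be scripted along these lines. This is a minimal bash sketch, not the author's script. The function names are hypothetical. It assumes `/` is a BTRFS subvolume and that snapshots go in `/snaps`. Untagged (`<none>`) images are skipped because they have no name to save them under.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "System Preparation" steps.

# Turn "repo/name:tag" into a safe file name such as "repo_name_tag.docker".
image_archive_name() {
    printf '%s.docker\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# Path of the safety snapshot taken before any change, e.g. /snaps/root-2018-02-03-pre.
pre_snapshot_path() {
    printf '/snaps/root-%s-pre\n' "$(date '+%Y-%m-%d')"
}

# Save and bzip2-compress every tagged image into the current directory.
backup_all_images() {
    local image file
    docker images --format '{{.Repository}}:{{.Tag}}' |
        grep -v '<none>' |
        while read -r image; do
            file=$(image_archive_name "$image")
            docker save -o "$file" "$image" && bzip2 "$file"
        done
}

# Snapshot the root subvolume (must run as root).
take_pre_snapshot() {
    mkdir -p /snaps
    btrfs subvolume snapshot / "$(pre_snapshot_path)"
}
```

As root, `backup_all_images` writes one `.bz2` archive per tagged image, and `take_pre_snapshot` creates the dated `-pre` snapshot. An archive can later be restored with `docker load -i`.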

Clean Up Docker /var/lib/docker files.

  1. Delete all of the docker containers: docker rm -f $(docker ps -aq) (the -f also removes containers that are still running)

    Afterwards, docker ps -aq should return nothing.

  2. Delete all of the docker images: docker rmi -f $(docker images -q)

    NOTE: If you see no activity for several minutes, that is indicative of a BTRFS meltdown. To verify for sure, run sudo du -hs /var/lib/docker. If that du is still running after 3-5 minutes, refer to the Drastic Actions section.

    Afterwards, docker images -q should return nothing.

  3. Stop docker: sudo systemctl stop docker

    NOTE: When docker has butchered the BTRFS filesystem, it often cannot be stopped this way. Fortunately, a simple system reboot resolves this issue. Do that now if you encounter this problem.
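Steps 1 to 3 can be combined into a small bash sketch. It is an illustration rather than the author's script, and the function names are hypothetical. `docker rm -f` is used so that running containers are removed too. Image IDs are de-duplicated because `docker images -q` lists an image once per tag. Stopping `docker.socket` first is an assumption for systemd distributions that use socket activation, where the socket could otherwise start the daemon again.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for "Clean Up Docker /var/lib/docker files".

# Remove every container (running or not); does nothing when there are none.
remove_all_containers() {
    local ids
    ids=$(docker ps -aq)
    if [ -n "$ids" ]; then
        # Unquoted on purpose: each ID becomes a separate argument.
        docker rm -f $ids
    fi
}

# Remove every image; IDs are de-duplicated because multi-tag images repeat.
remove_all_images() {
    local ids
    ids=$(docker images -q | sort -u)
    if [ -n "$ids" ]; then
        docker rmi -f $ids
    fi
}

# Stop the daemon and report if dockerd survived (run as root).
stop_docker_completely() {
    # docker.socket may not exist on every distribution; ignore that case.
    systemctl stop docker.socket 2>/dev/null || true
    systemctl stop docker
    if pgrep -x dockerd >/dev/null; then
        echo "dockerd is still running; reboot before continuing." >&2
        return 1
    fi
}
```

Run the three functions in order as root. If the last one reports that dockerd is still running, reboot as described in step 3.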

3b. Ensure that docker is completely stopped: ps aux | grep docker

4. Explore the /var/lib/docker directory:

sudo -s
cd /var/lib/docker
du -h --max-depth=1 | sort -h

Because you have deleted literally 100% of the files that docker stores, your /var/lib/docker should be virtually empty, a few MB at most. However, if Docker has been abusing the underlying root BTRFS filesystem, many GB will often still be stored there.

5. Attempt to remove all of the files manually: DO NOT USE THE rm COMMAND on Docker's subvolumes! It will not work, and if it does, you will have irreversibly corrupted your BTRFS filesystem. Go immediately to the Drastic Actions section if you have accidentally done so.

As discussed in "nuking old and broken /var/lib/docker directories is non-trivial", the only safe way to remove broken /var/lib/docker files on BTRFS is the following (as root):

# Delete each Docker subvolume with btrfs itself; quote the path in case of odd names.
for subvolume in /var/lib/docker/btrfs/subvolumes/*; do
    btrfs subvolume delete "$subvolume"
done
  6. Ensure that all docker BTRFS subvolumes have been destroyed: btrfs subvolume list /

    You should not see any entries with the path /var/lib/docker.
  7. Manually remove all the other files in /var/lib/docker: rm -r /var/lib/docker/*

    Ensure that it is empty: ls should list nothing, and du -h . should report 0 disk space used.
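The check in step 6 can be made mechanical with a hypothetical bash helper like the one below. It assumes Docker's subvolumes live under `var/lib/docker/btrfs/subvolumes/`. In `btrfs subvolume list` output, that path may carry a top-level prefix such as `@/`, depending on your subvolume layout.

```shell
#!/usr/bin/env bash
# Hypothetical check for step 6: are any Docker-created subvolumes left?

# Filter `btrfs subvolume list` output (stdin) down to Docker's entries.
docker_subvolumes() {
    grep -E 'path (.*/)?var/lib/docker/btrfs/subvolumes/' || true
}

# Succeed only when `btrfs subvolume list /` shows no Docker subvolumes (run as root).
verify_no_docker_subvolumes() {
    local leftovers
    leftovers=$(btrfs subvolume list / | docker_subvolumes)
    if [ -n "$leftovers" ]; then
        printf 'Docker subvolumes still present:\n%s\n' "$leftovers" >&2
        return 1
    fi
    echo "No Docker subvolumes left under /var/lib/docker."
}
```

If the deletion loop missed anything, `verify_no_docker_subvolumes` prints the leftover entries and returns non-zero.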

If all has gone well, you now have a BTRFS file system that is devoid of all docker-related images, containers and various metadata and caches. Congratulations!

Create the pseudo file system

  1. Ensure that you are the root user. sudo -s
  2. Create the pseudo filesystem: The best place to store file-based pseudo filesystems is in /media.

Estimate how much space you will need, or want to reserve, for Docker images. I find that 10-20 GB is far more than enough for properly functioning systems.

cd /media
fallocate -l 10G docker-volume.img
mkfs.ext4 docker-volume.img
mount -o loop -t ext4 /media/docker-volume.img /var/lib/docker
df -h
# You should see: /dev/loop0      9.8G   37M  9.3G   1% /var/lib/docker
umount /var/lib/docker
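The same steps can be wrapped in a bash sketch that also disables copy-on-write with `chattr +C`, as plutext and dmerillat recommend in the comments. It also adds the fstab line only once. The function names and the 10G default are illustrative assumptions. `chattr +C` only takes effect on an empty file on BTRFS, which is why it runs before `fallocate`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for "Create the pseudo file system" (run as root).

# Create an ext4 image file with copy-on-write disabled.
create_docker_image() {
    local img=$1 size=${2:-10G}
    touch "$img"
    chattr +C "$img"          # must happen while the file is still empty
    fallocate -l "$size" "$img"
    mkfs.ext4 "$img"
}

# Print the fstab line this guide uses for the image.
fstab_entry() {
    local img=$1 mnt=${2:-/var/lib/docker} opts=${3:-defaults}
    printf '%s %s ext4 %s 0 0\n' "$img" "$mnt" "$opts"
}

# Append a line to an fstab-style file unless an identical line already exists.
ensure_fstab_entry() {
    local fstab=$1 entry=$2
    if ! grep -qxF -- "$entry" "$fstab"; then
        printf '%s\n' "$entry" >> "$fstab"
    fi
}
```

For example, run `create_docker_image /media/docker-volume.img 10G`, then `ensure_fstab_entry /etc/fstab "$(fstab_entry /media/docker-volume.img)"`. Unlike a plain `echo ... >> /etc/fstab`, running the second command twice still leaves a single line.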
  3. Add the pseudo filesystem to the "mount on boot" config: echo "/media/docker-volume.img /var/lib/docker ext4 defaults 0 0" >> /etc/fstab
  4. Test mount it: mount /var/lib/docker
  5. Restart docker and confirm that it is using the pseudo filesystem:
systemctl start docker
systemctl stop docker
cd /media             # step out of /var/lib/docker so the umount below does not fail with "target is busy"
ls /var/lib/docker    # You should see many subdirectories.
du -h /var/lib/docker # It should report approximately 35 directories, and about 256 KB of space used.
                      # You should NOT see any mention of BTRFS subvolumes.
umount /var/lib/docker
du -h /var/lib/docker # You should see: 0	/var/lib/docker/
  6. Now reboot the system and confirm that the volume has auto-mounted and that docker is using it.
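After the reboot, a check like the following can confirm both conditions at once. It is a sketch with a hypothetical function name. It relies on `findmnt --mountpoint` from util-linux and on `docker info --format '{{.Driver}}'`, which reports the storage driver Docker chose. On ext4 that is normally overlay2.

```shell
#!/usr/bin/env bash
# Hypothetical post-reboot check: is /var/lib/docker its own ext4 mount,
# and has Docker stopped using the btrfs storage driver?
check_docker_storage() {
    local mnt=${1:-/var/lib/docker} fstype driver
    fstype=$(findmnt -n -o FSTYPE --mountpoint "$mnt")
    if [ "$fstype" != "ext4" ]; then
        echo "$mnt is not an ext4 mount (got: '${fstype:-not mounted}')" >&2
        return 1
    fi
    driver=$(docker info --format '{{.Driver}}')
    if [ "$driver" = "btrfs" ]; then
        echo "Docker still reports the btrfs storage driver" >&2
        return 1
    fi
    echo "OK: $mnt is ext4 and Docker uses the '$driver' driver"
}
```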

Congratulations! You have now moved Docker volumes from BTRFS to a pseudo ext4 file system, which docker supports much better!

  7. IMPORTANT: Take a new snapshot of the fixed system and remove the one we made at the beginning of this guide.
sudo btrfs subvolume snapshot / /snaps/root-$(date '+%Y-%m-%d')
# The -pre snapshot is named after the day it was taken; if that was not today, type its date instead.
sudo btrfs subvolume delete /snaps/root-$(date '+%Y-%m-%d')-pre
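The `$(date ...)` in the delete command only matches if the `-pre` snapshot was taken today. A hypothetical bash helper can find the newest `-pre` snapshot instead. It assumes the `/snaps/root-YYYY-MM-DD-pre` naming used in this guide. Because the dates are zero-padded, sorting the names also sorts them by date.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: locate the "-pre" snapshot whatever day it was taken.

# Print the newest <dir>/root-YYYY-MM-DD-pre path, or nothing if none exist.
latest_pre_snapshot() {
    local dir=${1:-/snaps} snap
    for snap in "$dir"/root-*-pre; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$snap" ] && printf '%s\n' "$snap"
    done | sort | tail -n 1
}

# Replace the pre-change snapshot with a fresh one of the fixed system (run as root).
rotate_snapshots() {
    local old
    old=$(latest_pre_snapshot /snaps)
    btrfs subvolume snapshot / "/snaps/root-$(date '+%Y-%m-%d')"
    if [ -n "$old" ]; then
        btrfs subvolume delete "$old"
    fi
}
```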

If you ever run into a corrupted /var/lib/docker in the future, simply stop docker, unmount /var/lib/docker, run sudo rm /media/docker-volume.img, and repeat this guide. That is much better than risking your entire BTRFS filesystem to docker's buggy implementation!
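Deleting the image while it is still loop-mounted does not free its space, because the loop device keeps the deleted file in use until it is unmounted. A minimal reset sketch in bash follows. The function names and the `DRY_RUN` switch are hypothetical additions, and the paths are this guide's defaults.

```shell
#!/usr/bin/env bash
# Hypothetical "start over" sketch: stop Docker, unmount, then delete the image.

# Print commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_docker_volume() {
    local img=${1:-/media/docker-volume.img} mnt=${2:-/var/lib/docker}
    run systemctl stop docker
    run umount "$mnt"
    run rm -f "$img"
}
```

`DRY_RUN=1 reset_docker_volume` only prints the commands. Without it, running as root executes them, and you can then recreate the image as described above.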

Drastic Actions

Attempt a BTRFS restore

Things didn't go so well? Unfortunately, this happens.

First things first, attempt to restore an older snapshot that may not be corrupted.

Follow the guide here: Using Btrfs for Easy Backup and Rollback

If that fails, restore the snapshot taken in the prep stage of this guide. That will at least get you back to the same state your system was in before you started all of this.

Attempt via a rescue disk

Mount the partition while inside a recovery system like System Rescue CD and reattempt this guide from the very beginning.

When I was in a totally desperate situation, where Docker had consumed so much of the filesystem that basic commands would not run, this method saved me.

Back up and Reformat the entire system.

In early 2017, no matter what I tried, nothing worked. If you find yourself in this unfortunate state, back up all of your important files, perhaps via a recovery system, and reformat the machine. I still recommend BTRFS as it is vastly superior to all other mainstream filesystems. Just don't use it with docker!

Be sure to leave your horror story on the official Docker bug reports for this issue:

@plutext

plutext commented Oct 3, 2018

Thanks for your pseudo file system instructions!
I ran into a little hiccup on reboot. It said:

mount: /var/lib/docker: special device /bvols/\100docker/No_COW/docker-volume.img does not exist.
var-lib-docker.mount: Mount process exited, code=exited status=32
Failed with result 'exit-code'.
Failed to mount /var/lib/docker.

This was because my root filesystem / is a cryptroot (it uses LUKS).

It works if I instruct that / must be mounted first, using x-systemd.requires=/

My full fstab entry:

/bvols/docker/No_COW/docker-volume.img /var/lib/docker ext4 noatime,noauto,nofail,x-systemd.automount,x-systemd.requires=/ 0 0

I had this in a subvolume named @docker; the '@' symbol had to be escaped as \100 in fstab, so I changed the subvolume name to Docker to get rid of that complexity.

Also, it seems to me to be a good idea to disable COW for this pseudo FS.

@pasikon

pasikon commented Oct 11, 2018

Thanks man you rescued me!

@andriy-f

andriy-f commented Mar 10, 2019

I used this command btrfs subvolume delete /var/lib/docker/btrfs/subvolumes/*. This may cause some information to be lost

@dmerillat

YOU DIDN'T MARK THE IMAGE FILE NOCOW

You're going to fragment your filesystem into unusability really rapidly if you blindly follow this guide, since you're already running on BTRFS and you're basically running a copy-on-write for docker now.

plutext pointed this out above, but didn't say how to do it. The modified commands are below:

cd /media
touch docker-volume.img
chattr +C docker-volume.img
fallocate -l 10G docker-volume.img
mkfs.ext4 docker-volume.img
mount -o loop -t ext4 /media/docker-volume.img /var/lib/docker
df -h
# You should see: /dev/loop0      9.8G   37M  9.3G   1% /var/lib/docker
umount /var/lib/docker

That said, there's a much easier fix.
The disaster comes from the way subvolumes work on btrfs with docker, if you look into them they start as your root filesystem and are then modified. The bigger your root filesystem the heavier each docker subvolume starts as and the slower everything runs. Metadata gets duplicated over and over and you end up with tens of gigabytes of duplicate trees as each image change makes a new fork.

Most trivial solution: make /var/lib/docker/btrfs its own subvolume that's mounted in fstab. The copy stops at the mount-point and ignores subvolumes, so each new docker sub starts empty and doesn't exponentially grow your metadata allocations.
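Here is a bash sketch of this suggestion, under stated assumptions. The subvolume name `docker-btrfs` matches the one wmutschl uses further down. The device path and filesystem UUID are placeholders for your own. Docker must be stopped and `/var/lib/docker/btrfs` must be empty. The subvolume is created at the top level of the filesystem (`subvolid=5`) so that it is not nested inside the root subvolume.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a dedicated /var/lib/docker/btrfs subvolume.

# Build the fstab line that mounts a named subvolume of the given filesystem.
btrfs_subvol_fstab_entry() {
    local uuid=$1 subvol=$2 mnt=${3:-/var/lib/docker/btrfs}
    printf 'UUID=%s %s btrfs subvol=%s 0 0\n' "$uuid" "$mnt" "$subvol"
}

# Create the subvolume at the filesystem's top level (run as root, Docker stopped).
create_docker_subvolume() {
    local device=$1 subvol=${2:-docker-btrfs} top
    top=$(mktemp -d)
    mount -o subvolid=5 "$device" "$top"    # subvolid=5 is the top-level subvolume
    btrfs subvolume create "$top/$subvol"
    umount "$top"
    rmdir "$top"
}
```

First run `create_docker_subvolume /dev/sdXN`, with the device as a placeholder. Then append the output of `btrfs_subvol_fstab_entry "$(findmnt -n -o UUID /)" docker-btrfs` to /etc/fstab and run `mount /var/lib/docker/btrfs`.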

A separate partition is also fine. They won't interfere with each other outside IO contention, and if you're wedged to the point that your disk is completely saturated you have something else to fix.

@wmutschl

Most trivial solution: make /var/lib/docker/btrfs its own subvolume that's mounted in fstab. The copy stops at the mount-point and ignores subvolumes, so each new docker sub starts empty and doesn't exponentially grow your metadata allocations.

@dmerillat: I have done as you propose and mounted a dedicated subvolume called docker-btrfs at /var/lib/docker/btrfs. Apart from my @ and @home subvolumes, btrfs subvolume list / gives me the following output:

ID 5709 gen 200803 top level 5 path docker-btrfs
ID 5712 gen 200698 top level 5709 path docker-btrfs/subvolumes/7cdea62ba7445832a0c1c1e99db8c12c774e19586ec077f03e52429fbb0a83d5
ID 5713 gen 200700 top level 5709 path docker-btrfs/subvolumes/27458289e10c557697f8a68a2008ae3e3a0a61865244fbab7da8b7141614ba6c
ID 5714 gen 200702 top level 5709 path docker-btrfs/subvolumes/a55c4ec7d39c8584f9766c7ba4563523ffffe71511212845943ae643f0657219
ID 5715 gen 200703 top level 5709 path docker-btrfs/subvolumes/0af3d89d96351c728a8492dd58b8c6c70b29b2ec8732b9c74a03bd45de50a771
ID 5716 gen 200716 top level 5709 path docker-btrfs/subvolumes/2d1be8ef5fc2432116924391251cd388d694ab5903e1953bcaa80e1a35b07012
ID 5717 gen 200705 top level 5709 path docker-btrfs/subvolumes/30f09ba495d14fc4d0aa50cf02acf1a34eca633d5f3e7beb13a10c95e6c0c262
ID 5718 gen 200724 top level 5709 path docker-btrfs/subvolumes/f3e6e22d72fd044654fae6e70324068b4ed5c80939b36e0e539ef1bd5e3a66a3
ID 5719 gen 200706 top level 5709 path docker-btrfs/subvolumes/aa2342755525e9903f4d472096997790dcce71d26205e96d70d4b6cde94924fe
ID 5720 gen 200707 top level 5709 path docker-btrfs/subvolumes/10ea09561af11754e0a71f04e32854e2200e641c9bbe2c09e4171fcefda2af15
ID 5721 gen 200719 top level 5709 path docker-btrfs/subvolumes/14b8f765cd9b3e4115d6f629e1c8bb06930eb251eb5fc9c7995fa3c1528826ee
ID 5722 gen 200709 top level 5709 path docker-btrfs/subvolumes/8fb8ae0fd44189e1c16274f63b6b31aeaa89a16f09b190145892574194d98b41
ID 5723 gen 200711 top level 5709 path docker-btrfs/subvolumes/047a7751fe20f424f613667dd57537cff299b76edf7a49773b8a544804852ccd
ID 5724 gen 200783 top level 5709 path docker-btrfs/subvolumes/43aee9d5c9e91f269552108ef428bd7bcbb65ef9e4bee0cc4a48712716ecb7bb
ID 5725 gen 200713 top level 5709 path docker-btrfs/subvolumes/a8cea36a4d7e16d30a3fd819c847011bfaf7b69269ed73767b0b75e9655a0f44
ID 5726 gen 200715 top level 5709 path docker-btrfs/subvolumes/d0c338a559560e5f98d7659af09772a916884a561b6f7f154efe51c15a7f2885
ID 5727 gen 200718 top level 5709 path docker-btrfs/subvolumes/5fe5541a052d8dca3cf54d8797f760c61ae8bf4560008127e1baeb55d1fe328c
ID 5728 gen 200722 top level 5709 path docker-btrfs/subvolumes/79370d3ef7fbc988ec1197cd0f5e043f91eddbf2f4881ff92d121f886b6eec8f
ID 5729 gen 200728 top level 5709 path docker-btrfs/subvolumes/e6ab68c14669e1c65b0bd28671aa15935894b83ba9cf3999b7b881bee6e9d553
ID 5730 gen 200726 top level 5709 path docker-btrfs/subvolumes/fc7f56a3a445197bd17a933e20038434fe1a2493b25b7b619cf38552e8db7c00
ID 5731 gen 200730 top level 5709 path docker-btrfs/subvolumes/9a52439376476cac42be974cc71e66baf436abd2c64db7e82789fbd992830295
ID 5732 gen 200731 top level 5709 path docker-btrfs/subvolumes/fdacca75eb3d9a1a24f0d37711b1fdbf03695d4dd3ec345eb04c25fe871ca3d2
ID 5733 gen 200732 top level 5709 path docker-btrfs/subvolumes/52fdec39d199cbccab36ebeb992a9d8e20fca4cbbed3c602afbcb2af5d03e7ec
ID 5734 gen 200737 top level 5709 path docker-btrfs/subvolumes/e38a243e9f757bb6af01806f268009d3c2d76d758e0f81bb263d9a95515c66ff
ID 5735 gen 200734 top level 5709 path docker-btrfs/subvolumes/5c629a1662f8a1dc8101fd0f0a2943ff2b2c11d25f2bb8720741eebd0fc26d2c
ID 5736 gen 200740 top level 5709 path docker-btrfs/subvolumes/43c299cc664f7e73f3dd7cd5d48c586d44442fdb4bc894e6855b0ec9d46ec934
ID 5737 gen 200744 top level 5709 path docker-btrfs/subvolumes/662a5b72c93a421bd26554463e5ab6629425323327db4adc5fa9afb95fd29241
ID 5738 gen 200736 top level 5709 path docker-btrfs/subvolumes/60a3dd724ff4406a1793f669fccef74ea2ef0122f8d70ea8f52ca81d608af23c
ID 5739 gen 200742 top level 5709 path docker-btrfs/subvolumes/55234b26c51d6c91a4ae03f4690877cc089b23910a88215a4f314a1ad7a9feb6
ID 5740 gen 200746 top level 5709 path docker-btrfs/subvolumes/e5335915c40fc7363a646944771c2d0a42ac1549e10c179a130492df3f6a0c7d
ID 5741 gen 200749 top level 5709 path docker-btrfs/subvolumes/eef3a0a0f57e406e37a925a280a8b2c8c5d454b082479f8b714f8d103b94fcb2
ID 5742 gen 200750 top level 5709 path docker-btrfs/subvolumes/974d6a0bfafd6d41957f79bfc562e9b851934df750a763b8ba233d523780f8a8
ID 5743 gen 200784 top level 5709 path docker-btrfs/subvolumes/3f2f7fd45a41d5f06839a733581d7b5946da67d8215daa5e8c68eb673afa4e91
ID 5744 gen 200769 top level 5709 path docker-btrfs/subvolumes/7f4e32e5046239b93ea095c9135e66238f8422f98d6ddea34929d2342d52b04d
ID 5745 gen 200752 top level 5709 path docker-btrfs/subvolumes/6c1b18e8fe7e94448a3a4cbaacaf1f79184115090d1d8ce243373a58ecd42a65
ID 5746 gen 200753 top level 5709 path docker-btrfs/subvolumes/0073b6ff658bab2c663eadfd8827e3f6e9fa80d6402bb846034c9f38680210bc
ID 5747 gen 200755 top level 5709 path docker-btrfs/subvolumes/178b918f82f8176d60e9f5f0d2db7e9e46104e139c22404da05fae3da943e33b
ID 5748 gen 200756 top level 5709 path docker-btrfs/subvolumes/5cb059dcd86786f632cba93190ef4de3abaa1579ddb8257212439ebf1b21a39a
ID 5749 gen 200757 top level 5709 path docker-btrfs/subvolumes/4acc6083a101028e1e20a57248a9f0a2b24471ee17e30af0ff51eb7d2772f181
ID 5750 gen 200759 top level 5709 path docker-btrfs/subvolumes/39be4044d715951c582c5de0ed8c4bad69227653d1a7cf6b602babb9a8133aeb
ID 5751 gen 200760 top level 5709 path docker-btrfs/subvolumes/a4b624c537294819c852ce20cea4ae73f16f1180f0e40a9a5dbdb97df5157d75
ID 5752 gen 200770 top level 5709 path docker-btrfs/subvolumes/c8b5503abcda1b8e02d92ed908a603e6dd4a9a6ecb138b853d1a5876c7e33e34
ID 5753 gen 200762 top level 5709 path docker-btrfs/subvolumes/2507624706d80f1ce30ff1778f2d1fc4d14cc06daf9dd3cb4b8e1dae27753bf7
ID 5754 gen 200764 top level 5709 path docker-btrfs/subvolumes/19af58c6d555db3abda7e5c8c0f96d6b7229987b202bfd54ee78f6eb2313fd8a
ID 5755 gen 200766 top level 5709 path docker-btrfs/subvolumes/1253725b352c98112d3c573c44e19a26559838054a70471803e74308ce9da52b
ID 5756 gen 200768 top level 5709 path docker-btrfs/subvolumes/2274c963d7e94d6da58dc374036a79c7fc03e49e695ae2c285ca4320208ded3b
ID 5757 gen 200771 top level 5709 path docker-btrfs/subvolumes/ab45d4dfc2d1c482695c6d5d3eb66dea55e4e1c41458a7c5cade11a5fff360db
ID 5758 gen 200785 top level 5709 path docker-btrfs/subvolumes/b2a804a85ad78d3f79ca75f36a6349469d96d43bebd2e0a61654a1edbaef546a
ID 5759 gen 200786 top level 5709 path docker-btrfs/subvolumes/f2494e62066a239927d8df22c074fda46a48449344a68c0942ffddb364afb53a
ID 5760 gen 200774 top level 5709 path docker-btrfs/subvolumes/4034f887eb44bf0c0e08a59dd686e646eac59d529ce59f5b3664088328089ce0
ID 5761 gen 200776 top level 5709 path docker-btrfs/subvolumes/4a3b94734777b96a2c16e15a77ff8b81a831e0710f315c3f9dbaadf360fae3fe
ID 5762 gen 200778 top level 5709 path docker-btrfs/subvolumes/02f23b11704e5e32c147c3c8e01421cbba5c8328b317d33371d292f50e86fe79
ID 5763 gen 200780 top level 5709 path docker-btrfs/subvolumes/c717f87750ef92505cba71afb808184614a21577afffbd0f5d764af107f48842
ID 5764 gen 200802 top level 5709 path docker-btrfs/subvolumes/5e11804a545680000425a725c5fa9502df7ac72133254f1c95444106b1beff34
ID 5765 gen 200787 top level 5709 path docker-btrfs/subvolumes/bb24166137016c431eba203e2b548ceedf966082c111882396932117d97964e6-init
ID 5766 gen 200788 top level 5709 path docker-btrfs/subvolumes/3d2174c45e380f0bec78cf8762815a66d3aa55ecd439f93f9035eb820665fca6-init
ID 5767 gen 200789 top level 5709 path docker-btrfs/subvolumes/b88335346f9c728af4f5e37396ee346ea584106c83434a5b0b5a5d11acb3caf6-init
ID 5768 gen 200790 top level 5709 path docker-btrfs/subvolumes/61159ce428e0bef43e589ccd03aa13c3b4d5e0f620a2265736b137bcd02d6ebc-init
ID 5769 gen 200804 top level 5709 path docker-btrfs/subvolumes/bb24166137016c431eba203e2b548ceedf966082c111882396932117d97964e6
ID 5770 gen 200805 top level 5709 path docker-btrfs/subvolumes/3d2174c45e380f0bec78cf8762815a66d3aa55ecd439f93f9035eb820665fca6
ID 5771 gen 200806 top level 5709 path docker-btrfs/subvolumes/b88335346f9c728af4f5e37396ee346ea584106c83434a5b0b5a5d11acb3caf6
ID 5772 gen 200806 top level 5709 path docker-btrfs/subvolumes/61159ce428e0bef43e589ccd03aa13c3b4d5e0f620a2265736b137bcd02d6ebc
ID 5773 gen 200803 top level 5709 path docker-btrfs/subvolumes/9c746954e2b33ab9fb571eb3c0fa9e72e33ad1164f9ee4f23dea5f93fd58d0eb-init
ID 5774 gen 200806 top level 5709 path docker-btrfs/subvolumes/9c746954e2b33ab9fb571eb3c0fa9e72e33ad1164f9ee4f23dea5f93fd58d0eb

Just for clarification, is there a problem that there is still a large number of subvolumes in /var/lib/docker/btrfs/subvolumes? Or is this okay?

@plantroon

plantroon commented Aug 21, 2021

Just for clarification, is there a problem that there is still a large number of subvolumes in /var/lib/docker/btrfs/subvolumes? Or is this okay?

That by itself is not a problem. Especially now that you have it on its own volume (if I understand this whole thread correctly).

Still I'd be interested to know, if this can pose a problem in regards to data integrity and what statistics (of btrfs, RAM usage, disks, IO pressure, whatever) should I watch out for. I have a very small deployment (my personal cloud) on btrfs on a slow machine at home, but with SSDs. What do I watch out for not to lose data?

@dmerillat

I should take a moment to remind people that docker containers are ephemeral and anything stored inside them is gone when they restart. Persistent storage is with volumes, and those should be backed up on a regular schedule.

As for the large number of subvolumes, no, it is not really an issue. There is a subvolume for each layer of a container and one for each running image, for deduplication reasons. If there have been a lot of updates to containers you may have stale subvolumes, which can be cleaned via docker container prune and docker image prune. I had not run them in a while, so I tested just now for this reply, and they removed about half of the subvolumes and 5 GB of space.

@plantroon

@dmerillat good points. My question about data loss was more aimed towards having the filesystem wrecked by Docker in some way (due to running out of space, etc) which would render the running system unusable and any new writes being lost till that is resolved.

I compared docker images + containers on btrfs vs overlay2 and it was like 50 GB vs 5 GB - is there some way to measure btrfs usage in such a way that takes CoW into consideration, so that it produces similar numbers as running "du" on /var/lib/docker with overlay2?

@dmerillat

dmerillat commented Aug 21, 2021

It's been years since the original post, and both BTRFS and Docker have improved enormously since then. If you have docker on its own partition, a simple 'df' will tell you how much is used.

Unfortunately if it's a shared partition then there's no easy way to tell. Theoretically it can be done by enabling quotas on btrfs but the last time I tried there was an exponential CPU load with the interaction of deeply-cloned subvolumes and quota and I haven't tried since. This may also be fixed now, it has been quite some time since I last tried.

The easiest method is via docker image ls which will give you the total usage of each image on your system, which is approximately the sum of all the intermediates.

btrfs quota support has improved dramatically and the answer below is much better than mine.

@wmutschl

I compared docker images + containers on btrfs vs overlay2 and it was like 50 GB vs 5 GB - is there some way to measure btrfs usage in such a way that takes CoW into consideration, so that it produces similar numbers as running "du" on /var/lib/docker with overlay2?

You might want to check out https://ownyourbits.com/2017/12/06/check-disk-space-of-your-btrfs-snapshots-with-btrfs-du/

@plantroon

plantroon commented Aug 22, 2021

I tried one last thing which freed about 10 GB for me (basically it should find duplicates and reflink them):
chrt -i 0 duperemove -A -h -d -r -v -b4k --dedupe-options=noblock,same --lookup-extents=yes --io-threads=1
(source: https://wiki.tnonline.net/w/Btrfs/Deduplication/Duperemove)

But in the end I switched to overlay2. It will make overall btrfs management easier.

@eriteric

thank you so much

@pavel-perina

pavel-perina commented Mar 28, 2023

I would like to ask if it's a real problem or people are just confused (including me). I have OpenSuse Leap 15.5 beta.

What I found is that /var/lib/docker/btrfs occupied 43GB of space and went down to 38GB when I deleted all containers and unused images.

df -h reports 52GB used in total. Later, after reading this and a few horror stories, I found that my home directory contains an old nextcloud volume and its tarball backup, which are about 2x17GB. 38+2x17 is 72GB, which is already 20GB more than what df reports for the whole disk, and another 10GB is likely in other directories.

docker system df says I have 4.1GB in 5 images (yes, one of them has 2GB), 0GB in containers, 600MB in volumes (nextcloud is archived) which is realistic

btrfs fi du -s /var/lib/docker/*
     Total   Exclusive  Set shared  Filename
  35.23GiB    11.76MiB     3.85GiB  /var/lib/docker/btrfs
  72.00KiB    72.00KiB       0.00B  /var/lib/docker/buildkit
 188.00KiB   188.00KiB       0.00B  /var/lib/docker/containerd
     0.00B       0.00B       0.00B  /var/lib/docker/containers
   8.04MiB     8.04MiB       0.00B  /var/lib/docker/image
  80.00KiB    80.00KiB       0.00B  /var/lib/docker/network
     0.00B       0.00B       0.00B  /var/lib/docker/plugins
     0.00B       0.00B       0.00B  /var/lib/docker/runtimes
     0.00B       0.00B       0.00B  /var/lib/docker/swarm
     0.00B       0.00B       0.00B  /var/lib/docker/tmp
     0.00B       0.00B       0.00B  /var/lib/docker/trust
 566.00MiB   566.00MiB       0.00B  /var/lib/docker/volumes

Now this is a bit weird. First, image dir is nearly empty. Second, btrfs directory says that "Set shared" 3.85GB matches almost exactly 4.1GB reported by docker system df command. 35.23GiB is 37.8GB which matches 38G reported by df command. I guess btrfs contains both real data and snapshots.

I'm very new to using docker, but my rough understanding is that every subvolume/id is a snapshot created during the image build process.

@knirch

knirch commented Apr 30, 2023

It might be worth mentioning docker builder prune, which cleared up a lot of what I thought were "dumb bug leftovers". Since the guide goes straight to the nuke-the-site-from-orbit approach, I think some of the low-hanging fruit could be listed for idio^wnewbies like me :)
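The prune commands mentioned in this thread can be bundled into one routine: `docker container prune` and `docker image prune` from dmerillat above, and `docker builder prune` from knirch. This is a bash sketch, and the function name is hypothetical. `-f` skips each command's confirmation prompt. `docker image prune` without `-a` only removes dangling images.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup routine built from the prune commands in this thread.
prune_docker() {
    docker system df              # usage before
    docker container prune -f     # stopped containers
    docker image prune -f         # dangling images only (add -a for all unused)
    docker builder prune -f       # build cache
    docker system df              # usage after
}
```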

@dim-geo

dim-geo commented May 14, 2023

I had also some problems with slow storage and mounted ext4 like this:
mount -o loop,noatime,commit=60,barrier=0 -t ext4 /media/docker-volume.img /var/lib/docker
This can corrupt ext4 filesystem, but it's a risk I am willing to take.

@BradenM

BradenM commented Apr 11, 2024

As of Linux 6.8 (the linux68 kernel package), using:
mount -o loop,noatime,commit=60,barrier=0 -t ext4 /media/docker-volume.img /var/lib/docker

began failing to mount with:

mount: /docker: /dev/loop0 already mounted or mount point busy.
       dmesg(1) may have more information after failed mount system call.

The loop device is then automatically detached from what I can gather once this happens.

I suspect this kernel change is related:
https://lore.kernel.org/lkml/20240105-vfs-super-4092d802972c@brauner/

From my reading, it implements a new safety mechanism for writes to block devices:

Note that this effectively only prevents modification of the particular block
device's page cache by other writers. The actual device content can still be
modified by other means - e.g. by issuing direct scsi commands, by doing
writes through devices lower in the storage stack (e.g. in case loop devices,
DM, or MD are involved) etc. But blocking direct modifications of the block
device page cache is enough to give filesystems a chance to perform data
validation when loading data from the underlying storage and thus prevent
kernel crashes.

Not sure if (assuming this is the cause) preventing loop mount like this was intentional or not, but nevertheless you can still manually setup the loop device + mount post-boot with no issue:

$ sudo losetup -fP --show /mnt/storage/@docker/media/docker-volume.img
/dev/loop1
$ sudo mount -o noatime,commit=60,barrier=0 -t ext4 /dev/loop1 /docker

Simple enough to set this up as pre-exec for the systemd service or something of the like.

Just running

mount -o loop,noatime,commit=60,barrier=0 -t ext4 /media/docker-volume.img /var/lib/docker

post-boot fails with the same error, in case you were wondering.
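The two-step workaround can be wrapped in a helper that is safe to run repeatedly. This bash sketch is hypothetical. The function name and the `noatime` default are illustrative. Hooking it into Docker's systemd unit, for example through an `ExecStartPre=` line, is an untested assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: attach the image to a free loop device, then mount that device.
mount_docker_image() {
    local img=$1 mnt=$2 opts=${3:-noatime} dev
    # Nothing to do if something is already mounted there.
    if findmnt -n --mountpoint "$mnt" >/dev/null; then
        return 0
    fi
    dev=$(losetup --find --show "$img") || return 1
    mount -o "$opts" -t ext4 "$dev" "$mnt"
}
```

With the options used above, this would be `mount_docker_image /mnt/storage/@docker/media/docker-volume.img /docker noatime,commit=60,barrier=0`.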
