@kousu
Last active November 28, 2021 18:36
Linux du(1) de-duplication bug

I've found a confusing behaviour in Linux's version of du(1) (GNU coreutils; see Platform below). du can take a single folder to recurse into, or a list of files/folders, either in argv or via --files0-from. But its behaviour is inconsistent if this list contains both a parent and one of its children.

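  1. A child given before its parent makes the parent count as empty: the child's contents are left out of the parent's total.
  2. With -s (or equivalently -d 0), a parent given before its child suppresses the child's line entirely; in the opposite order the child is listed, and item 1 applies.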

OpenBSD's du, by contrast, behaves consistently: it always outputs one line (or one block of lines, if not given -s) per input, and always counts everything at full size, parent and child alike.
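One possible way to get the OpenBSD-style result on a system with GNU du is to run du once per operand, so that nothing is shared between arguments. This is only a sketch: `du_each` is a hypothetical helper name, and the idea that the inconsistency comes from state carried across operands within one du process is an assumption, not something confirmed here. It also requires a literal `--` between the options and the paths.

```shell
#!/bin/sh
# du_each: hypothetical helper, not part of any du implementation.
# Usage: du_each [du options] -- path...
# Runs a separate du process for every path, so each path is measured in full.
du_each() {
    opts=
    # Collect everything before "--" as options for du.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        opts="$opts $1"
        shift
    done
    # Drop the "--" separator itself, if present.
    if [ "$#" -gt 0 ]; then
        shift
    fi
    for path in "$@"; do
        # $opts is deliberately unquoted so it splits back into separate flags.
        du $opts -- "$path"
    done
}
```

For example, `du_each -h -s -- parent/child/ parent/` prints one line per path regardless of order. Unlike plain du, it prints nothing when no paths are given.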

Make some sample data (Linux)

user@host:~$ mkdir -p test/parent/child; cd test
user@host:~/test$ seq 20 | while read i; do dd if=/dev/urandom of=parent/child/$i.bin bs=1M count=100; done
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.636299 s, 165 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.605005 s, 173 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.604265 s, 174 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.637298 s, 165 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.637734 s, 164 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.610857 s, 172 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.611513 s, 171 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.611749 s, 171 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.63855 s, 164 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.610356 s, 172 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.628912 s, 167 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.610145 s, 172 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.606974 s, 173 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.610318 s, 172 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.629691 s, 167 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.611301 s, 172 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.635625 s, 165 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.607789 s, 173 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.609169 s, 172 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.646711 s, 162 MB/s
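
The same tree can be built without `seq` (the OpenBSD run uses `jot` instead) and without the dd statistics. A minimal sketch, assuming a POSIX shell; `make_sample` and its parameters are hypothetical names introduced for illustration, and the default file size is 1 MiB rather than the 100 MiB used above.

```shell
#!/bin/sh
# make_sample: hypothetical helper.
# Usage: make_sample DIR [COUNT] [SIZE_KIB]
# Creates DIR/parent/child holding COUNT files of SIZE_KIB KiB of random data each.
make_sample() {
    dir=$1
    count=${2:-20}
    size_kib=${3:-1024}
    mkdir -p "$dir/parent/child" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # bs=1024 is spelled the same for GNU and BSD dd;
        # 2>/dev/null hides the per-file transfer statistics.
        dd if=/dev/urandom of="$dir/parent/child/$i.bin" \
            bs=1024 count="$size_kib" 2>/dev/null || return 1
        i=$((i + 1))
    done
}
```

`make_sample test 20 102400` should produce the same layout and file sizes as the transcript above.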

Test du(1)

user@host:~/test$ du -h parent/   # this makes sense
2.0G	parent/child
2.0G	parent/
user@host:~/test$ du -h parent/ parent/child/ # this makes some sense, but I would expect to see child counted twice
2.0G	parent/child
2.0G	parent/
user@host:~/test$ du -h parent/child/ parent/ # swapping the order makes parent look empty?? (4.0K is just the directory entry itself, not its contents)
2.0G	parent/child/
4.0K	parent/
user@host:~/test$ du -h -s parent/ parent/child/ # -s makes child not get counted at all?
2.0G	parent/
user@host:~/test$ du -h -s parent/child/ parent/ # but swapping the order *does* list both, with parent shown as empty
2.0G	parent/child/
4.0K	parent/
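
The comparison above can be wrapped in a quick check for whether a given du is sensitive to argument order. This is a rough sketch; `du_order_check` is a hypothetical helper name, and it only compares the two orders of a single pair of paths.

```shell
#!/bin/sh
# du_order_check: hypothetical helper, not part of du.
# Usage: du_order_check PATH1 PATH2
# Prints "consistent" if du -sk reports the same set of lines for both
# argument orders, and "order-dependent" otherwise.
du_order_check() {
    # Sort the output so that only the content of the lines matters,
    # not the order in which du printed them.
    forward=$(du -sk -- "$1" "$2" | sort)
    reverse=$(du -sk -- "$2" "$1" | sort)
    if [ "$forward" = "$reverse" ]; then
        echo consistent
    else
        echo order-dependent
    fi
}
```

Going by the transcripts in this note, `du_order_check parent/ parent/child/` would be expected to print `order-dependent` with GNU du 8.30 and `consistent` with OpenBSD's du.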

Platform

user@host:~/test$ du --version
du (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
and Jim Meyering.
user@host:~/test$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Make some sample data (OpenBSD)

host$ mkdir -p test/parent/child; cd test
host$ jot 20 | while read i; do dd if=/dev/urandom of=parent/child/$i.bin bs=1M count=100; done           
100+0 records in
100+0 records out
104857600 bytes transferred in 2.512 secs (41729464 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.497 secs (41981236 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.451 secs (42780264 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.392 secs (43832256 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.427 secs (43199366 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.332 secs (44949068 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.415 secs (43417095 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.419 secs (43341548 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.360 secs (44420466 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.398 secs (43715317 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.416 secs (43390310 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.423 secs (43264876 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.337 secs (44854528 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.396 secs (43760865 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.481 secs (42259990 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.543 secs (41233413 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.464 secs (42551547 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.565 secs (40877326 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.798 secs (37471918 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.566 secs (40854706 bytes/sec)

Test du(1)

host$ du -h parent/
2.0G	parent/child
2.0G	parent/
host$ du -h parent/ parent/child
2.0G	parent/child
2.0G	parent/
2.0G	parent/child
host$ du -h parent/child parent/     
2.0G	parent/child
2.0G	parent/child
2.0G	parent/
host$ du -h -s parent/ parent/child
2.0G	parent/
2.0G	parent/child
host$ du -h -s  parent/child parent/
2.0G	parent/child
2.0G	parent/

Platform

host$ uname -a
OpenBSD host.example.com 6.9 GENERIC#5 amd64