Here is the link to the git repository in case you desire none of my ramblings: https://github.com/KSXGitHub/parallel-disk-usage.
About dust
In the past few months, I have always used dust to visualize the disk usage of heavy directories. It displays an intuitive bottom-up tree from heavier items to lighter ones. Every item comes with a percentage bar that allows me to compare the relative sizes of two sibling items as well as their parent. I quite like it.
I would soon discover its limits, however.
Sometimes I want to compare the sizes of two files relative to each other, so I type the following command:
dust /bin/ls /bin/docker
And this is what dust (v0.5.4) gives:
140K ┌── ls │██████████████████████████████████████████████████████████ │ 100%
51M ┌── docker│██████████████████████████████████████████████████████████ │ 100%
Both ls and docker have the exact same percentage and bar length. This is not useful.
Unlike the limitation above, which I discovered while using dust, the second one I discovered while skimming dust's code in an attempt to contribute to it (in order to fix the limitation above). I noticed that although dust has crossbeam-channel in its Cargo.toml file, it is not used in the codebase, at least not in any way that I know of.
(pdu later proved to be faster than dust, so I guess my assumption that crossbeam-channel is not being used is true.)
As I have already mentioned, I tried to contribute to dust to add the functionality I desired, but I encountered a few obstacles:
- The integration tests fail when I run them on my machine.
- I don't understand the way the code is structured.
Furthermore, even if I overcame the aforementioned obstacles, I would still have to wait for my pull request to be merged and a new version to be released. This is not to mention the performance limitation, which cannot be resolved without significantly refactoring the whole codebase.
So I decided that I would create my own disk usage visualizer. It was named "dirt" initially. Then I realized the name has no relation to the actual functionality. Not only that, the name "dirt" was picked after "dust", which means that my tool would forever live in the shadow of dust. This fact does not sit well with my unbridled vanity and my astronomical arrogance. Besides, "dirt" sounds kinda derpy. So after some thinking I decided to go with "Parallel Disk Usage" (I would have taken "pdu" if not for the fact that the name was occupied on crates.io).
I picture a directory tree as a nested tree (obviously!). A directory may contain files and subdirectories, which in turn contain other files and subdirectories. The disk usage of a directory can therefore be summed up from children to parent, and sibling subtrees can be summed independently of each other. This is such a perfect use case for rayon. The resulting disk usage data is also a tree, because I need to visualize it.
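To make the children-to-parent idea concrete, here is a minimal sketch. It is not pdu's actual code: the `Node` and `Usage` types and the `summarize` function are hypothetical names made up for illustration, the tree lives in memory instead of on disk, and it uses `std::thread::scope` from the standard library as a stand-in for rayon so that it compiles without dependencies. With rayon, the loop over children would presumably become a parallel iterator running on a work-stealing thread pool rather than one thread per child.

```rust
use std::thread;

/// Hypothetical in-memory stand-in for a filesystem entry.
pub enum Node {
    File { name: String, size: u64 },
    Dir { name: String, children: Vec<Node> },
}

/// Result tree: every node carries the total size of its subtree.
#[derive(Debug, PartialEq)]
pub struct Usage {
    pub name: String,
    pub size: u64,
    pub children: Vec<Usage>,
}

/// Summarize disk usage from children to parent.
/// Sibling subtrees do not depend on each other, so they are processed concurrently.
pub fn summarize(node: &Node) -> Usage {
    match node {
        Node::File { name, size } => Usage {
            name: name.clone(),
            size: *size,
            children: Vec::new(),
        },
        Node::Dir { name, children } => {
            // Assumption: one scoped thread per child keeps the sketch std-only.
            // A thread pool (such as rayon's) would be the sane choice for real trees.
            let children: Vec<Usage> = thread::scope(|scope| {
                // Spawn every child first, then join, so the children overlap in time.
                let handles: Vec<_> = children
                    .iter()
                    .map(|child| scope.spawn(move || summarize(child)))
                    .collect();
                handles
                    .into_iter()
                    .map(|handle| handle.join().expect("worker thread panicked"))
                    .collect()
            });
            // The parent's size is the sum of its already summarized children.
            let size: u64 = children.iter().map(|child| child.size).sum();
            Usage {
                name: name.clone(),
                size,
                children,
            }
        }
    }
}
```

Calling `summarize` on the root `Node` yields a `Usage` tree of the same shape, which is what a visualizer would then walk to draw the bars.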
❯ pdu /bin/ls /bin/docker --min-ratio=0
142K ┌──ls │ │ 0%
54M ├──docker│████████████████████████████████████████████████████████████│100%
54M ┌─┴(total) │████████████████████████████████████████████████████████████│100%
(The above figure compares the minuscule size of ls to the supermassive black hole that is docker, thereby demonstrating the immoral inefficiency of Go as opposed to the immoral efficiency of C.)
As you can see, the graph above is far more useful than that of dust.
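The difference comes down to what each item is measured against. The figure suggests that every argument is compared to the combined total of all arguments, not to itself. Below is a rough sketch of that arithmetic. It is an assumption drawn from the output above, not pdu's actual rendering code, and `percentage` and `render_bar` are hypothetical names.

```rust
/// Percentage of `size` relative to `total`, rounded to the nearest integer.
pub fn percentage(size: u64, total: u64) -> u64 {
    if total == 0 {
        0
    } else {
        (size as f64 * 100.0 / total as f64).round() as u64
    }
}

/// Bar of exactly `width` cells, filled in proportion to `size / total`.
/// The filled cells are right-aligned, as in the figures above.
pub fn render_bar(size: u64, total: u64, width: usize) -> String {
    let ratio = if total == 0 {
        0.0
    } else {
        size as f64 / total as f64
    };
    // Clamp so that a rounding surprise can never overflow the bar.
    let filled = ((ratio * width as f64).round() as usize).min(width);
    format!("{}{}", " ".repeat(width - filled), "█".repeat(filled))
}
```

With ls at roughly 142K and docker at roughly 54M, the total is dominated by docker, so ls rounds down to 0% with an empty bar while docker rounds up to 100% with a full one.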
Bonus
❯ pdu /bin/{yay,paru} --min-ratio=0
7M ┌──paru │ ████████████████████████████│ 45%
8M ├──yay │ ██████████████████████████████████│ 55%
15M ┌─┴(total)│██████████████████████████████████████████████████████████████│100%
Moral zero-cost abstraction with generics: 1
Immoral mediocre #lolnogeneric garbage collector: 0
This is a benchmark sample of pdu (v0.0.0) against dust (v0.5.4), dutree (v0.12.5), and du (measured by GitHub CI after deployment):
As you can see, pdu easily beats both dust and dutree by a large margin. This does not surprise me since, on my machine (Arch Linux btw), pdu's debug build already beats dust by a small margin.
What surprises me, however, is that pdu also beats du by a small amount, even though pdu's release build loses to du on my machine.
The pdu binary itself is not extensible. What is extensible is the crate. parallel-disk-usage is both a binary crate and a library crate. One may use the library to build one's own alternative to pdu with extra functionality.
Finally, you may go to the GitHub repository to read more about pdu. You may also sponsor me via Patreon because, whilst I might be skilled in many domains such as programming, trash talking, bolstering my own vanity and arrogance, etc., making money is not one of them.