KSXGitHub, May 28, 2021 (last revised May 29, 2021)

Here is the link to the git repository in case you desire none of my ramblings: https://github.com/KSXGitHub/parallel-disk-usage.

# About [`dust`](https://github.com/bootandy/dust)

For the past few months, I have been using `dust` to visualize the disk usage of heavy directories. It displays an intuitive bottom-up tree from heavier items to lighter ones. Every item comes with a percentage bar that allows me to compare the relative sizes of 2 sibling items as well as their parent. I quite like it.

However, I would soon discover its limits.

## Functionality limitation

Sometimes I want to compare the sizes of 2 files relative to each other, so I type the following command:

```sh
dust /bin/ls /bin/docker
```

And this is what `dust` (v0.5.4) gives:

<a id="dust-two-separate-files" name="dust-two-separate-files"></a>

```text
 140K ┌── ls    │████████████████████████████████████████│ 100%
  51M ┌── docker│████████████████████████████████████████│ 100%
```

Both `ls` and `docker` have the exact same percentage and bar length. This is not useful.

## Performance limitation

Unlike the above limitation, which I discovered while using `dust`, I discovered this one while skimming `dust`'s code in an attempt at an open source contribution (in order to amend [the above limitation](#functionality-limitation)).

I noticed that although `dust` lists [crossbeam-channel](https://crates.io/crates/crossbeam-channel) in [its `Cargo.toml` file](https://github.com/bootandy/dust/blob/1b3d0b272401b2eaa196891766d03a7667ea6917/Cargo.toml#L31), it is not used in the codebase, at least not in any way that I know of.

_(`pdu` later proved to be faster than `dust`, so I guess my assumption that crossbeam-channel is not used was correct)_

## Obstacles that prevent me from contributing to `dust`

As I have already mentioned, I tried to contribute to `dust` to add the functionality I desired, but I encountered a few obstacles:

1. The integration tests fail when I run them on my machine.
2. I don't understand the way the code is structured.

Furthermore, even if I overcame the aforementioned obstacles, I would still have to wait for my pull request to be merged and for a new version to be released. This is not to mention the [performance limitation](#performance-limitation), which cannot be resolved without significantly refactoring the whole codebase.

# The making of Parallel Disk Usage (`pdu`)

## Naming

So I decided that I would create my own disk usage visualizer. It was initially named ["dirt"](https://github.com/KSXGitHub/dirt). Then I realized that the name has no relation to the actual functionality. Not only that, the name "dirt" was picked after "dust", which means that my tool would forever live in the shadow of `dust`. This fact does not sit well with my unbridled vanity and my astronomical arrogance. Besides, "dirt" sounds kinda derpy.

So after some thinking, I decided to go with "Parallel Disk Usage" (I would have taken "pdu" if not for the fact that the name was already occupied on [crates.io](https://crates.io)).

## Implementation

I picture a directory hierarchy as a nested tree (obviously!). A directory may contain files and subdirectories, which in turn contain other files and subdirectories. As such, disk usage can be summarized from children to parent: the size of a directory is the sum of the sizes of its children, and sibling subtrees can be measured independently of each other. This is such a perfect use case for [rayon](https://crates.io/crates/rayon).

The resulting disk usage data is also a tree, because I need to visualize it.
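
The children-to-parent summarization above can be sketched as a recursive fork-join over the file system. This is a minimal sketch, not `pdu`'s actual code, and it rests on a few assumptions: the names `DataTree`, `build_tree`, `build_forest` and `ratio_percent` are hypothetical, sizes are apparent sizes from `len()` (a directory's own entry counts as 0), symlinks are not followed, and `std::thread::scope` (Rust 1.63+) stands in for rayon so the example needs no dependencies. Unlike rayon's work-stealing thread pool, it spawns one OS thread per directory entry, so it only suits small trees.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;

/// Hypothetical disk usage tree node: a name, the total size of
/// everything under it, and its children (empty for files).
#[derive(Debug)]
pub struct DataTree {
    pub name: String,
    pub size: u64,
    pub children: Vec<DataTree>,
}

/// Builds the tree for `path`, summarizing sizes from children to parent.
/// Each entry of a directory is measured on its own scoped thread (fork),
/// then the parent waits for all of them and adds up their sizes (join).
pub fn build_tree(path: &Path) -> io::Result<DataTree> {
    let name = path
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| path.display().to_string());
    // symlink_metadata does not follow symlinks, so link cycles cannot occur.
    let meta = fs::symlink_metadata(path)?;
    if !meta.is_dir() {
        // Leaf: apparent size in bytes.
        return Ok(DataTree { name, size: meta.len(), children: Vec::new() });
    }
    let entries = fs::read_dir(path)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<_>>>()?;
    let children = thread::scope(|scope| {
        // Fork: one scoped thread per entry (rayon would reuse a pool instead).
        let handles: Vec<_> = entries
            .iter()
            .map(|child| scope.spawn(move || build_tree(child)))
            .collect();
        // Join: wait for every child and propagate the first I/O error.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect::<io::Result<Vec<_>>>()
    })?;
    // Only the contents are summed; the directory entry itself adds nothing.
    let size = children.iter().map(|child| child.size).sum();
    Ok(DataTree { name, size, children })
}

/// Puts several paths (measured one after another) under one synthetic
/// "(total)" root, so every item is compared against the same grand total.
pub fn build_forest(paths: &[&Path]) -> io::Result<DataTree> {
    let children = paths
        .iter()
        .map(|path| build_tree(path))
        .collect::<io::Result<Vec<_>>>()?;
    let size = children.iter().map(|child| child.size).sum();
    Ok(DataTree { name: "(total)".to_string(), size, children })
}

/// Percentage of `part` relative to `total` (0 when `total` is 0).
pub fn ratio_percent(part: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        part as f64 * 100.0 / total as f64
    }
}
```

Because `build_forest` places every argument under a single `(total)` root, each `ratio_percent` is measured against the same denominator. With that layout, a tiny file next to a huge one reads close to 0% instead of both reading 100% as in the `dust` output above.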

## Results

### I finally have the functionality that I desired

```
❯ pdu /bin/ls /bin/docker --min-ratio=0
 142K   ┌──ls    │                                        │   0%
  54M   ├──docker│████████████████████████████████████████│ 100%
  54M ┌─┴(total) │████████████████████████████████████████│ 100%
```

_(The above figure compares the minuscule size of `ls` to the supermassive black hole that is `docker`, thereby demonstrating the immoral inefficiency of Go as opposed to the immoral efficiency of C)_

As you can see, the graph above is far more useful than [that of `dust`](#dust-two-separate-files).

<details><summary>Bonus</summary>

```
❯ pdu /bin/{yay,paru} --min-ratio=0
   7M   ┌──paru │                      ██████████████████│  45%
   8M   ├──yay  │                  ██████████████████████│  55%
  15M ┌─┴(total)│████████████████████████████████████████│ 100%
```

_Moral zero-cost abstraction with generics: 1_

_Immoral mediocre #lolnogeneric garbage collector: 0_

</details>

### And it's fast

This is a benchmark sample of `pdu` (v0.0.0) against `dust` (v0.5.4), `dutree` (v0.12.5), and `du` (measured by GitHub CI after deployment):

[_(there's more)_](https://ksxgithub.github.io/parallel-disk-usage-0.0.0-benchmarks/tmp.benchmark-report.CHARTS.html)

As you can see, `pdu` easily beats both `dust` and `dutree` by a large margin. This does not surprise me since, on my machine (Arch Linux btw), `pdu`'s debug build already beats `dust` by a small margin.

What surprises me, however, is that `pdu` also beats `du` by a small amount, even though `pdu`'s release build loses to `du` on my machine.

### And it's extensible

The `pdu` binary itself is not extensible. What is extensible is the crate: [parallel-disk-usage](https://crates.io/crates/parallel-disk-usage) is both a binary crate and a library crate. One may use the library to build one's own alternative to `pdu` with extra functionality.

---

Finally, you may go to [the GitHub repository](https://github.com/KSXGitHub/parallel-disk-usage) to read more about `pdu`.

You may also sponsor me via [Patreon](https://patreon.com/khai96_), because whilst I might be skilled in many domains such as programming, trash talking, bolstering my own vanity and arrogance, etc., making money is not one of them.