Revisions

  1. KSXGitHub revised this gist May 29, 2021. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 2021-05-28-announce-parallel-disk-usage-pdu.blog.md
    @@ -41,7 +41,7 @@ As I have already mentioned, I tried to contribute to `dust` to add the function
    1. The integration tests fail when I run it on my machine.
    2. I don't understand the way the code is structured.

    Furthermore, even if I overcome the aforementioned obstacles, I would still have to wait for my pull request to be merged, and a new version to be released. This is not to mentioned the [performance limitation](#performance-limitation) which cannot be resolved without significantly refactor the whole codebase.
    Furthermore, even if I overcome the aforementioned obstacles, I would still have to wait for my pull request to be merged, and a new version to be released. This is not to mention the [performance limitation](#performance-limitation) which cannot be resolved without significantly refactoring the whole codebase.

    ## The making of Parallel Disk Usage (pdu)

  2. KSXGitHub revised this gist May 28, 2021. 1 changed file with 13 additions and 11 deletions.
    24 changes: 13 additions & 11 deletions 2021-05-28-announce-parallel-disk-usage-pdu.blog.md
    @@ -1,12 +1,14 @@
    # Announcement: Parallel Disk Usage (pdu) - A highly parallelized, blazing fast disk usage visualizer

    Here is the link to the git repository in case you desire none of my ramblings: https://github.com/KSXGitHub/parallel-disk-usage.

    # About [`dust`](https://github.com/bootandy/dust)
    ## About [dust](https://github.com/bootandy/dust)

    In the past few months, I have always used `dust` to visualize the disk usage of heavy directories. It displays an intuitive bottom-up tree from heavier items to lighter ones. Every item comes with a percentage bar that allows me to compare the relative sizes of 2 sibling items as well as their parent. I quite like it.

    I would soon discover its limits, however.

    ## Functionality limitation
    ### Functionality limitation

    Sometimes I want to compare 2 files relatively, thus I type the following command:

    @@ -26,13 +28,13 @@ And this is what `dust` (v0.5.4) gives:

    Both `ls` and `docker` have the exact same percentage and bar length. This is not useful.

    ## Performance limitation
    #### Performance limitation

    Unlike the above limitation, which I discovered while using `dust`, this one was discovered when I was skimming `dust`'s code in an attempt to contribute to open source (in order to amend [the above limitation](#functionality-limitation)). I noticed that although `dust` has [crossbeam-channel](https://crates.io/crates/crossbeam-channel) in [its `Cargo.toml` file](https://github.com/bootandy/dust/blob/1b3d0b272401b2eaa196891766d03a7667ea6917/Cargo.toml#L31), it is not used in the codebase, at least not in any way that I know of.

    _(`pdu` later proves to be faster than `dust`, so I guess my assumption of crossbeam-channel not being used is true)_

    ## Obstacle that prevents me from contributing to `dust`
    ### Obstacle that prevents me from contributing to dust

    As I have already mentioned, I tried to contribute to `dust` to add the functionality I desired, but I encountered a few obstacles:

    @@ -41,19 +43,19 @@ As I have already mentioned, I tried to contribute to `dust` to add the function

    Furthermore, even if I overcome the aforementioned obstacles, I would still have to wait for my pull request to be merged, and a new version to be released. This is not to mentioned the [performance limitation](#performance-limitation) which cannot be resolved without significantly refactor the whole codebase.

    # The making of Parallel Disk Usage (`pdu`)
    ## The making of Parallel Disk Usage (pdu)

    ## Naming
    ### Naming

    So I decided that I would create my own disk usage visualizer. It was named ["dirt"](https://github.com/KSXGitHub/dirt) initially. Then I realized the name has no relation to the actual functionality. Not only that, the name "dirt" was picked after "dust", which means that my tool shall forever live in the shadow of `dust`. This fact does not sit well with my unbridled vanity and my astronomical arrogance. Besides, "dirt" sounds kinda derpy. So after some thinking I decided to go with "Parallel Disk Usage" (I would have taken "pdu" if not for the fact that the name was occupied on [crates.io](https://crates.io)).

    ## Implementation
    ### Implementation

    I picture a directory tree as a nested tree (obviously!). A directory may contain files and subdirectories which in turn contain other subdirectories. Disk usage can as such be summarized from children to parent. This is such a perfect use case for [rayon](https://crates.io/crates/rayon). The disk usage data is also a tree, because I need to visualize it.

    ## Results
    ### The results

    ### I finally have the functionality that I desired
    #### I finally have the functionality that I desired

    ```
    ❯ pdu /bin/ls /bin/docker --min-ratio=0
    @@ -81,7 +83,7 @@ _Immoral mediocre #lolnogeneric garbage collector: 0_

    </details>

    ### And it's fast
    #### And it's fast

    This is a benchmark sample of `pdu` (v0.0.0) against `dust` (v0.5.4), `dutree` (v0.12.5), and `du` (measured by GitHub CI after deployment):

    @@ -93,7 +95,7 @@ As you can see, `pdu` easily beat both `dust` and `dutree` by a large margin. Th

    What surprises me, however, is that `pdu` also beats `du` by a small amount, even though `pdu`'s release build loses to `du` on my machine.

    ### And it's extensible
    #### And it's extensible

    The `pdu` binary itself is not extensible. What is extensible is the crate. [parallel-disk-usage](https://crates.io/crates/parallel-disk-usage) is both a binary crate and a library crate. One may use the library to build one's own alternative to `pdu` with extra functionality.

  3. KSXGitHub created this gist May 28, 2021.
    102 changes: 102 additions & 0 deletions 2021-05-28-announce-parallel-disk-usage-pdu.blog.md
    @@ -0,0 +1,102 @@
    Here is the link to the git repository in case you desire none of my ramblings: https://github.com/KSXGitHub/parallel-disk-usage.

    # About [`dust`](https://github.com/bootandy/dust)

    In the past few months, I have always used `dust` to visualize the disk usage of heavy directories. It displays an intuitive bottom-up tree from heavier items to lighter ones. Every item comes with a percentage bar that allows me to compare the relative sizes of 2 sibling items as well as their parent. I quite like it.

    I would soon discover its limits, however.

    ## Functionality limitation

    Sometimes I want to compare 2 files relatively, thus I type the following command:

    ```sh
    dust /bin/ls /bin/docker
    ```

    And this is what `dust` (v0.5.4) gives:

    <a id="dust-two-separate-files" name="dust-two-separate-files"></a>

    ```text
     140K ┌── ls    │██████████████████████████████████████████████████████████ │ 100%
      51M ┌── docker│██████████████████████████████████████████████████████████ │ 100%
    ```


    Both `ls` and `docker` have the exact same percentage and bar length. This is not useful.

    ## Performance limitation

    Unlike the above limitation, which I discovered while using `dust`, this one was discovered when I was skimming `dust`'s code in an attempt to contribute to open source (in order to amend [the above limitation](#functionality-limitation)). I noticed that although `dust` has [crossbeam-channel](https://crates.io/crates/crossbeam-channel) in [its `Cargo.toml` file](https://github.com/bootandy/dust/blob/1b3d0b272401b2eaa196891766d03a7667ea6917/Cargo.toml#L31), it is not used in the codebase, at least not in any way that I know of.

    _(`pdu` later proves to be faster than `dust`, so I guess my assumption of crossbeam-channel not being used is true)_

    ## Obstacle that prevents me from contributing to `dust`

    As I have already mentioned, I tried to contribute to `dust` to add the functionality I desired, but I encountered a few obstacles:

    1. The integration tests fail when I run it on my machine.
    2. I don't understand the way the code is structured.

    Furthermore, even if I overcome the aforementioned obstacles, I would still have to wait for my pull request to be merged, and a new version to be released. This is not to mentioned the [performance limitation](#performance-limitation) which cannot be resolved without significantly refactor the whole codebase.

    # The making of Parallel Disk Usage (`pdu`)

    ## Naming

    So I decided that I would create my own disk usage visualizer. It was named ["dirt"](https://github.com/KSXGitHub/dirt) initially. Then I realized the name has no relation to the actual functionality. Not only that, the name "dirt" was picked after "dust", which means that my tool shall forever live in the shadow of `dust`. This fact does not sit well with my unbridled vanity and my astronomical arrogance. Besides, "dirt" sounds kinda derpy. So after some thinking I decided to go with "Parallel Disk Usage" (I would have taken "pdu" if not for the fact that the name was occupied on [crates.io](https://crates.io)).

    ## Implementation

    I picture a directory tree as a nested tree (obviously!). A directory may contain files and subdirectories which in turn contain other subdirectories. Disk usage can as such be summarized from children to parent. This is such a perfect use case for [rayon](https://crates.io/crates/rayon). The disk usage data is also a tree, because I need to visualize it.
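
    To make the children-to-parent idea concrete, here is a minimal sketch. It is not `pdu`'s actual code: the `UsageNode` type and the `build_tree` function are hypothetical names, it measures apparent file sizes only, and it uses `std::thread::scope` from the standard library as a stand-in for rayon so that it needs no dependencies. Spawning one thread per entry is wasteful on large trees, which is exactly the problem a work-stealing pool such as rayon's is meant to solve.

    ```rust
    use std::fs;
    use std::io;
    use std::path::Path;
    use std::thread;

    /// One node of the usage tree: a file or a directory (hypothetical type).
    #[derive(Debug)]
    struct UsageNode {
        name: String,
        /// Apparent size in bytes; for a directory, the sum of its children.
        size: u64,
        children: Vec<UsageNode>,
    }

    /// Build the usage tree for `path`, summing sizes from children to parent.
    fn build_tree(path: &Path) -> io::Result<UsageNode> {
        let name = path
            .file_name()
            .map(|n| n.to_string_lossy().into_owned())
            .unwrap_or_else(|| path.display().to_string());
        // Do not follow symlinks, so a link is measured as a link.
        let meta = fs::symlink_metadata(path)?;

        if !meta.is_dir() {
            return Ok(UsageNode { name, size: meta.len(), children: Vec::new() });
        }

        let entries: Vec<_> = fs::read_dir(path)?
            .map(|entry| entry.map(|e| e.path()))
            .collect::<io::Result<_>>()?;

        // Sibling subtrees are independent, so each one is measured on its own
        // scoped thread. Assumption: the tree is small enough for this to be OK.
        let mut children: Vec<UsageNode> = thread::scope(|scope| {
            let handles: Vec<_> = entries
                .iter()
                .map(|child| scope.spawn(move || build_tree(child)))
                .collect();
            handles
                .into_iter()
                .map(|handle| handle.join().expect("worker thread panicked"))
                .collect::<io::Result<Vec<_>>>()
        })?;

        // Heavier items first, then fold the children's sizes into the parent.
        children.sort_by(|a, b| b.size.cmp(&a.size));
        let size = children.iter().map(|child| child.size).sum();
        Ok(UsageNode { name, size, children })
    }
    ```

    Calling `build_tree(Path::new("."))` returns the root node, whose `size` is the total and whose `children` form the tree that a visualizer would then draw. With rayon, the scoped threads would presumably be replaced by a parallel iterator over `entries`.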

    ## Results

    ### I finally have the functionality that I desired

    ```
    ❯ pdu /bin/ls /bin/docker --min-ratio=0
    142K   ┌──ls    │                                                            │  0%
     54M   ├──docker│████████████████████████████████████████████████████████████│100%
     54M ┌─┴(total) │████████████████████████████████████████████████████████████│100%
    ```

    _(The above figure compares the minuscule size of `ls` to the supermassive black hole that is `docker`, thereby demonstrating the immoral inefficiency of Go as opposed to the immoral efficiency of C)_

    As you can see, the graph above is far more useful than [that of `dust`](#dust-two-separate-files).
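
    The difference comes down to what each bar is measured against. The sketch below is only an illustration of that arithmetic, not `pdu`'s rendering code: `bar_cells` and `render_bar` are hypothetical helpers, and rounding to the nearest cell is an assumption. When every item is compared against a shared total, `ls` rounds down to an empty bar. When an item is compared against itself, which is what the `dust` output above appears to do, every bar is full.

    ```rust
    /// Number of filled cells for an item of `size` bytes, relative to `total`
    /// bytes, in a bar that is `width` cells wide (rounded to the nearest cell).
    fn bar_cells(size: u64, total: u64, width: usize) -> usize {
        if total == 0 {
            return 0; // nothing to compare against
        }
        let ratio = size as f64 / total as f64;
        (ratio * width as f64).round() as usize
    }

    /// Render a right-aligned bar: padding spaces first, then the filled cells.
    fn render_bar(size: u64, total: u64, width: usize) -> String {
        let filled = bar_cells(size, total, width).min(width);
        format!("{}{}", " ".repeat(width - filled), "█".repeat(filled))
    }
    ```

    Taking the rounded figures from the output above as 142 KiB and 54 MiB (an assumption about the units), `bar_cells(145_408, 56_768_512, 60)` gives 0 cells for `ls` and `bar_cells(56_623_104, 56_768_512, 60)` gives 60 cells for `docker`, whereas `bar_cells(145_408, 145_408, 60)` gives a full 60 cells for `ls` on its own.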

    <details><summary>Bonus</summary>

    ```
    ❯ pdu /bin/{yay,paru} --min-ratio=0
     7M   ┌──paru │                                  ████████████████████████████│ 45%
     8M   ├──yay  │                            ██████████████████████████████████│ 55%
    15M ┌─┴(total)│██████████████████████████████████████████████████████████████│100%
    ```

    _Moral zero-cost abstraction with generics: 1_

    _Immoral mediocre #lolnogeneric garbage collector: 0_

    </details>

    ### And it's fast

    This is a benchmark sample of `pdu` (v0.0.0) against `dust` (v0.5.4), `dutree` (v0.12.5), and `du` (measured by GitHub CI after deployment):

    ![Chads pdu and du T-posing before virgins dust and dutree](https://ksxgithub.github.io/parallel-disk-usage-0.0.0-benchmarks/tmp.benchmark-report.competing.blksize.svg)

    [_(there's more)_](https://ksxgithub.github.io/parallel-disk-usage-0.0.0-benchmarks/tmp.benchmark-report.CHARTS.html)

    As you can see, `pdu` easily beats both `dust` and `dutree` by a large margin. This does not surprise me since, on my machine (Arch Linux btw), `pdu`'s debug build already beats `dust` by a small margin.

    What surprises me, however, is that `pdu` also beats `du` by a small amount, even though `pdu`'s release build loses to `du` on my machine.

    ### And it's extensible

    The `pdu` binary itself is not extensible. What is extensible is the crate. [parallel-disk-usage](https://crates.io/crates/parallel-disk-usage) is both a binary crate and a library crate. One may use the library to build one's own alternative to `pdu` with extra functionality.

    ---

    Finally, you may go to [the GitHub repository](https://github.com/KSXGitHub/parallel-disk-usage) to read more about `pdu`. You may also sponsor me via [Patreon](https://patreon.com/khai96_) because whilst I might be skilled in many domains such as programming, trash talking, bolstering my own vanity and arrogance, etc., making money is not one of them.