@paul-d-ray
Last active June 18, 2025 03:05
Nushell File Size Distribution

Nushell code to get a file size distribution

# 20250412_2141
# https://rosettacode.org/wiki/File_size_distribution
# File Size Distribution


# Root directories to scan; edit this list for your own drives.
[d:/work d:/drivers d:/images] | each { |dir|
    # Recursively list everything under the root directory.
    ls ($"($dir)/**/*" | into glob) | each { |it|
        # Convert the filesize value to a plain byte count.
        let size = ($it.size | into int)
        {
            name: $it.name
            size: $size
            # Bucket the byte count (decimal units: 1 KB = 1,000 bytes).
            SizeCategory: (match $size {
                0..1_000 => "Under 1 KB"
                1_001..5_000 => "1 KB to 5 KB"
                5_001..10_000 => "5 KB to 10 KB"
                10_001..25_000 => "10 KB to  25 KB"
                25_001..50_000 => "25 KB to 50 KB"
                50_001..1_000_000 => "50 KB to  1 MB"
                1_000_001..5_000_000 => "1 MB to 5 MB"
                5_000_001..10_000_000 => "5 MB to 10 MB"
                10_000_001..25_000_000 => "10 MB to 25 MB"
                25_000_001..50_000_000 => "25 MB to 50 MB"
                50_000_001..100_000_000 => "50 MB to 100 MB"
                100_000_001..500_000_000 => "100 MB to 500 MB"
                500_000_001..1_000_000_000 => "500 MB to 1 GB"
                1_000_000_001..3_000_000_000 => "1 GB to 3 GB"
                _ => "Over 3 GB"
            })
        }
    }
} | flatten | histogram SizeCategory
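
The bucketing step can also be pulled out into a small lookup-table helper, which keeps the thresholds in one place. This is only a sketch: `size-category` is a hypothetical name that is not part of the original gist, and it assumes sizes are non-negative byte counts.

```nu
# Hypothetical helper (not in the original gist): map a byte count to a label.
# Each row holds the inclusive upper bound of a bucket, in ascending order.
def size-category [size: int] {
    let buckets = [
        [limit label];
        [1_000 "Under 1 KB"]
        [5_000 "1 KB to 5 KB"]
        [10_000 "5 KB to 10 KB"]
        [25_000 "10 KB to  25 KB"]
        [50_000 "25 KB to 50 KB"]
        [1_000_000 "50 KB to  1 MB"]
        [5_000_000 "1 MB to 5 MB"]
        [10_000_000 "5 MB to 10 MB"]
        [25_000_000 "10 MB to 25 MB"]
        [50_000_000 "25 MB to 50 MB"]
        [100_000_000 "50 MB to 100 MB"]
        [500_000_000 "100 MB to 500 MB"]
        [1_000_000_000 "500 MB to 1 GB"]
        [3_000_000_000 "1 GB to 3 GB"]
    ]
    # Keep only the buckets whose upper bound is at or above the size.
    let hits = ($buckets | where limit >= $size)
    # No bucket left means the size is above the largest bound.
    if ($hits | is-empty) { "Over 3 GB" } else { $hits | first | get label }
}

size-category 4_096
```

With this helper defined, the `match` expression could be replaced by `(size-category $size)`; adding or moving a threshold then means editing a single table row.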

Example of my data drive

About 90% of the files on my data drive are one million bytes or smaller (the six smallest categories below add up to 89.90%).

╭────┬──────────────────┬───────┬──────────┬────────────┬─────────────────────────────────────────────╮
│  # │   SizeCategory   │ count │ quantile │ percentage │                  frequency                  │
├────┼──────────────────┼───────┼──────────┼────────────┼─────────────────────────────────────────────┤
│  0 │ 50 KB to  1 MB   │  4199 │     0.44 │ 43.88%     │ ******************************************* │
│  1 │ 1 KB to 5 KB     │  1978 │     0.21 │ 20.67%     │ ********************                        │
│  2 │ Under 1 KB       │  1491 │     0.16 │ 15.58%     │ ***************                             │
│  3 │ 5 KB to 10 KB    │   444 │     0.05 │ 4.64%      │ ****                                        │
│  4 │ 1 MB to 5 MB     │   293 │     0.03 │ 3.06%      │ ***                                         │
│  5 │ 10 KB to  25 KB  │   260 │     0.03 │ 2.72%      │ **                                          │
│  6 │ 25 KB to 50 KB   │   231 │     0.02 │ 2.41%      │ **                                          │
│  7 │ 1 GB to 3 GB     │   178 │     0.02 │ 1.86%      │ *                                           │
│  8 │ 100 MB to 500 MB │   131 │     0.01 │ 1.37%      │ *                                           │
│  9 │ 10 MB to 25 MB   │   108 │     0.01 │ 1.13%      │ *                                           │
│ 10 │ 50 MB to 100 MB  │    96 │     0.01 │ 1.00%      │ *                                           │
│ 11 │ 5 MB to 10 MB    │    78 │     0.01 │ 0.82%      │                                             │
│ 12 │ 25 MB to 50 MB   │    50 │     0.01 │ 0.52%      │                                             │
│ 13 │ 500 MB to 1 GB   │    28 │     0.00 │ 0.29%      │                                             │
│ 14 │ Over 3 GB        │     4 │     0.00 │ 0.04%      │                                             │
├────┼──────────────────┼───────┼──────────┼────────────┼─────────────────────────────────────────────┤
│  # │   SizeCategory   │ count │ quantile │ percentage │                  frequency                  │
╰────┴──────────────────┴───────┴──────────┴────────────┴─────────────────────────────────────────────╯
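
To see what share of files falls at or under one million bytes, the counts in the table above can be summed directly. A quick check, with the counts copied by hand from the table (the variable names are only illustrative):

```nu
# Counts for the six categories up to 1,000,000 bytes, copied from the table.
let small = [4199 1978 1491 444 260 231]
# Counts for the remaining nine categories.
let large = [293 178 131 108 96 78 50 28 4]

let small_total = ($small | math sum)
let all_total = $small_total + ($large | math sum)

# Share of files at or under 1,000,000 bytes, as a percentage.
($small_total / $all_total * 100) | math round --precision 2
```

The two sums are 8603 and 9569 files, so the share comes out at just under 90%.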