@costa
Last active August 20, 2024 08:34
basic hierarchical file system analysis script
#!/usr/bin/env ruby
# NOTE just a thin presentation layer over `du`
# NOTE on MacOS, recommended: DU_IGNORES=Volumes
# FROM https://gist.github.com/costa/b9570f214c7b7cfdf7d09069f16d7ca8
ignore_masks = ENV['DU_IGNORES'].to_s.split(':')
SUPER_SUB_YIELD_DIV = 10
PART_RESOLUTION_RANGE = 1..10000
PART_RESOLUTION_DEFAULT = 100
part_resolution = PART_RESOLUTION_DEFAULT
abort "USAGE [DU_IGNORES=<DU:IGNORE:MASKS>] du_sum.rb [<PART-RESOLUTION|#{PART_RESOLUTION_DEFAULT}>] # run when needed, contemplate output" unless
  ARGV.empty? ||
  (ARGV.length == 1 &&
   PART_RESOLUTION_RANGE.include?(part_resolution = ARGV[0].to_i))
KILO_ETC = "KMGTP"
# NOTE functionality obvious: formats a positive number of K as three characters
# NOTE (digits and a decimal point) plus a K/M/G/T/P suffix, e.g. 5300 -> "5.3M"
def humanise_K(number)
  fail "Humans don't get non-natural numbers!" unless
    number > 0
  suf_i = 0
  while number > 100
    number /= 1000.0
    suf_i += 1
  end
  res = ''
  if number > 1
    if number > 10
      int = number.to_i / 10
      res << int.to_s
      number -= int * 10
    end
    int = number.to_i
    res << (int / 1).to_s
    number -= int * 1
  end
  res << '.'
  while res.length < 3
    number *= 10
    int = number.to_i
    res << int.to_s
    number -= int
  end
  res + KILO_ETC[suf_i]
end
# NOTE a "smart" (du-analysis-oriented) summary generating procedure
# NOTE sorted_path_dus must have '/' chomped from paths
def yield_all_sub_path_du(sorted_path_dus, du_min, super_path = '', super_du = nil, depth = 0, yielded_already = [], &blk)
  fail "Literally, block needed" unless
    block_given?
  super_path = super_path.chomp('/')
  super_to_yield = !!super_du
  sorted_path_dus.each do |path, du|
    break unless
      du > du_min
    next unless
      path.start_with?(super_path + '/') &&
      !yielded_already.include?(path)
    yielded_already << path
    super_to_yield &&=
      super_du - du > du_min / SUPER_SUB_YIELD_DIV &&
      false.tap do # NOTE I'm not sure how popular this kind of logic expression is
        yield depth, super_path, super_du
        depth += 1
      end
    yield_all_sub_path_du sorted_path_dus, du_min, path, du, depth, yielded_already, &blk
  end
  yield depth, super_path, super_du if
    super_to_yield
end
# NOTE returns a list of all sub-path - disk usage (K) pairs for the given path
def du_K(path: '', ignores: [])
  path = path.chomp('/')
  cmd_path = ignores.reduce(path + '/'){ |s,i| "-I '#{i}' " + s }
  warn "du #{cmd_path}" # NOTE any du errors are left for the user...
  `sudo du -k #{cmd_path} 2> /dev/null`.lines.map do |line|
    du_K_s, path_ = line.chomp.split("\t")
    [path_.chomp('/'), du_K_s.to_i * 1024 / 1000] # NOTE I prefer K to Ki here...
  end.tap do |l|
    warn "#{l.size} du"
  end
end
pre = []
sorted_path_du_Ks = # NOTE yeah, there can be trouble with really long names
  du_K(ignores: ignore_masks).sort_by{ |p, s| -s * 1000 + p.length }
yield_all_sub_path_du sorted_path_du_Ks[1..-1], sorted_path_du_Ks[0][1] / part_resolution do |depth, path, du_K|
  puts "#{pre[depth] ||= '| ' * depth}#{humanise_K du_K} #{path}"
end

The most ancient problem of disk space -- and the equally ancient problem of sharing scripts

Please disregard this public post if

  • you think that you have not had any disk space problems in recent years and that the author quite frankly might just be behind on personal computing technology since everything is done "off the cloud" these days, or --
  • you think the problem is too advanced for you to even try anything (like simply analysing your file system hierarchically, on your own) and you trust that "common" (commercial) solutions are your best options -- whenever it happens

As for us mere mortals, the problem pops up now and then, especially when working with media or other "big data" on a personal machine.

The inevitable problem of disk space

Yes, the goal is always to automate disk space management -- and you absolutely should rely on secure software (not your home-brewed scripts) for your routine maintenance -- and when you inevitably need more space, that software should politely notify you of new resource requirements -- in advance.

But, as you quickly discover, this goal may be reached at times, only to slip away again. In other words, before yet another data flow involving your personal machine is automated, or in order to deal with some non-standard (yet apparently plausible) disk space situation -- sometimes, you just have to get your hands dirty...

...well, as dirty as a geek's hands can get, so, not very dirty, and to keep them cleaner yet, you write scripts -- "micro" in size and functionality "by definition" (by following good practice; see my note on programming vs scripting).

This is an example of a hierarchical-file-system-analysis-helper script (under 100 lines, or under two pages of code): ...

It uses the already excellent du and prints the heaviest file-system objects hierarchically, and that's it: ...
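
To make the idea concrete, here is a minimal Ruby sketch of the same approach. It is not the script itself: it works on a small, made-up sample of `du -k` output instead of calling du, and it orders entries by path rather than by size. The names `SAMPLE_DU_OUTPUT`, `parse_du` and `heavy_tree` are hypothetical and exist only for this illustration. Like the script, it drops everything lighter than the total divided by a resolution parameter.

```ruby
# Hypothetical sample of `du -k` output: size, a tab, then the path.
SAMPLE_DU_OUTPUT = <<~DU
  400\t/home/media/video
  100\t/home/media/audio
  520\t/home/media
  3\t/home/notes
  530\t/home
DU

# Turn each "size<TAB>path" line into a [path, size] pair.
def parse_du(text)
  text.lines.map do |line|
    size, path = line.chomp.split("\t", 2)
    [path, size.to_i]
  end
end

# Keep only entries heavier than total / resolution and render them as an
# indented tree. Sorting by path components puts children after their parent.
def heavy_tree(pairs, resolution)
  total = pairs.map(&:last).max
  threshold = total / resolution
  root_depth = pairs.map { |path, _| path.split('/').length }.min
  pairs
    .select { |_, size| size > threshold }
    .sort_by { |path, _| path.split('/') }
    .map do |path, size|
      depth = path.split('/').length - root_depth
      "#{'| ' * depth}#{size}K #{path}"
    end
end

puts heavy_tree(parse_du(SAMPLE_DU_OUTPUT), 10)
```

With a resolution of 10, the 3K `/home/notes` entry falls below the threshold (530 / 10) and is left out, while the rest is printed with one `| ` per level of nesting. The real script goes further: it sorts by size, and it skips a parent directory when a single child accounts for nearly all of its usage.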

The contemporary problem of sharing any kind of code, peer-to-peer*

Well, that script isn't much, but it might be useful to you nevertheless. And if you -- my trusted professional peer* -- happen to write such a script, I would like -- very much -- to see it, and possibly, to copy it.

* one of the dozens of fellow software engineers I have worked with at some time, whose day-to-day professional problems, and solutions to them, tend to be similar to mine

Now, let's think of the ways you would go about sharing or publishing such a script:

  • yes, you may write an elaborate blog post like this, but realistically, none of your followers will think to bookmark it or anything -- either they'll have a (quite rare) disk space problem right then, or the info will be lost to the search engines and social networks, which are themselves very real-time-feed-oriented and will probably not help me find that script, or collaborate on it efficiently, when I need it;
  • or, in addition to writing a post, you may share the code within one of the open-source social networks, but, to me, that is only marginally more efficient in terms of realistic accessibility than the first option, plus, when open-sourcing that script you will have to pass the (quite sensible) barrier of entering that public space, with its protocols, community standards, etc, which might be inappropriate for such a small piece of code;
  • and, what I'd really like is for all such generic (as in non-project-specific) code of my peers -- whether it's a library, a package, or a script they've been working on -- to be just available on my development machine with the help of a simple sharing mechanism (NB code doesn't take much disk space), so when needed, I'll just try and locate that script of yours in my local code base, and then, collaborate on it in the same ad-hoc fashion.

Do you agree with my definition of the casual code sharing problem?

How do you solve it with your peers?

Would you like to see my pragmatic solution to the problem?

Please share and comment if you've made it this far already.
