This is a response to, and an extension of, the Medium article "Deep directory structure vs. flat directory structure to store millions of files on ext4". Here, I'm benchmarking not just a nesting level of two, but several different levels.
This benchmark currently runs on an otherwise idle physical server with these specs:
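To make "nesting level" concrete, here is a minimal sketch of how a file name could be mapped to a nested path of a given depth. It is not taken from the benchmark script: the helper name `nested_path` and the choice of SHA-1 hex pairs as directory names are assumptions made purely for illustration.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper (an assumption, not the benchmark's actual code):
# use the first `depth` two-character pairs of the SHA-1 hex digest of
# `name` as directory names. A depth of 0 gives the flat layout.
def nested_path(root, name, depth)
  digest = Digest::SHA1.hexdigest(name)   # 40 hex characters
  dirs = digest.scan(/../).first(depth)   # e.g. ["ab", "cd"] for depth 2
  File.join(root, *dirs, name)
end

# Usage: write one file two levels deep inside a temporary directory.
Dir.mktmpdir do |root|
  path = nested_path(root, "file-42", 2)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, "x")
end
```

With two hex characters per level, each directory has at most 256 subdirectories, so deeper nesting spreads the same number of files over many more, much smaller directories.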
- HDD: spinning, encrypted (RAID 10)
- CPU: Intel i7-2600 (4 cores + HT @ 3.4GHz)
- RAM: 16GB
For reference, this is the result of a test run with a depth of 5 and 100k files (output truncated and sorted):
$ ruby benchmark-deep-vs-flat-directories.rb 5 100000
Ruby 2.5.3 x86_64-linux, depth 5, iterations 100000
              user     system      total        real
write-5   1.100000   3.390000   4.490000 (242.845647)
write-4   1.000000   3.070000   4.070000 (238.195257)
write-3   1.050000   2.910000   3.960000 (214.014290)
write-2   1.070000   3.160000   4.230000 (414.781391)
write-1   0.810000   1.640000   2.450000 (  5.496074)
write-0   0.720000   1.580000   2.300000 (  3.912904)
read-5    0.890000   0.680000   1.570000 (  1.895553)
read-4    0.670000   0.730000   1.400000 ( 20.789874)
read-3    0.790000   0.910000   1.700000 ( 21.961692)
read-2    0.560000   0.590000   1.150000 (  1.400197)
read-1    0.570000   0.580000   1.150000 (  6.023490)
read-0    0.550000   0.470000   1.020000 (  1.259735)
I'll update this gist with the results once the benchmark (with 10m files) has finished.
Haha, 2^40 is indeed way bigger than 16.7M. Who knew testing filesystems would be that hard! :)
Added a shoutout to your fork on the original article.
Did you notice issues when trying with more than 10 million files? I found out there is a limit at around 10.2M files per directory because of the ext4 directory index. I was getting
ext4_dx_add_entry:2184: Directory (ino: 384830) index full, reach max htree level :2
errors.
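To see how close a directory is to that limit, one could count its entries. This is only a rough sketch: the helper name `entry_count` is made up for illustration, and the 10.2M figure is the approximate value observed above, not an exact constant.

```ruby
require "tmpdir"

# Hypothetical helper: count the entries directly inside `dir`
# (excluding "." and ".."). Dir.each_child without a block returns an
# Enumerator, so no array of all names is built up in memory.
def entry_count(dir)
  Dir.each_child(dir).count
end

# Usage: a temporary directory with three files.
Dir.mktmpdir do |dir|
  3.times { |i| File.write(File.join(dir, "f#{i}"), "") }
  puts entry_count(dir)
end
```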