Created October 3, 2014 09:06
Comparing Perl file reading methods for hash calculation
opf:perl andy$ time perl sha256-asfile.pl ~/Downloads/ubuntu-12.10-desktop-amd64.iso
256a2cc652ec86ff366907fd7b878e577b631cc6c6533368c615913296069d80 /Users/andy/Downloads/ubuntu-12.10-desktop-amd64.iso
real 0m8.825s
user 0m8.102s
sys 0m0.479s
opf:perl andy$ time perl sha256-slurp.pl ~/Downloads/ubuntu-12.10-desktop-amd64.iso
256a2cc652ec86ff366907fd7b878e577b631cc6c6533368c615913296069d80 /Users/andy/Downloads/ubuntu-12.10-desktop-amd64.iso
real 0m20.203s
user 0m13.245s
sys 0m3.424s
The scripts are here: https://github.com/anjackson/keeping-codes/tree/gh-pages/experiments/checksum-benchmarking/perl
It seems that the `read_file` method (a.k.a. slurp) performs badly for large files. If it reads the whole file into memory, then that additional memory management may be responsible. If it only looks like a binary array, but is implemented using data streams, the fault may perhaps lie with the way the content is buffered or with other aspects of `sysread`.

I tried to understand what the slurp code is doing, and it does seem to be loading the whole file into memory. If this relies on the Perl engine transparently growing arrays as required, then there's probably a lot of `malloc` and `memcpy` going on.
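
For reference, here is a minimal sketch of the two styles being compared. It is not the linked scripts themselves. It assumes the "asfile" variant hands the file to `Digest::SHA` via `addfile`, and that the slurp variant pulls the whole file into one scalar before hashing. The function names are hypothetical, and the slurp is done with core Perl (`local $/`) as a stand-in for `File::Slurp`'s `read_file`, so the sketch needs no non-core modules.

```perl
use strict;
use warnings;
use Digest::SHA;

# Streamed: Digest::SHA opens the file and feeds it to the digest
# itself, so the whole file is never held in memory at once.
sub sha256_streamed {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');    # 'b' selects binary mode
    return $sha->hexdigest;
}

# Slurped: read the entire file into a single scalar, then hash it.
# Assumption: this approximates what a read_file-based script does.
sub sha256_slurped {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $data = do { local $/; <$fh> };    # undef $/ reads to end of file
    close $fh;
    return Digest::SHA::sha256_hex($data);
}

# Print both digests for each file named on the command line.
for my $path (@ARGV) {
    printf "%s  %s (streamed)\n", sha256_streamed($path), $path;
    printf "%s  %s (slurped)\n",  sha256_slurped($path),  $path;
}
```

Both functions should return the same digest for a given file. The difference is that the slurped version needs a buffer as large as the file, which is where the extra allocation and copying would come from if the guess above is right.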