context: shells want to leave the seek position at the right offset so external commands can continue reading from the same fd
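to make the context concrete, here's a minimal python sketch of why the offset matters: a child process that inherits the fd shares its file offset with the parent. `child_reads_rest` is a made-up helper for illustration, and the child is just another python process that copies stdin to stdout.

```python
import os
import subprocess
import sys

def child_reads_rest(fd):
    """Run a child with fd as its stdin and return everything it manages to read."""
    # the child inherits the same open file description, so it starts
    # reading at whatever offset the parent left behind
    child_code = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stdin=fd,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

if the parent slurps a whole 4k chunk and doesn't seek back, the child sees nothing. if it seeks back to just after the first line, the child picks up from the second line.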
i saw this somewhat surprising tally of syscalls in bash. let's investigate
one iteration:
- unblock all signals
- check if the fd is a tty
- check if the fd is seekable
- read 4k
- reposition seek position after the end of the line
then
- unblock all signals again for good measure
- what if the fd is now magically a tty?
- maybe it's now seekable?
- read 4k
- seek to next line
and then
- better unblock those signals again
- what if the fd had become a tty tho?
- i really better check if it's seekable again
- read
- seek
and then...
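roughly what one of those iterations boils down to, minus the signal and tty checks. this is a minimal python sketch, not bash's actual code; `read_line_seek_back` and the 4096 default are names and values i picked for illustration, and it assumes a seekable fd and a line that fits in one chunk.

```python
import os

def read_line_seek_back(fd, chunk_size=4096):
    """Read one line from a seekable fd, leaving the offset just past the newline."""
    buf = os.read(fd, chunk_size)          # over-read a whole chunk
    newline = buf.find(b"\n")
    if newline == -1:
        # simplification: no newline in this chunk (or EOF), return what we got
        return buf
    line = buf[:newline + 1]
    leftover = len(buf) - len(line)        # bytes read past the end of the line
    if leftover:
        os.lseek(fd, -leftover, os.SEEK_CUR)  # hand those bytes back
    return line
```

that's two syscalls per line (one read, one lseek) before counting any of the repeated checks.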
dash doesn't like seeking back so it just reads one byte at a time
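the one-byte-at-a-time approach, as a minimal python sketch (not dash's actual code; `read_line_bytewise` is a made-up name). it never reads past the newline, so there's nothing to seek back over, and it also works on pipes, which can't seek at all. the price is one read syscall per byte.

```python
import os

def read_line_bytewise(fd):
    """Read one line from fd with one read() call per byte, never over-reading."""
    line = bytearray()
    while True:
        byte = os.read(fd, 1)   # one syscall for every single byte
        if not byte:            # EOF
            break
        line += byte
        if byte == b"\n":       # stop exactly at the end of the line
            break
    return bytes(line)
```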
i've discovered the 70% rule: shells must do something stupid 70% of the time. it's probably a fundamental principle of thermodynamics or something
dash wastes 70% of the total runtime purely due to syscall overhead
somehow bash manages to do ??? in userspace instead
mksh is back to reading one byte at a time
zsh also reads one byte at a time, but it really wants to make sure that the signals don't do anything funny
block sigchld and sigwinch. ok now re block sigchld. unblock it. block it again. block it again again. unblock it. block it again. unblock it. read
and both end up slower than bash
surprisingly low numbers from ksh93u+m
turns out it reads 64k at a time and... doesn't do anything too stupid?! ok it's still a bit stupid for all the wasted lseeks, but the total syscall overhead was 0.5% so it gets a pass. good job ksh
python, a notoriously fast programming language, for comparison