@izabera
Last active January 6, 2025 17:44
reading a file line by line in your shell

context: after reading a line, a shell wants to leave the fd's file offset right after that line, so external commands can continue reading from the same fd
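a minimal sketch of why that offset matters. the temp file and its contents are made up for the demo; the point is only that `read` (a builtin) and `cat` (an external command) share one fd, so `cat` starts wherever the shell left the offset.

```shell
# hypothetical three-line input file, just for the demo
f=$(mktemp)
printf 'one\ntwo\nthree\n' > "$f"

{
  # the shell builtin consumes exactly one line from fd 0...
  IFS= read -r first
  # ...and cat, an external command, picks up from the same offset
  rest=$(cat)
} < "$f"

printf 'read got: %s\n' "$first"
printf 'cat got:  %s\n' "$rest"
rm -f "$f"
```

if the shell had slurped a whole buffer and not put the offset back, `cat` would see nothing here. that's the constraint every strategy below has to work around.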


i saw this somewhat surprising tally of syscalls in bash. let's investigate

surprise
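the exact command behind the screenshot isn't reproduced in the text, so this is a guess at the kind of setup that produces such a tally: a do-nothing `while read` loop, run under `strace -c`. the file contents, the `loop` string and the `tally` helper are all made up for illustration, and `tally` assumes strace is installed and allowed to trace.

```shell
# hypothetical input: 1000 short lines in a temp file
f=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "line " i }' > "$f"

# the loop under test: read every line, only count it
loop='n=0; while IFS= read -r line; do n=$((n + 1)); done < "$1"; echo "$n"'

# sanity check without strace: the loop should see every line
lines=$(sh -c "$loop" _ "$f")
echo "lines read: $lines"

# tally SHELL FILE: per-syscall counts for the same loop, printed on stderr
# (assumes strace is installed and ptrace is permitted)
tally() {
  strace -c -f "$1" -c "$loop" _ "$2" > /dev/null
}
```

then something like `tally bash "$f"` or `tally dash "$f"` should give a per-shell summary to compare. the numbers will depend on the file and the machine.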

one iteration:

  • unblock all signals
  • check if the fd is a tty
  • check if the fd is seekable
  • read 4k
  • reposition seek position after the end of the line

then

  • unblock all signals again for good measure
  • what if the fd is now magically a tty?
  • maybe it's now seekable?
  • read 4k
  • seek to next line

and then

  • better unblock those signals again
  • what if the fd had become a tty tho?
  • i really better check if it's seekable again
  • read
  • seek

and then...

iteration
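the effect of the read-then-seek-back dance can be observed from the outside. a linux-only sketch, assuming `/proc/self/fdinfo` is available: a child process inherits fd 0, so its `fdinfo` entry reports the offset the shell left behind after each `read`. the two-line input is made up.

```shell
# hypothetical two-line input: "ab\n" is 3 bytes, "cde\n" is 4 more
f=$(mktemp)
printf 'ab\ncde\n' > "$f"

offsets=
if [ -r /proc/self/fdinfo/0 ]; then
  while IFS= read -r line; do
    # awk inherits fd 0, so its fdinfo shows the offset the shell left behind
    pos=$(awk '/^pos:/ { print $2 }' /proc/self/fdinfo/0)
    offsets="$offsets$pos "
  done < "$f"
fi
echo "offsets after each read: $offsets"
rm -f "$f"
```

whether the shell over-reads and seeks back or reads one byte at a time, the offset should land on a line boundary each time. only the number of syscalls it takes to get there differs.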

dash doesn't like seeking back so it just reads one byte at a time

dash
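one thing in favour of byte-at-a-time reads: they also work when the fd can't seek at all, like a pipe. as far as i know bash falls back to the same strategy in that case, but treat that as an assumption rather than something shown above. a small sketch with made-up input:

```shell
# stdin is a pipe here, so the shell cannot seek back after over-reading
out=$(printf 'one\ntwo\nthree\n' | {
  IFS= read -r first
  echo "read got: $first"
  # whatever read left in the pipe is still there for cat
  cat
})
echo "$out"
```

the rest of the input only survives for `cat` because `read` never took more than its own line out of the pipe.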

i've discovered the 70% rule: shells must do something stupid 70% of the time. it's probably a fundamental principle of thermodynamics or something

dash wastes 70% of the total runtime purely due to syscall overhead

somehow bash manages to do ??? in userspace instead

70% rule

mksh is back to reading one byte at a time

mksh

zsh also reads one byte at a time, but it really wants to make sure that the signals don't do anything funny

block sigchld and sigwinch. ok now re block sigchld. unblock it. block it again. block it again again. unblock it. block it again. unblock it. read

zsh

and both end up slower than bash

slow

surprisingly low numbers from ksh93u+m

ksh

turns out it reads 64k at a time and... doesn't do anything too stupid?! ok it's still a bit stupid for all the wasted lseeks, but the total syscall overhead was 0.5% so it gets a pass. good job ksh

ksh strace

python, a notoriously fast programming language, for comparison

python


next: revolutionary invention

previous: short bash quiz
