Gist: thapakazi/2152717ef453e52d338d7a10a9cd457b
Sample random lines from a file in bash: a benchmark
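#!/bin/bash
# Benchmark several ways to print 10 random lines from a 10-million-line file.
# Needs GNU time (for -v), shuf, rl (randomize-lines), perl and python3.
FILENAME="/tmp/random-lines.$$.tmp"
NUMLINES=10000000
GNU_TIME=$(which time)   # the external GNU time binary, not the bash keyword
export FILENAME          # make the path visible inside the "bash -c" children

# Time a whole pipeline, not just its first command, and discard its stdout.
# GNU time prints its report on stderr, so the report stays visible.
bench() {
  "$GNU_TIME" -v bash -c "$1" > /dev/null
}

seq -f 'line %.0f' "$NUMLINES" > "$FILENAME"
trap 'rm -f "$FILENAME"' EXIT   # remove the test file when the script exits

echo "10 random lines with nl:"
# sort -R keeps identical lines together; numbering every line makes each one unique.
bench 'nl -ba "$FILENAME" | sort -R | cut -f2- | head'
echo "10 random lines with shuf:"
bench 'shuf -n 10 "$FILENAME"'
echo "10 random lines with rl:"
bench 'rl "$FILENAME" | head'
echo "10 random lines with perl:"
bench 'perl -MList::Util=shuffle -e "print shuffle(<>)" "$FILENAME" | head'
echo "10 random lines with python:"
bench 'python3 -c "import random, sys; lines = open(sys.argv[1]).readlines(); random.shuffle(lines); sys.stdout.writelines(lines[:10])" "$FILENAME"'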
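The perl and python candidates read the entire file into memory before shuffling it. One possible extra candidate, not part of the original benchmark, is reservoir sampling (Algorithm R): a single pass over the file that keeps only K lines in memory. This is a sketch under a few assumptions: any POSIX awk with `rand`/`srand`, bash's `$RANDOM` as the seed, and a hypothetical function name, `reservoir_sample`. Unlike the other candidates, it picks the lines uniformly but does not shuffle the order in which it prints them.

```shell
#!/bin/bash
# Hypothetical helper (not from the original gist): print K lines chosen
# uniformly at random from FILE with reservoir sampling (Algorithm R).
# One pass, O(K) memory; the printed lines are NOT in shuffled order.
reservoir_sample() {
  local k=$1 file=$2
  awk -v k="$k" -v seed="$RANDOM" '
    BEGIN { srand(seed) }                  # seed from bash so runs differ
    NR <= k { r[NR] = $0; next }           # fill the reservoir with the first k lines
    {
      j = int(rand() * NR) + 1             # uniform integer in 1..NR
      if (j <= k) r[j] = $0                # keep line NR with probability k/NR
    }
    END {
      n = (NR < k) ? NR : k                # files shorter than k: print them all
      for (i = 1; i <= n; i++) print r[i]
    }' "$file"
}

# Demo: draw 10 lines from a small generated file.
demo=$(mktemp)
seq -f 'line %.0f' 1000 > "$demo"
reservoir_sample 10 "$demo"
rm -f "$demo"
```

To compare it with the other candidates, it can be timed on the same 10-million-line file; its memory use should stay flat regardless of file size.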