@nylander
Last active October 13, 2025 08:18

Loops

  • Last modified: Mon Oct 13, 2025
  • Sign: Johan Nylander

Description

Examples of doing the same thing on many input files. This is often called a "loop", since after we are done with the first file, we return to the beginning of our commands and do it all over again for the next file.

for-loop for creating files

First, create a folder for the task and an (empty) example file.

$ mkdir -p for-loops/data
$ touch for-loops/empty.txt
$ cd for-loops/data

Next, we will use seq to generate a series of integers from 1 to 4 (inclusive), and use a for loop to assign each of them in turn to the variable i and print it to the screen. See man seq for more. The documentation for for is buried in the (long) output of man bash; help for gives a short summary.

Try seq first

$ seq 1 4
1
2
3
4

Then use this list of integers to populate the variable i in a "loop". Note that when the whole loop is written on one line, the commands need to be separated by the command-separator operator (;).

$ for i in $(seq 1 4) ; do echo "${i}" ; done
1
2
3
4
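In bash (but not in plain sh), the same list of integers can also be produced without calling seq, either with brace expansion or with an arithmetic (C-style) for loop. A minimal sketch, assuming bash; the function names count_arith and count_brace are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that print the integers 1..n, one per line,
# without calling the external program seq.

# Arithmetic (C-style) for loop; the upper limit n can be a variable.
count_arith () {
  local n=$1 i
  for (( i = 1; i <= n; i++ )) ; do
    echo "${i}"
  done
}

# Brace expansion; the limits must be literal numbers, since
# brace expansion happens before variables are expanded ({1..$n} fails).
count_brace () {
  local i
  for i in {1..4} ; do
    echo "${i}"
  done
}

count_arith 4
count_brace
```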

We can write the same loop by hitting Return after each line. Note that the greater-than signs (>) are the shell's continuation prompt and should not be typed.

$ for i in $(seq 1 4)
> do
> echo "${i}"
> done

Many documents describing the for loop write it on separate lines, for example as below. The indentation helps readability but is ignored by the shell.

for i in $(seq 1 4)
do
  echo "${i}"
done

Or, when written specifically for the command line:

$ for i in $(seq 1 4) ; do
    echo "${i}"
  done

Finally, create some files with the integer i as part of the file name.

$ for i in $(seq 1 4) ; do touch "${i}.txt" ; done
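If you create more than nine files this way, note that ls and sort order names as text, so 10.txt is listed before 2.txt. One way around this, sketched below under the assumption that two digits are enough, is to zero-pad the number with printf (seq -w does something similar). The demo runs in a temporary folder so that data/ is left untouched:

```shell
# Work in a throw-away folder so that data/ is left untouched.
tmpdir=$(mktemp -d)
cd "${tmpdir}" || exit 1

# Zero-pad i to two digits: 1 -> 01.txt, ..., 10 -> 10.txt
for i in $(seq 1 10) ; do
  touch "$(printf '%02d' "${i}").txt"
done

ls -1 *.txt    # 01.txt 02.txt ... 10.txt, now in numeric order
cd - > /dev/null || exit 1
```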

List files

List the file names using a wild card.

$ ls *.txt
1.txt  2.txt  3.txt  4.txt

List the file names on separate lines. Note that -1 is a "one", not lower case L.

$ ls -1 *.txt
1.txt
2.txt
3.txt
4.txt

Have ls list all .txt files in the current working directory (as in the previous example), but use the variable $PWD to show the full path. Quoting "$PWD" keeps the command working even if the path contains spaces.

$ ls -1 "$PWD"/*.txt
/home/nylander/tmp/for-loops/data/1.txt
/home/nylander/tmp/for-loops/data/2.txt
/home/nylander/tmp/for-loops/data/3.txt
/home/nylander/tmp/for-loops/data/4.txt

List all file names ending in .txt using find. See man find for more. Note that find does not sort its output.

$ find . -type f -name '*.txt'
./2.txt
./1.txt
./4.txt
./3.txt

List those names using find, and sort the output by piping to sort.

$ find . -type f -name '*.txt' | sort
./1.txt
./2.txt
./3.txt
./4.txt

Change directory (up to the for-loops directory) and repeat the same command. Note that find searches recursively, that is, also in subfolders.

$ cd ..
$ find . -type f -name '*.txt' | sort
./data/1.txt
./data/2.txt
./data/3.txt
./data/4.txt
./empty.txt

Have find search only in the for-loops folder itself (-maxdepth 1), not in its subfolders.

$ find . -maxdepth 1 -type f -name '*.txt'
./empty.txt

Have find search in a specific folder:

$ find "$PWD/data" -maxdepth 1 -type f -name '*.txt'
/home/nylander/tmp/for-loops/data/2.txt
/home/nylander/tmp/for-loops/data/1.txt
/home/nylander/tmp/for-loops/data/4.txt
/home/nylander/tmp/for-loops/data/3.txt

If we know that the folder only contains .txt files, has no subfolders, and has no folders ending in ".txt", we can simplify the command by dropping -maxdepth and -type.

$ find /home/nylander/tmp/for-loops/data -name '*.txt'
/home/nylander/tmp/for-loops/data/2.txt
/home/nylander/tmp/for-loops/data/1.txt
/home/nylander/tmp/for-loops/data/4.txt
/home/nylander/tmp/for-loops/data/3.txt

Do some things on found files

The for loop

Find files using a wildcard. A good habit is to first do a dry run, printing the file names and the intended commands with ls and echo (with the real command commented out), and only then run the final loop.

$ for f in data/*.txt ; do
    ls "${f}"
    echo "${f}.cpy"
    #cp "${f}" "${f}.cpy"
  done

$ for f in data/*.txt ; do
    #ls "${f}"
    #echo "${f}.cpy"
    cp "${f}" "${f}.cpy"
  done
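Instead of commenting and uncommenting lines, one alternative (a hypothetical pattern, not used elsewhere in this document) is to put the command behind a variable that is set to echo for the dry run and to nothing for the real run. The sketch below uses a throw-away copy of the example folder; copy_all is a made-up function name:

```shell
# Set up a throw-away copy of the example folder.
tmpdir=$(mktemp -d)
mkdir -p "${tmpdir}/data"
touch "${tmpdir}/data/1.txt" "${tmpdir}/data/2.txt"
cd "${tmpdir}" || exit 1

copy_all () {
  # Hypothetical dry-run switch: with run="echo" the cp commands are
  # only printed; with run="" they are executed.
  local run=$1 f
  for f in data/*.txt ; do
    ${run} cp "${f}" "${f}.cpy"   # unquoted on purpose: empty run disappears
  done
}

copy_all echo   # dry run: prints "cp data/1.txt data/1.txt.cpy", etc.
copy_all ""     # real run: creates data/1.txt.cpy and data/2.txt.cpy
```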

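Use find instead of the wildcard. $() (command substitution) is replaced by whatever the command inside the parentheses prints. Note: it is often recommended to combine find with a while loop instead of a for loop, since the output of $(find ...) is split on white space, and file names containing spaces will then break the loop. See example below.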

$ for f in $(find data -name '*.cpy') ; do
    echo "${f}"
    echo "${f%.cpy}.copy"
    #mv "${f}" "${f%.cpy}.copy"
  done

$ for f in $(find data -name '*.cpy') ; do
    #echo "${f}"
    #echo "${f%.cpy}.copy"
    mv "${f}" "${f%.cpy}.copy"
  done
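The problem with $(find ...) in a for loop can be seen with a single file whose name contains a space. A small demonstration in a temporary folder (the file name "my file.cpy" is made up for the example):

```shell
tmpdir=$(mktemp -d)
touch "${tmpdir}/my file.cpy"

# for + $(find): the name is split into two items, ".../my" and "file.cpy"
n_for=0
for f in $(find "${tmpdir}" -name '*.cpy') ; do
  echo "for sees: ${f}"
  n_for=$((n_for + 1))
done

# while + read: the whole line, space included, is kept as one item
n_while=0
while IFS= read -r f ; do
  echo "while sees: ${f}"
  n_while=$((n_while + 1))
done < <(find "${tmpdir}" -name '*.cpy')

echo "for: ${n_for} items, while: ${n_while} item"
```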

Tip: Use the basename command for handling paths and file endings. See man basename (and man dirname) for more. See also man mkdir for the -p option to mkdir.

$ for f in data/*.copy ; do
    ls "${f}"
    b=$(basename "${f}" .txt.copy)
    echo "${b}"
    d="${b}.copy"
    echo "${d}/${b}.copy"
    #mkdir -p "${d}"
    #mv "${f}" "${d}/${b}.copy"
  done

$ for f in data/*.copy ; do
    #ls "${f}"
    b=$(basename "${f}" .txt.copy)
    #echo "${b}"
    d="${b}.copy"
    #echo "${d}/${b}.copy"
    mkdir -p "${d}"
    mv "${f}" "${d}/${b}.copy"
  done
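basename and dirname start a new process for every file. For simple paths like the ones above, bash parameter expansion gives the same results without external programs. The sketch below uses a made-up example path; note that these expansions do not cover every corner case that basename and dirname handle, such as trailing slashes or a path without any slash:

```shell
f="data/1.txt.copy"               # example path, as produced by the loops above

b1=$(basename "${f}" .txt.copy)   # external command
b2=${f##*/}                       # strip everything up to the last "/"
b2=${b2%.txt.copy}                # strip the suffix

d1=$(dirname "${f}")              # external command
d2=${f%/*}                        # strip the last "/" and what follows

echo "basename: ${b1} ${b2}"      # basename: 1 1
echo "dirname:  ${d1} ${d2}"      # dirname:  data data
```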

Put file names in a file (and use that file later in a loop).

$ find "$PWD" -type f -name '*.copy' > infile.txt

The while loop

Use the while loop to read the infile.txt line by line. See man bash (and search for "while") for more.

$ while IFS= read -r line ; do
    echo "line is ${line}"
    bn=$(basename "${line}")
    echo "bn is ${bn}"
    dn=$(dirname "${line}")
    echo "dn is ${dn}"
  done < infile.txt
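If the list of file names is written by hand (or by another program) instead of by find, its last line may lack a final newline. read then returns a failure status at the end of the file even though it has filled in line, so the plain loop silently skips that last file. A common idiom, sketched below with a made-up two-line list, adds || [[ -n "${line}" ]] to the loop condition:

```shell
tmpfile=$(mktemp)
# A made-up list of two file names where the last line lacks a final newline.
printf 'data/1.txt\ndata/2.txt' > "${tmpfile}"

n_plain=0
while IFS= read -r line ; do
  n_plain=$((n_plain + 1))
done < "${tmpfile}"

# read fails at end of file, but has still filled in ${line},
# so the extra test lets the last (unterminated) line through.
n_safe=0
while IFS= read -r line || [[ -n "${line}" ]] ; do
  n_safe=$((n_safe + 1))
done < "${tmpfile}"

echo "plain: ${n_plain} lines, safe: ${n_safe} lines"   # plain: 1 lines, safe: 2 lines
rm -f "${tmpfile}"
```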

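Combine find with while. This is the recommended way of executing commands on the output of find (https://www.shellcheck.net/wiki/SC2044). The example below searches the folder given in $infolder for files ending in either of two alternative suffixes. The -print0 option to find, together with read -d '', separates the file names with NUL characters, so that names containing spaces (or even newlines) are handled safely. In addition, -L makes find follow symbolic links, which takes care of the situation where the files (or the folder) might be symbolic links.

$ infolder="/path/to/some/folder"
$ while IFS= read -r -d '' infile ; do
    echo "found infile \"$infile\""
  done < <(find -L "$infolder" -maxdepth 1 \( -name '*.fastq' -o -name '*.fastq.gz' \) -print0)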
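The same pattern can also collect the found files in a bash array for later use, one element per file, even when a name contains a space. A minimal sketch in a temporary folder with made-up file names (infiles is a hypothetical variable name):

```shell
# Hypothetical test folder with awkward but legal file names.
infolder=$(mktemp -d)
touch "${infolder}/sample 1.fastq" "${infolder}/sample2.fastq.gz" "${infolder}/notes.txt"

# Collect the matches in a bash array, one element per file.
infiles=()
while IFS= read -r -d '' infile ; do
  infiles+=("${infile}")
done < <(find -L "${infolder}" -maxdepth 1 \( -name '*.fastq' -o -name '*.fastq.gz' \) -print0)

echo "found ${#infiles[@]} files"    # found 2 files
for infile in "${infiles[@]}" ; do
  echo "found infile \"${infile}\""
done
```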

Use find directly

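Do things directly in find (without an external loop). See man find for (much) more. Note that {} is replaced by the name of the found file. The command you want to execute comes after -exec (or -execdir, which runs the command from the folder containing the found file) and ends before \;. The syntax of find can be quite tricky, so be careful! Below, rm -i asks for confirmation before each removal, and -depth makes find handle the contents of a folder before the folder itself, so that it does not try to enter a folder that rmdir has just removed.

$ find . -type f -name '*.copy' -exec ls -l {} \;
$ find . -type f -name '*.copy' -execdir rm -i {} \;
$ find . -depth -type d -name '*.copy' -execdir rmdir {} \;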
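Ending -exec with + instead of \; makes find pass many file names to a single run of the command, which is usually much faster when there are many files. A minimal sketch in a temporary folder with made-up files; here sh -c is used only to print how many file names ($#) each run of the command receives:

```shell
tmpdir=$(mktemp -d)
touch "${tmpdir}/1.copy" "${tmpdir}/2.copy" "${tmpdir}/3.copy"

# With \; the command is run once per file (three runs, 1 argument each).
find "${tmpdir}" -type f -name '*.copy' -exec sh -c 'echo "args: $#"' sh {} \;

# With + the file names are collected and passed together
# (here a single run with 3 arguments).
find "${tmpdir}" -type f -name '*.copy' -exec sh -c 'echo "args: $#"' sh {} +

# The same idea applied to the first command above:
find "${tmpdir}" -type f -name '*.copy' -exec ls -l {} +
```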

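A more complex example where we pass a small shell (sh) script to find. The sh after the script is used as $0 (the script's name), and {} is passed as $1, the first argument to the script. Here we also strip the file suffix by manipulating the shell variable $1 with parameter expansion.

$ find "$PWD" -name '*.fas' -exec sh -c '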
    echo "found infile $1"
    echo "infile without suffix: ${1%.fas}"
    ' sh {} \;
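Combined with + instead of \;, the small script receives many file names at once as $1, $2, and so on, and can loop over them itself; for f without "in ..." loops over all the script's arguments. A sketch in a temporary folder with two made-up .fas files, wrapped in a hypothetical function strip_fas:

```shell
tmpdir=$(mktemp -d)
touch "${tmpdir}/a.fas" "${tmpdir}/b.fas"

# Hypothetical helper: for every .fas file in folder $1, print its name
# with and without the suffix, using one sh process for many files.
strip_fas () {
  find "$1" -name '*.fas' -exec sh -c '
      for f ; do
        echo "found infile $f"
        echo "infile without suffix: ${f%.fas}"
      done
      ' sh {} +
}

strip_fas "${tmpdir}"
```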

Execute in parallel

Use GNU parallel (https://www.gnu.org/software/parallel/)!

Use, e.g., find to find the files, then run your commands in parallel. GNU parallel is a very sophisticated program that runs many tasks (single-threaded, or even multi-threaded ones) at the same time. It comes with its own syntax. For example, it extends the replacement string {} with many more variants than those used by find: {.} is the input without its extension, {/} is the basename, {//} is the directory name, and {/.} is the basename without extension. You can find more examples here: https://www.gnu.org/software/parallel/parallel_examples.html.

$ find data -name '*.txt' | parallel 'echo {/}'
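To see what these replacement strings stand for, the sketch below mimics them with bash parameter expansion for one made-up path. This only illustrates the idea; GNU parallel does the substitution itself, and its handling of corner cases (for example dots in folder names) may differ:

```shell
# Made-up input line, as find would print it.
p="data/1.txt"

# Rough bash equivalents of some GNU parallel replacement strings.
dot=${p%.*}        # {.}   remove the extension        -> data/1
base=${p##*/}      # {/}   remove the directory part   -> 1.txt
dir=${p%/*}        # {//}  keep only the directory     -> data
basedot=${base%.*} # {/.}  basename without extension  -> 1

printf '%-4s %s\n' '{}' "${p}" '{.}' "${dot}" '{/}' "${base}" '{//}' "${dir}" '{/.}' "${basedot}"
```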

Example: compress the files by starting one process per found file. GNU parallel reads the file names from find, one per line, and substitutes each name for {}. Instead of running each "loop" iteration one after the other, it runs them simultaneously, by default as many jobs at a time as the computer has CPUs. If there are more input files than CPUs, the remaining files wait in a queue until a job slot is free.

$ find data -name '*.txt' | parallel 'gzip {}'
$ find data -name '*.txt.gz' | parallel 'gunzip {}'
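If GNU parallel is not installed, xargs with the -P option (available in GNU and BSD xargs, although not required by POSIX) can run a simple command like gzip in parallel as well, with far fewer features. A sketch on four made-up files in a temporary folder; the choice of at most four simultaneous jobs is arbitrary:

```shell
tmpdir=$(mktemp -d)
for i in 1 2 3 4 ; do echo "file ${i}" > "${tmpdir}/${i}.txt" ; done

# -print0 / -0: pass names separated by NUL characters (safe for spaces).
# -n 1: one file per gzip call; -P 4: at most four gzip calls at a time.
find "${tmpdir}" -name '*.txt' -print0 | xargs -0 -n 1 -P 4 gzip

ls "${tmpdir}"    # lists 1.txt.gz ... 4.txt.gz
```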

The syntax for parallel can be challenging for more elaborate shell commands. One solution is to write your own bash function, make it visible to parallel with export -f, and then apply that function to your files in parallel.

checkNtaxaInFasta () {
  # Check a fasta file and remove it if it has fewer than N taxa (sequences).
  # N can be given as a second argument, e.g., "parallel checkNtaxaInFasta {} 10".
  local f=$1         # First argument to the function: the fasta file
  local n=${2:-4}    # Second argument. If not used, assign a default value (4) to n.
  local b ntax
  b=$(basename "${f}")
  ntax=$(grep -c '^>' "${f}")   # Count header lines (lines starting with >)
  if [[ "${ntax}" -lt "${n}" ]] ; then
    echo "${b} has fewer than ${n} taxa (${ntax}). Removing!"
    rm -v "${f}"
  fi
}
export -f checkNtaxaInFasta

inputfolder="/path/to/some/folder"
suffix="fasta"
min=8

# Here we search for files with a specific suffix,
# then apply our function for parallel execution.
# Note the important use of single- and double quotes.
find "${inputfolder}" -type f -name "*${suffix}" | \
    parallel 'checkNtaxaInFasta {} '"${min}"''
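Before letting the function remove files, it may be worth looking at how many taxa each file has. The sketch below does this with a plain while loop on two made-up fasta files in a temporary folder; count_taxa is a hypothetical helper that counts taxa as the number of header lines (lines starting with >) and removes nothing:

```shell
# Hypothetical example data: two small fasta files in a temporary folder.
inputfolder=$(mktemp -d)
printf '>a\nACGT\n>b\nACGT\n' > "${inputfolder}/small.fasta"
printf '>a\nACGT\n>b\nACGT\n>c\nACGT\n' > "${inputfolder}/large.fasta"

# Hypothetical helper: print "number-of-taxa<TAB>file" for every fasta
# file in folder $1, smallest first, without removing anything.
count_taxa () {
  find "$1" -type f -name '*.fasta' | \
    while IFS= read -r f ; do
      printf '%s\t%s\n' "$(grep -c '^>' "${f}")" "${f}"
    done | sort -n
}

count_taxa "${inputfolder}"
```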