- Last modified: Mon Oct 13, 2025
- Sign: Johan Nylander
Examples of doing the same thing on many input files. This is often called a "loop" (since after being done with the first file, we return to the beginning of our commands and do it all over again).
First, create a folder for the task and an (empty) example file.
$ mkdir -p for-loops/data
$ touch for-loops/empty.txt
$ cd for-loops/data
Next we will use seq to generate a series of integers from 1 to 4
(inclusive), and use a for loop to assign each integer to the variable i and
print it to the screen. See man seq for more. The manual for for is hidden in
the output from man bash.
Try seq first
$ seq 1 4
1
2
3
4
Then use this list of integers to populate the variable i in a "loop". Note
that commands written on one line need the command-separator operator (;).
$ for i in $(seq 1 4) ; do echo "${i}" ; done
1
2
3
4
We can accomplish the same loop by hitting Return after each line. Note that
the greater-than signs (>) are printed by the shell as a continuation prompt
and should not be typed in this example.
$ for i in $(seq 1 4)
> do
> echo "${i}"
> done
It is common in documents describing a for loop to write it on separate lines. For example (the indentation in this example helps readability but is not important to the shell):
for i in $(seq 1 4)
do
    echo "${i}"
done
Or, if intended specifically for typing on the command line:
$ for i in $(seq 1 4) ; do
    echo "${i}"
done
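The list of integers does not have to come from seq. As a minimal sketch, assuming the shell is bash (the variable names from_seq, from_braces, from_arith and n are made up for this illustration), the same numbers can be produced with brace expansion or with an arithmetic loop:

```shell
#!/usr/bin/env bash
# Three ways to count from 1 to 4 in bash. All give the same numbers.

# 1. The external command seq (as in the examples above).
from_seq=$(for i in $(seq 1 4) ; do echo "${i}" ; done)

# 2. Brace expansion. Built into bash; the limits must be literal numbers.
from_braces=$(for i in {1..4} ; do echo "${i}" ; done)

# 3. Arithmetic ("C-style") loop. The limits can be variables.
n=4
from_arith=$(for (( i = 1; i <= n; i++ )) ; do echo "${i}" ; done)

echo "${from_braces}"
```

Brace expansion and the arithmetic loop are bash features, so they may not work in a plain sh script, whereas $(seq 1 4) works wherever seq is installed.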
Finally, create some files with the integer i as part of the file name.
$ for i in $(seq 1 4) ; do touch "${i}.txt" ; done
List the file names using a wildcard.
$ ls *.txt
1.txt 2.txt 3.txt 4.txt
List the file names on separate lines. Note that -1 is the digit "one", not a
lowercase L.
$ ls -1 *.txt
1.txt
2.txt
3.txt
4.txt
Have ls list all .txt files in the current working directory (just as in the
previous example), but use the variable $PWD to provide the full path.
$ ls -1 "$PWD"/*.txt
/home/nylander/tmp/for-loops/data/1.txt
/home/nylander/tmp/for-loops/data/2.txt
/home/nylander/tmp/for-loops/data/3.txt
/home/nylander/tmp/for-loops/data/4.txt
List all file names ending in .txt using find. See man find for more.
$ find . -type f -name '*.txt'
./2.txt
./1.txt
./4.txt
./3.txt
List those names using find, and sort the output by piping to sort.
$ find . -type f -name '*.txt' | sort
./1.txt
./2.txt
./3.txt
./4.txt
Change directory (to the for-loops directory) and repeat the same command.
Note that find searches recursively.
$ cd ..
$ find . -type f -name '*.txt' | sort
./data/1.txt
./data/2.txt
./data/3.txt
./data/4.txt
./empty.txt
Have find search only in the for-loops folder itself (and not in its subfolders) by using -maxdepth 1.
$ find . -maxdepth 1 -type f -name '*.txt'
./empty.txt
Have find search in a specific folder.
$ find "$PWD/data" -maxdepth 1 -type f -name '*.txt'
/home/nylander/tmp/for-loops/data/2.txt
/home/nylander/tmp/for-loops/data/1.txt
/home/nylander/tmp/for-loops/data/4.txt
/home/nylander/tmp/for-loops/data/3.txt
If we know that a specific folder contains only .txt files, with no subfolders
and no folders ending in ".txt" etc., we can simplify the command.
$ find /home/nylander/tmp/for-loops/data -name '*.txt'
/home/nylander/tmp/for-loops/data/2.txt
/home/nylander/tmp/for-loops/data/1.txt
/home/nylander/tmp/for-loops/data/4.txt
/home/nylander/tmp/for-loops/data/3.txt
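Find files using a wildcard. A good habit is to use echo and ls
on complicated expressions before running the final loop. The first loop
below only shows what would be done (the cp command is commented out); the
second loop does the actual copying.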
$ for f in data/*.txt ; do
ls "${f}"
echo "${f}.cpy"
#cp "${f}" "${f}.cpy"
done
$ for f in data/*.txt ; do
#ls "${f}"
#echo "${f}.cpy"
cp "${f}" "${f}.cpy"
done
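Instead of commenting lines in and out, the preview and the real run can share one loop. The following is only a sketch of that idea, run in a throwaway folder: the function name copy_all, its "yes"/"no" argument and the variable workdir are made up for this illustration.

```shell
#!/usr/bin/env bash
# Work in a temporary folder so that nothing in for-loops/ is touched.
workdir=$(mktemp -d)
touch "${workdir}/1.txt" "${workdir}/2.txt"

copy_all () {
    # $1 = "yes": only print what would be done (dry run).
    # $1 = anything else: actually copy the files.
    for f in "${workdir}"/*.txt ; do
        if [ "$1" = "yes" ] ; then
            echo "would run: cp ${f} ${f}.cpy"
        else
            cp "${f}" "${f}.cpy"
        fi
    done
}

copy_all yes
n_after_dry=$(find "${workdir}" -name '*.cpy' | wc -l)

copy_all no
n_after_real=$(find "${workdir}" -name '*.cpy' | wc -l)

echo "copies after dry run: ${n_after_dry}, after real run: ${n_after_real}"
```

The dry run creates no .cpy files; only the second call does.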
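Use find instead of the wildcard. The construct $() (command substitution)
is replaced by the output of the command inside the parentheses.
Note: it is often recommended to combine find with a while loop
(instead of for), since a for loop over $(find ...) splits file names
that contain spaces. See example below.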
$ for f in $(find data -name '*.cpy') ; do
echo "${f}"
echo "${f%.cpy}.copy"
#mv "${f}" "${f%.cpy}.copy"
done
$ for f in $(find data -name '*.cpy') ; do
#echo "${f}"
#echo "${f%.cpy}.copy"
mv "${f}" "${f%.cpy}.copy"
done
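The expression ${f%.cpy} used above is a shell parameter expansion that removes a matching pattern from the end of the value of f. As a small illustration (the variable names other than f are made up here), these are the four related forms:

```shell
#!/usr/bin/env bash
f="data/1.txt.cpy"

no_cpy="${f%.cpy}"     # shortest match removed from the end:   data/1.txt
no_ext="${f%%.*}"      # longest match removed from the end:    data/1
no_dir="${f#*/}"       # shortest match removed from the start: 1.txt.cpy
only_ext="${f##*.}"    # longest match removed from the start:  cpy

printf '%s\n' "${no_cpy}" "${no_ext}" "${no_dir}" "${only_ext}"
```

The value of f itself is never changed by these expansions, so "${f%.cpy}.copy" can safely be used as the new name in mv.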
Tip: Use the basename command for handling paths and file endings. See
man basename (and man dirname) for more. See also man mkdir for the -p
option for mkdir.
$ for f in data/*.copy ; do
ls "${f}"
b=$(basename "${f}" .txt.copy)
echo "${b}"
d="${b}.copy"
echo "${d}/${b}.copy"
#mkdir -p "${d}"
#mv "${f}" "${d}/${b}.copy"
done
$ for f in data/*.copy ; do
#ls "${f}"
b=$(basename "${f}" .txt.copy)
#echo "${b}"
d="${b}.copy"
#echo "${d}/${b}.copy"
mkdir -p "${d}"
mv "${f}" "${d}/${b}.copy"
done
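To see what basename, dirname and mkdir -p do on their own, here is a short sketch using one of the file names from the loop above. The variable names dn, bn, b1, b2 and tmp are made up for this illustration, and the folders are created in a temporary location.

```shell
#!/usr/bin/env bash
f="data/1.txt.copy"

dn=$(dirname "${f}")               # folder part:          data
bn=$(basename "${f}")              # file name part:       1.txt.copy
b1=$(basename "${f}" .copy)        # also strip ".copy":   1.txt
b2=$(basename "${f}" .txt.copy)    # strip ".txt.copy":    1
echo "${dn} ${bn} ${b1} ${b2}"

# mkdir -p creates missing parent folders and does not complain if the
# folder already exists, so it is safe to call it in every round of a loop.
tmp=$(mktemp -d)
mkdir -p "${tmp}/${b2}.copy/sub"
mkdir -p "${tmp}/${b2}.copy/sub"   # second call is silently accepted
```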
Put file names in a file (and use that file later in a loop).
$ find "$PWD" -type f -name '*.copy' > infile.txt
Use the while loop to read the infile.txt line by line. See man bash (and
search for "while") for more.
$ while IFS= read -r line ; do
echo "line is ${line}"
bn=$(basename "${line}")
echo "bn is ${bn}"
dn=$(dirname "${line}")
echo "dn is ${dn}"
done < infile.txt
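The IFS= and -r parts of the read command above are easy to overlook. The sketch below tries to show what they protect against, using a made-up line with leading spaces and a backslash (the variable names infile, plain and safe are assumptions for this illustration):

```shell
#!/usr/bin/env bash
# A one-line input file; the line starts with two spaces and contains "\t".
infile=$(mktemp)
printf '%s\n' '  /tmp/my data/a\tb.copy' > "${infile}"

# Plain read: leading spaces are stripped and the backslash is eaten.
read line < "${infile}"
plain="${line}"

# IFS= keeps leading/trailing whitespace, -r keeps backslashes as they are.
IFS= read -r line < "${infile}"
safe="${line}"

printf 'plain: [%s]\nsafe:  [%s]\n' "${plain}" "${safe}"
rm "${infile}"
```

Only the second variant gives back the line exactly as it was written to the file.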
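Combine find with while. This is the recommended
way of executing commands on the output of
find (https://www.shellcheck.net/wiki/SC2044).
The example below searches for files ending in either of two alternative
suffixes. In addition, we handle the situation where the files
might be symbolic links (-L). The variable $infolder is assumed to hold the
folder to search. The -print0 option makes find separate the file names with
a null character, and read -d '' reads up to that character, so file names
containing spaces or newlines are handled correctly.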
$ while IFS= read -r -d '' infile ; do
echo "found infile \"$infile\""
done < <(find -L "$infolder" -maxdepth 1 \( -name '*.fastq' -o -name '*.fastq.gz' \) -print0)
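To illustrate why the while loop is preferred, the sketch below compares both kinds of loops on a throwaway folder containing a file name with a space. The folder, the file names and the counters n_for and n_while are made up for this illustration.

```shell
#!/usr/bin/env bash
infolder=$(mktemp -d)
touch "${infolder}/sample one.fastq" "${infolder}/sample2.fastq.gz" "${infolder}/notes.txt"

# for over $(find ...): the output is split on whitespace, so the name
# "sample one.fastq" ends up as two separate words.
n_for=0
for f in $(find "${infolder}" -maxdepth 1 -name '*.fastq*') ; do
    n_for=$(( n_for + 1 ))
done

# while over null-separated output: one round per file.
n_while=0
while IFS= read -r -d '' infile ; do
    n_while=$(( n_while + 1 ))
done < <(find -L "${infolder}" -maxdepth 1 \( -name '*.fastq' -o -name '*.fastq.gz' \) -print0)

echo "for: ${n_for} words, while: ${n_while} files"
rm -r "${infolder}"
```

Because the while loop reads from <( ) instead of from a pipe, it runs in the current shell, and the counter n_while is still available after the loop.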
Do things directly in find (without an external loop). See man find for
(much) more. Note that the special string {} is replaced by the name of each
found file. The actual command you want to execute comes after -exec and
ends before \;. Furthermore, the syntax of find can be quite tricky, so be careful!
$ find . -type f -name '*.copy' -exec ls -l {} \;
$ find . -type f -name '*.copy' -execdir rm -i {} \;
$ find . -type d -name '*.copy' -execdir rmdir {} \;
More complex example where we send a shell (sh) script to find. Here
we also strip the file suffix by manipulating the shell variable $1 (the
first argument to the shell script).
$ find "$PWD" -name '*.fas' -exec sh -c '
echo "found infile $1"
echo "infile without suffix: ${1%.fas}"
' sh {} \;
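With \; the command after -exec is started once for every found file. Ending with + instead hands over many files in a single call, which the sh script can then loop over with "$@". A small sketch on made-up files (workdir, calls_single, calls_batch and stripped are names invented for this illustration):

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "${workdir}/a.fas" "${workdir}/b.fas" "${workdir}/c.fas"

# \; : sh is started three times, each time with one file.
calls_single=$(find "${workdir}" -name '*.fas' -exec sh -c 'echo "call with $# file(s)"' sh {} \;)

# +  : sh is started once, with all three files as arguments.
calls_batch=$(find "${workdir}" -name '*.fas' -exec sh -c 'echo "call with $# file(s)"' sh {} +)

# Inside the script, loop over "$@" to handle the files one by one.
stripped=$(find "${workdir}" -name '*.fas' -exec sh -c '
    for f in "$@" ; do
        echo "${f%.fas}"
    done
' sh {} + | sort)

echo "${calls_single}"
echo "${calls_batch}"
echo "${stripped}"
```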
Use GNU parallel (https://www.gnu.org/software/parallel/)!
Use, e.g., find to find the files, then run your commands in parallel. GNU
parallel is a very sophisticated package that allows parallel execution of
single (or other parallel!) tasks. It comes with its own syntax. For example,
it extends the special variable {} to include many more variants than those
used by the program find: {.} is the file name without extension, {/} is the
basename (the file name without the folder part), and {/.} is their
combination (the basename without extension). You can find more examples here:
https://www.gnu.org/software/parallel/parallel_examples.html.
$ find data -name '*.txt' | parallel 'echo {/}'
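For ordinary file names like the ones used here, the replacement strings of parallel correspond roughly to the following parameter expansions in plain bash. This is only a comparison sketch (it does not run parallel, and the variable names are made up):

```shell
#!/usr/bin/env bash
f="data/1.txt"

no_ext="${f%.*}"            # like {.}  : data/1
base="${f##*/}"             # like {/}  : 1.txt
base_no_ext="${base%.*}"    # like {/.} : 1

printf '%s\n' "${no_ext}" "${base}" "${base_no_ext}"
```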
Example: Compress by starting one process per found file. GNU parallel will
take each file found by find and put its name in the special variable {}, one
by one. And instead of running each "loop" consecutively, it will run them
simultaneously on all available compute units. If there are more input files
than units, the remaining files wait in a queue.
$ find data -name '*.txt' | parallel 'gzip {}'
$ find data -name '*.txt.gz' | parallel 'gunzip {}'
The syntax for parallel can be challenging for more elaborate shell commands. One solution is to create your own bash function and then apply that function to your files in parallel.
checkNtaxaInFasta () {
    # Function for checking and removing fasta files with fewer than N taxa.
    # N can be given as an argument. E.g., "parallel checkNtaxaInFasta {} 10"
    f=$1        # First argument to the function
    n=${2:-4}   # Second argument to the function. If not given, use the default value (4).
    b=$(basename "${f}")
    ntax=$(grep -c '^>' "${f}")   # Count the fasta header lines (starting with ">")
    if [[ "${ntax}" -lt "${n}" ]] ; then
        echo "${b} has fewer than ${n} taxa (${ntax}). Removing!"
        rm -v "${f}"
    fi
}
export -f checkNtaxaInFasta
inputfolder="/path/to/some/folder"
suffix="fasta"
min=8
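# Here we search for files with a specific suffix,
# then apply our function for parallel execution.
# Note the important use of single and double quotes: the single-quoted
# part is passed to parallel unchanged, while "${min}" is expanded by the
# shell before parallel starts.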
find "${inputfolder}" -type f -name "*${suffix}" | \
parallel 'checkNtaxaInFasta {} '"${min}"''
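Since the function removes files, it can be worth trying it in an ordinary loop on throwaway data before running it through parallel. The sketch below repeats the function (here with local variables, and counting only lines that start with ">") and applies it to two made-up fasta files in a temporary folder; the folder, file names and sequences are invented for this test.

```shell
#!/usr/bin/env bash
checkNtaxaInFasta () {
    # Remove a fasta file if it has fewer than N taxa (default N = 4).
    local f=$1
    local n=${2:-4}
    local b ntax
    b=$(basename "${f}")
    ntax=$(grep -c '^>' "${f}")
    if [[ "${ntax}" -lt "${n}" ]] ; then
        echo "${b} has fewer than ${n} taxa (${ntax}). Removing!"
        rm -v "${f}"
    fi
}

# Two small test files: one with 2 taxa, one with 4 taxa.
testdir=$(mktemp -d)
printf '>t1\nACGT\n>t2\nACGT\n' > "${testdir}/small.fasta"
printf '>t1\nA\n>t2\nC\n>t3\nG\n>t4\nT\n' > "${testdir}/large.fasta"

# Require at least 3 taxa: small.fasta should go, large.fasta should stay.
for f in "${testdir}"/*.fasta ; do
    checkNtaxaInFasta "${f}" 3
done

ls "${testdir}"
```

Once the function behaves as expected, the same call can be handed over to parallel as shown above.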