This is the first in a series of write-ups demonstrating Bash tricks for solving informatics problems.
Each write-up follows the same format: the Overview section briefly describes the problem, along with its pitfalls and difficulties.
The Method section covers any relevant background and CS theory; it may be blank.
The Solution section describes how to solve the problem.
This problem involves one approach to fixing a malformed FASTA file. Briefly, each record in a FASTA file must have the format:
>some_header Spaces allowed
AATTCCGGAACCGGAACCAA
If the '>' is missing from a header, it can be added using perl or sed. The perl one-liner is described here; the sed usage is very similar.
# STRING is the text at the start of the header line that the regex will match
# inputFile.fasta is the file that perl will parse
# outputFile.fasta is the modified output file
perl -pe 's/^STRING/>STRING/' inputFile.fasta > outputFile.fasta
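Since sed is mentioned as an alternative, here is a minimal sketch of what the equivalent sed command might look like. The file names demo_sed_input.fasta and demo_sed_output.fasta and the header text STRING_header_1 are hypothetical, chosen only so the example can be run without touching real data.

```shell
# Create a small hypothetical input whose header lacks the leading '>'
printf 'STRING_header_1\nAATTCCGG\n' > demo_sed_input.fasta

# Same substitution as the perl one-liner; in sed, '&' in the
# replacement stands for whatever text the pattern matched
sed 's/^STRING/>&/' demo_sed_input.fasta > demo_sed_output.fasta

# Show the result
cat demo_sed_output.fasta
```

Using '&' avoids typing the matched text twice, which helps when the pattern is long.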
For example:
# input
Mamu-A1*001
AACCTTGGAACCAATTGG
# command
perl -pe 's/^Mamu-/>Mamu-/' inputFile.fasta > outputFile.fasta
# output
>Mamu-A1*001
AACCTTGGAACCAATTGG