Tips and Tricks: Using Bash to Solve Informatics Problems

Overview

This is the first in a series of write-ups that demonstrate using Bash tricks to solve informatics problems.
Each write-up follows the same format: the Overview section briefly describes the problem along with any pitfalls and difficulties; the Methods section covers relevant background and CS theory (it may be blank); and the Solution section describes how to solve the problem.

This write-up covers one approach to fixing a malformed FASTA file. Briefly, each record in a FASTA file is a header line that starts with '>', followed by one or more sequence lines:

    >some_header Spaces allowed
    AATTCCGGAACCGGAACCAA
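
Before fixing anything, it can help to find the offending lines. The sketch below lists every line that is neither a '>' header nor a plain nucleotide sequence line. It assumes sequence lines contain only A, C, G, T and N (either case); the helper `find_bad_lines` and the file `example.fasta` are hypothetical names, and the character class would need widening for protein or IUPAC-ambiguity FASTA.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: print "lineNumber:text" for every line that neither
# starts with '>' nor consists solely of A/C/G/T/N (upper or lower case).
# -v inverts the match, -n adds line numbers; grep exits 1 if no line is reported.
find_bad_lines() {
    grep -n -v -E '^(>|[ACGTNacgtn]*$)' "$1"
}

# Hypothetical example file: the second record is missing its '>'.
printf '%s\n' '>some_header Spaces allowed' 'AATTCCGGAACCGGAACCAA' \
    'Mamu-A1*001' 'AACCTTGGAACCAATTGG' > example.fasta

find_bad_lines example.fasta   # prints 3:Mamu-A1*001
```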

If the '>' is missing from a header line, it can be added back with perl or sed. The perl one-liner is described here; sed usage is very similar.
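
For comparison, a sed version of the same fix might look like the sketch below. It uses only the basic `s///` command, where `&` in the replacement stands for the whole matched text, so `>&` prepends '>' to the matched prefix; this behaves the same in GNU and BSD sed. As with perl, the pattern is a regular expression, so characters such as `*`, `.` or `[` in a header prefix would need escaping.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Example input whose header line is missing the leading '>'.
printf '%s\n' 'Mamu-A1*001' 'AACCTTGGAACCAATTGG' > inputFile.fasta

# On lines beginning with "Mamu-", replace the match with '>' plus the match (&).
# sed prints every line, changed or not, by default (like perl -p).
sed 's/^Mamu-/>&/' inputFile.fasta > outputFile.fasta

cat outputFile.fasta
```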

Methods

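    # STRING is the header prefix to match; it is interpreted as a regular
    # expression, so metacharacters such as * . [ ( must be escaped (e.g. \*)
    # ^ anchors the match to the start of the line
    # -p applies the substitution to every line and prints it; -e gives the code inline
    # inputFile.fasta is the file that perl will parse
    # outputFile.fasta is the modified output file
    perl -pe 's/^STRING/>STRING/' inputFile.fasta > outputFile.fasta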

For example:

    # input
    Mamu-A1*001
    AACCTTGGAACCAATTGG
    
    # command: add '>' to lines that begin with "Mamu-"
    perl -pe 's/^Mamu-/>Mamu-/' inputFile.fasta > outputFile.fasta
    
    # output
    >Mamu-A1*001
    AACCTTGGAACCAATTGG

disulfidebond/bash_black_magic_trickery_2.md

Overview

Methods