@disulfidebond
Created May 29, 2019 18:41
Tips and Tricks using Bash to solve Informatics problems

Overview

This is the first in a series of write-ups that demonstrate Bash tricks for solving Informatics problems.
Each write-up follows the same format: the Overview section briefly describes the problem along with its pitfalls and difficulties, the Methods section covers any relevant background and CS theory (it may be blank), and the Solution section describes how to solve the problem.

This write-up covers one approach to fixing a malformed FASTA file. Briefly, each record in a FASTA file must have the format:

    >some_header Spaces allowed
    AATTCCGGAACCGGAACCAA

If the '>' is missing from a header line, it can be restored using perl or sed. The perl one-liner is described here; the sed usage is very similar.
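
Before repairing a file, it can help to see which lines are affected. The following is a minimal sketch, not part of the original method: it assumes a nucleotide FASTA file whose sequence lines contain only A, C, G, T, or N, and the file name `example.fasta` and the variable names are hypothetical. It counts the header lines and lists any line that is neither a header nor a plain sequence line.

```shell
# Hypothetical demo file: one good record and one record missing its '>'
tmpdir_check=$(mktemp -d)
printf '>some_header Spaces allowed\nAATTCCGGAACCGGAACCAA\nMamu-A1*001\nAACCTTGGAACCAATTGG\n' \
    > "$tmpdir_check/example.fasta"

# Count the lines that start with '>'
header_count=$(grep -c '^>' "$tmpdir_check/example.fasta")
echo "header lines: $header_count"

# List lines (with line numbers) that are neither headers nor plain
# nucleotide sequence; assumes sequences use only A, C, G, T, N
suspect_lines=$(grep -vn -e '^>' -e '^[ACGTNacgtn]*$' "$tmpdir_check/example.fasta" || true)
echo "suspect lines:"
echo "$suspect_lines"
```

Any line reported as suspect is a candidate header that lost its '>'. A protein FASTA file would need a different character class in the second pattern.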

Methods

    # STRING is the string that will be matched using regex
    # inputFile.fasta is the file that perl will parse
    # outputFile.fasta is the modified output file
    perl -pe 's/^STRING/>STRING/' inputFile.fasta > outputFile.fasta

For example:

    # input
    Mamu-A1*001
    AACCTTGGAACCAATTGG
    
    # command
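    # note: in a regex, '-*' means "zero or more hyphens", not a wildcard, so the literal prefix is matched instead
    perl -pe 's/^Mamu-/>Mamu-/' inputFile.fasta > outputFile.fasta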
    
    # output
    >Mamu-A1*001
    AACCTTGGAACCAATTGG
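
For comparison, the same substitution can be sketched with sed. This is an illustrative equivalent rather than part of the original write-up; the temporary directory and the `tmpdir` variable are hypothetical and are used only so that the example is self-contained.

```shell
# Hypothetical demo files in a temporary directory
tmpdir=$(mktemp -d)
printf 'Mamu-A1*001\nAACCTTGGAACCAATTGG\n' > "$tmpdir/inputFile.fasta"

# Prepend '>' to every line that starts with "Mamu-"
sed 's/^Mamu-/>Mamu-/' "$tmpdir/inputFile.fasta" > "$tmpdir/outputFile.fasta"

# Show the repaired file
cat "$tmpdir/outputFile.fasta"
```

Because the pattern is anchored with `^`, a line that already starts with '>' no longer matches, so running the command a second time on its own output does not add a second '>'.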