
@iracooke
Last active August 20, 2016 00:49
Rename fasta identifiers

Rename Fasta IDs

This script is needed for programs like SecretomeP that truncate fasta ids.
We need to be able to uniquely identify each fasta entry, so this script renames the ids with a numeric scheme (S1, S2, and so on). It also produces a mapping file from old to new ids so the original ids can be recovered later.
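
To illustrate why truncation is a problem, here is a minimal sketch. The two ids and the 20-character limit are made up for the example; the actual limit depends on the program being used.

```ruby
# Hypothetical ids that differ only near the end
ids = ["sp|P12345|PROT_HUMAN_isoform1", "sp|P12345|PROT_HUMAN_isoform2"]

# Assumed truncation length, for illustration only
max_len = 20

# Keep only the first max_len characters of each id
truncated = ids.map { |id| id[0, max_len] }
# Both ids collapse to the same string after truncation
collision = truncated.uniq.length < ids.length

# Short numeric ids in the style of the script (S1, S2, ...) stay distinct
renamed = ids.each_index.map { |i| "S#{i + 1}" }
still_unique = renamed.map { |id| id[0, max_len] }.uniq.length == ids.length

puts "truncated: #{truncated.inspect} (collision: #{collision})"
puts "renamed:   #{renamed.inspect} (unique: #{still_unique})"
```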

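Use it like this:

  ./rename_fasta.rb yourfasta.fasta

This writes the renamed sequences to yourfasta.fasta_rename.fasta and the old-to-new id mapping to yourfasta.fasta_mapping.txt.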

If you get an error about permissions, you may need to do this first:

  chmod u+x rename_fasta.rb
#!/usr/bin/env ruby
# ARGV holds command line arguments. In this case it will be the name of our input fasta
fastafile = ARGV.shift
abort "Usage: rename_fasta.rb yourfasta.fasta" unless fastafile
# Open two files for output.
new_fasta = File.open("#{fastafile}_rename.fasta", "w") # One for the new fasta file
new_old_mapping = File.open("#{fastafile}_mapping.txt", "w") # One for the mapping
# A counter for creating the numeric new ids
seqnum = 0
# Loop over the original fasta file
File.foreach(fastafile) do |line|
  # Use a regex to match fasta description lines. \S stops at any whitespace,
  # so the trailing newline is not captured when a line has no description
  id_match = line.match(/^>(\S*)/)
  # If it is a description line then do extra stuff
  if id_match
    seqnum += 1
    # Extract the original id
    old_id = id_match.captures[0]
    new_id = "S#{seqnum}"
    # Create a new line with the new id, keeping the rest of the description
    line = ">#{new_id}#{id_match.post_match}"
    new_old_mapping.write "#{old_id}\t#{new_id}\n"
  end
  new_fasta.write line
end
# Close the output files so everything is flushed to disk
new_fasta.close
new_old_mapping.close
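
The gist does not include a script for going back from new ids to old ones. The following is a minimal sketch of how the mapping file could be used for that, assuming each mapping line has the form old id, tab, new id. The method names load_mapping and restore_ids and the sample data are hypothetical and not part of the original script.

```ruby
# Build a new_id => old_id lookup from the lines of a mapping file.
# Each line is expected to look like "old_id\tnew_id".
def load_mapping(mapping_lines)
  mapping_lines.each_with_object({}) do |line, lookup|
    old_id, new_id = line.chomp.split("\t", 2)
    lookup[new_id] = old_id if new_id
  end
end

# Swap the renamed id on each description line back to the original id.
# Lines whose id is not in the lookup are passed through unchanged.
def restore_ids(fasta_lines, lookup)
  fasta_lines.map do |line|
    m = line.match(/^>(\S+)/)
    if m && lookup.key?(m[1])
      ">#{lookup[m[1]]}#{m.post_match}"
    else
      line
    end
  end
end

# Hypothetical in-memory data standing in for the two output files
mapping_lines = ["seq_alpha\tS1\n", "seq_beta\tS2\n"]
renamed_fasta = [">S1 first protein\n", "MKT\n", ">S2\n", "MAA\n"]

restored = restore_ids(renamed_fasta, load_mapping(mapping_lines))
puts restored.join
```

With real files, the two arrays could be read with File.readlines, for example from yourfasta.fasta_mapping.txt and from the output of the downstream program, provided that output keeps the ids on lines starting with ">".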