Skip to content

Instantly share code, notes, and snippets.

@mdshw5
Last active March 9, 2020 15:51
Show Gist options
  • Save mdshw5/6c06a272a7fde2a5288214080210c8cf to your computer and use it in GitHub Desktop.
Save mdshw5/6c06a272a7fde2a5288214080210c8cf to your computer and use it in GitHub Desktop.
biostars 265811
from pyfaidx import Fasta
# positions.txt:
# chr1 1 T
# chr1 100 C
# chr2 10 G
# ...
with open('positions.txt') as mut_table:
# mutable Fasta modifies input file in-place
# make sure you're editing a copy of the original file
with Fasta('input.fasta', mutable=True) as fasta:
for line in mut_table:
rname, pos, base = line.rstrip().split()
# convert 1-based to 0-based coordinates
fasta[rname][int(pos) - 1] = base
@biancajaydiaz
Copy link

Hello!

I am trying to implement this code to edit a fasta file that will convert C's to T's in expected base edited sites in order to build a reference index file. I've been trying this code on a smaller set of sequence but for some reason the code is not working. I am not getting the expected edit at the expected position.

@mdshw5
Copy link
Author

mdshw5 commented Feb 26, 2020

Hey @biancajaydiaz. I was originally trying to answer a question where the posted didn't specify enough information about their FASTA file, and so I didn't know if they had multiple entries (chromosomes) or not: https://www.biostars.org/p/265811. I've updated the code to work on a file that has 3 columns: chromosome name, 1-based position, and base to substitute. This might solve your issue, or if you post a small example I could help more. Good luck!

@biancajaydiaz
Copy link

Thanks @mdshw5 for getting back to me. I actually have a slightly different problem.

`>ACVR1_R206H_1
CACCGCTGGCGAGCCACTGTTCTTTAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAAGAATTC

ACVR1_R206H_2
CACCGTCTGGCGAGCCACTGTTCTTCAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAGAATTC
ACVR1_S290L_3
CACCGATCGTTGTACGACTATCTTTCATGAAATGGGATCGTTGTACGACTATCTTCAGCTTACTAGAATTC`

I have a fasta index of around 6000 guides like above where i'd like to mutate a C in a specific position to a T. I thought I could use the pyfaidx package but the output does not change from the original input.

@mdshw5
Copy link
Author

mdshw5 commented Mar 9, 2020

Sorry to hear that you're having trouble. I've tried to reproduce your issue using the example you provided, however I do see that the sites are mutated when running the code in this gist.

$ cat input.fasta
>ACVR1_R206H_1
CACCGCTGGCGAGCCACTGTTCTTTAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAAGAATTC
>ACVR1_R206H_2
CACCGTCTGGCGAGCCACTGTTCTTCAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAGAATTC
>ACVR1_S290L_3
CACCGATCGTTGTACGACTATCTTTCATGAAATGGGATCGTTGTACGACTATCTTCAGCTTACTAGAATTC
$ cat positions.txt 
ACVR1_R206H_1 1 T
ACVR1_R206H_2 1 T
ACVR1_S290L_3 1 T
ACVR1_R206H_1 4 T
ACVR1_R206H_2 4 T
ACVR1_S290L_3 4 T
$ cat mutate_guides.py
from pyfaidx import Fasta

# positions.txt:
# chr1 1 T
# chr1 100 C
# chr2 10 G
# ...

with open('positions.txt') as mut_table:
  # mutable Fasta modifies input file in-place
  # make sure you're editing a copy of the original file
  with Fasta('input.fasta', mutable=True) as fasta:
    for line in mut_table:
      rname, pos, base = line.rstrip().split()
      # convert 1-based to 0-based coordinates
      fasta[rname][int(pos) - 1] = base
$ python --version
Python 3.6.5
$ pip freeze
pyfaidx==0.5.8
six==1.14.0

Running the script produces the expected output:

$ python mutate_guides.py
$ cat input.fasta
>ACVR1_R206H_1
TACTGCTGGCGAGCCACTGTTCTTTAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAAGAATTC
>ACVR1_R206H_2
TACTGTCTGGCGAGCCACTGTTCTTCAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAGAATTC
>ACVR1_S290L_3
TACTGATCGTTGTACGACTATCTTTCATGAAATGGGATCGTTGTACGACTATCTTCAGCTTACTAGAATTC

Hopefully this can give you a starting point to figure out what's going wrong. Do note that the Fasta(mutable=True) does mean that the FASTA file will be edited in-place, and so you'll want to run the script on a copy of your guide sequence file so that you can compare the changes to the original.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment