-
-
Save mdshw5/6c06a272a7fde2a5288214080210c8cf to your computer and use it in GitHub Desktop.
from pyfaidx import Fasta | |
# positions.txt: | |
# chr1 1 T | |
# chr1 100 C | |
# chr2 10 G | |
# ... | |
with open('positions.txt') as mut_table: | |
# mutable Fasta modifies input file in-place | |
# make sure you're editing a copy of the original file | |
with Fasta('input.fasta', mutable=True) as fasta: | |
for line in mut_table: | |
rname, pos, base = line.rstrip().split() | |
# convert 1-based to 0-based coordinates | |
fasta[rname][int(pos) - 1] = base |
Thanks @mdshw5 for getting back to me. I actually have a slightly different problem.
`>ACVR1_R206H_1
CACCGCTGGCGAGCCACTGTTCTTTAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAAGAATTC
ACVR1_R206H_2
CACCGTCTGGCGAGCCACTGTTCTTCAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAGAATTC
ACVR1_S290L_3
CACCGATCGTTGTACGACTATCTTTCATGAAATGGGATCGTTGTACGACTATCTTCAGCTTACTAGAATTC`
I have a fasta index of around 6000 guides like above where i'd like to mutate a C in a specific position to a T. I thought I could use the pyfaidx package but the output does not change from the original input.
Sorry to hear that you're having trouble. I've tried to reproduce your issue using the example you provided, however I do see that the sites are mutated when running the code in this gist.
$ cat input.fasta
>ACVR1_R206H_1
CACCGCTGGCGAGCCACTGTTCTTTAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAAGAATTC
>ACVR1_R206H_2
CACCGTCTGGCGAGCCACTGTTCTTCAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAGAATTC
>ACVR1_S290L_3
CACCGATCGTTGTACGACTATCTTTCATGAAATGGGATCGTTGTACGACTATCTTCAGCTTACTAGAATTC
$ cat positions.txt
ACVR1_R206H_1 1 T
ACVR1_R206H_2 1 T
ACVR1_S290L_3 1 T
ACVR1_R206H_1 4 T
ACVR1_R206H_2 4 T
ACVR1_S290L_3 4 T
$ cat mutate_guides.py
from pyfaidx import Fasta
# positions.txt:
# chr1 1 T
# chr1 100 C
# chr2 10 G
# ...
with open('positions.txt') as mut_table:
# mutable Fasta modifies input file in-place
# make sure you're editing a copy of the original file
with Fasta('input.fasta', mutable=True) as fasta:
for line in mut_table:
rname, pos, base = line.rstrip().split()
# convert 1-based to 0-based coordinates
fasta[rname][int(pos) - 1] = base
$ python --version
Python 3.6.5
$ pip freeze
pyfaidx==0.5.8
six==1.14.0
Running the script produces the expected output:
$ python mutate_guides.py
$ cat input.fasta
>ACVR1_R206H_1
TACTGCTGGCGAGCCACTGTTCTTTAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAAGAATTC
>ACVR1_R206H_2
TACTGTCTGGCGAGCCACTGTTCTTCAACAGTGTAATCTGGCGAGCCACTGTTCTTTGTACCAGAGAATTC
>ACVR1_S290L_3
TACTGATCGTTGTACGACTATCTTTCATGAAATGGGATCGTTGTACGACTATCTTCAGCTTACTAGAATTC
Hopefully this can give you a starting point to figure out what's going wrong. Do note that the Fasta(mutable=True)
does mean that the FASTA file will be edited in-place, and so you'll want to run the script on a copy of your guide sequence file so that you can compare the changes to the original.
Hey @biancajaydiaz. I was originally trying to answer a question where the posted didn't specify enough information about their FASTA file, and so I didn't know if they had multiple entries (chromosomes) or not: https://www.biostars.org/p/265811. I've updated the code to work on a file that has 3 columns: chromosome name, 1-based position, and base to substitute. This might solve your issue, or if you post a small example I could help more. Good luck!