@knmkr
Created April 28, 2015 08:42
[bioinfo] Get FASTA sequence of rs ID

Download all FASTA records for the current dbSNP build from the NCBI FTP server. The records are stored as multi-FASTA files.

$ wget -r -l 0 ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta
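Mirroring the whole directory downloads every file in it. If you only need one file, such as the rs_ch1.fas.gz file parsed below, here is a minimal sketch that uses Python's standard `urllib.request`, which accepts ftp:// URLs. `local_path` and `fetch` are hypothetical helper names. The file is saved under the same `ftp.ncbi.nih.gov/...` tree that `wget -r` creates, so the path in the parsing step still works. The sketch assumes the server still serves this directory layout, which may not hold for later dbSNP releases.

```python
import os
import urllib.request

# Base directory taken from the wget command above.
BASE_URL = "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta"


def local_path(filename: str, root: str = ".") -> str:
    """Mirror wget -r's layout: <root>/<host>/<remote path>/<filename>."""
    host_and_path = BASE_URL.split("://", 1)[1]
    return os.path.join(root, *host_and_path.split("/"), filename)


def fetch(filename: str, root: str = ".") -> str:
    """Download one file (e.g. 'rs_ch1.fas.gz') unless it already exists locally."""
    dest = local_path(filename, root)
    if not os.path.exists(dest):
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        urllib.request.urlretrieve(f"{BASE_URL}/{filename}", dest)
    return dest
```

For example, `fetch("rs_ch1.fas.gz")` retrieves just the chromosome 1 file. On later calls it returns the existing path without contacting the server.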

Parse a multi-FASTA file with Biopython ($ pip install biopython).

In [1]: from Bio import SeqIO

In [2]: import gzip

In [3]: handle = gzip.open("ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta/rs_ch1.fas.gz", "rt")  # text mode; Python 3 rejects "rU"

In [4]: record = next(SeqIO.parse(handle, "fasta"))  # first record in the file

In [5]: record
Out[5]: SeqRecord(seq=Seq('tcattgatggacatttgggttggttccaggtctttgctattgcgagtagtgcca...att', SingleLetterAlphabet()), id='gnl|dbSNP|rs171', name='gnl|dbSNP|rs171', description='gnl|dbSNP|rs171 rs=171|pos=500|len=702|taxid=9606|mol="genomic"|class=1|alleles="A/G"|build=138|suspect=?', dbxrefs=[])

In [6]: record.seq
Out[6]: Seq('tcattgatggacatttgggttggttccaggtctttgctattgcgagtagtgcca...att', SingleLetterAlphabet())
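The session above reads only the first record in the file. To get the sequence of a specific rs ID, which is the goal in the title, you have to scan the file until the matching record appears. Here is a minimal sketch of that scan. It uses a small hand-written FASTA reader rather than Biopython's SeqIO, so it has no dependencies. `iter_fasta`, `parse_dbsnp_header`, and `find_rs` are hypothetical helper names. The header parsing assumes every record uses the `id key=value|key=value|...` layout shown in `Out[5]`.

```python
import gzip
from typing import Dict, Iterator, List, Optional, TextIO, Tuple


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a multi-FASTA text stream.

    The header is returned without its leading '>'; wrapped sequence lines are joined.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def parse_dbsnp_header(header: str) -> Dict[str, str]:
    """Split 'gnl|dbSNP|rs171 rs=171|pos=500|...' into {'id': ..., 'rs': '171', ...}.

    Assumes the key=value fields follow the first space and are separated by '|'.
    Surrounding double quotes (e.g. alleles="A/G") are stripped from values.
    """
    record_id, _, fields = header.partition(" ")
    info = {"id": record_id}
    for field in fields.split("|"):
        key, sep, value = field.partition("=")
        if sep:
            info[key] = value.strip('"')
    return info


def find_rs(path: str, rs_id: str) -> Optional[Tuple[Dict[str, str], str]]:
    """Linearly scan a gzipped multi-FASTA file for rs_id ('rs171' or '171').

    Returns (parsed header fields, sequence), or None if the ID is not in the file.
    """
    wanted = rs_id.lower()
    if wanted.startswith("rs"):
        wanted = wanted[2:]
    with gzip.open(path, "rt") as handle:
        for header, seq in iter_fasta(handle):
            info = parse_dbsnp_header(header)
            if info.get("rs") == wanted:
                return info, seq
    return None
```

For example, `find_rs("ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta/rs_ch1.fas.gz", "rs171")` should find the record shown in `Out[5]`. Because each call reads the file from the start, it is better to collect all the IDs you need and check them in a single pass than to call it repeatedly.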