Ammar Aziz ammaraziz

ammaraziz / parseSNPs.py
Created November 18, 2020 13:59 — forked from peterk87/parseSNPs.py
Python: Parse SNPs from one or more multiple sequence alignments in multifasta format and output a concatenated SNP fasta, a basic SNP report, and/or [binarized] SNP table.
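As a rough illustration of what the description means by a concatenated SNP fasta, here is a minimal sketch. It is not the gist's own implementation: the helper names `read_multifasta`, `snp_columns`, and `snp_fasta` are hypothetical, and the sketch assumes that every sequence is already aligned to the same length and that any differing character (including gaps and ambiguity codes) counts as a SNP.

```python
def read_multifasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    parts = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            # Sequences may be wrapped over several lines; collect the pieces.
            parts[header].append(line.upper())
    return {h: "".join(chunks) for h, chunks in parts.items()}


def snp_columns(seqs):
    """Return the 0-based alignment columns where the sequences disagree."""
    aligned = list(seqs.values())
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: any column with more than one distinct character is a SNP.
    return [i for i in range(length) if len({s[i] for s in aligned}) > 1]


def snp_fasta(seqs):
    """Keep only the variable columns of each sequence (concatenated SNPs)."""
    cols = snp_columns(seqs)
    return {h: "".join(s[i] for i in cols) for h, s in seqs.items()}


# Hypothetical three-sequence alignment, wrapped over two lines per record.
example_lines = ">a\nACGT\nAC\n>b\nACTT\nAC\n>c\nACGT\nAT\n".splitlines()
example_seqs = read_multifasta(example_lines)
example_snps = snp_fasta(example_seqs)
```

In this toy alignment only columns 2 and 5 vary, so each sequence is reduced to two characters.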
import argparse
import textwrap
import os
import sys
from datetime import timedelta, datetime
# function for reading a multifasta file
# returns a dictionary with sequence headers and nucleotide sequences
def get_seqs_from_fasta(filepath):
    """
    This code pulls data from the WHO's influenza surveillance database:
    https://apps.who.int/flumart/Default?ReportNo=12
    This website is pretty tricky to parse; you must pass realistic headers to the POST requests, and you must also
    issue 3 total requests: 1) a GET request, 2) a POST request, and 3) another POST request. All 3 of these requests,
    in order, are required to actually collect the underlying data that's displayed in the table. See `get_table_data`
    for more documentation on this process.