@dceoy
Last active September 29, 2025 06:01
[Python] Read VCF (variant call format) as pandas.DataFrame
#!/usr/bin/env python

import io
import os
import pandas as pd


def read_vcf(path):
    with open(path, 'r') as f:
        lines = [l for l in f if not l.startswith('##')]
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'
    ).rename(columns={'#CHROM': 'CHROM'})
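
A minimal usage sketch, assuming the same read_vcf() logic as above. The two-record VCF text and the temporary file are made up for illustration. Note that the gist only defines a function, so running the file as a script prints nothing unless something calls read_vcf() and prints the result.

```python
import io
import os
import tempfile

import pandas as pd


def read_vcf(path):
    # Same logic as the gist: drop '##' meta lines, keep the '#CHROM' header row.
    with open(path, 'r') as f:
        lines = [l for l in f if not l.startswith('##')]
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'
    ).rename(columns={'#CHROM': 'CHROM'})


# Hypothetical two-record VCF, written to a temporary file for the demo.
EXAMPLE_VCF = (
    '##fileformat=VCFv4.2\n'
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    'chr1\t100\t.\tG\tA\t.\tPASS\tDP=10\n'
    'chr2\t101\trs1\tT\tC\t50\tPASS\tDP=20;AF=0.5\n'
)

with tempfile.NamedTemporaryFile('w', suffix='.vcf', delete=False) as tmp:
    tmp.write(EXAMPLE_VCF)
    example_path = tmp.name

df = read_vcf(example_path)
os.remove(example_path)  # clean up the temporary file
print(df)
```

To use it on a real file from the command line, one option is to call read_vcf(sys.argv[1]) under an `if __name__ == '__main__':` guard and print the returned DataFrame.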
@YoavEtzioni

Nice. Very useful.

@dharbi

dharbi commented Nov 19, 2018

Really convenient!

@hansonglee

Oh thank you

@pdorsaint

Hi,

Thank you so much for this script! I am trying to run this script on a vcf file.
Do you run the script like this: "python read_vcf.py vcf_filename"?

Thanks!

@dceoy
Author

dceoy commented Jul 13, 2019

I developed the pdbio package. Please use it. @pdorsaint

https://github.com/dceoy/pdbio

This package is a pandas-based data handling tool that can also be used from the command line.

Example of VCF data handling:

$ pdbio vcf2csv --tsv ./test/example.vcf

@DouglasAbrams

DouglasAbrams commented May 7, 2020

A way of doing it that uses all fields of any VCF, using PyVCF: https://pyvcf.readthedocs.io/en/v0.4.6/INTRO.html

import pandas as pd
import vcf

def read(f):
    reader = vcf.Reader(open(f))
    df = pd.DataFrame([vars(r) for r in reader])
    out = df.merge(pd.DataFrame(df.INFO.tolist()),
                   left_index=True, right_index=True)
    return out

Then run read(your_vcf).
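
If PyVCF is not installed, the INFO column of a DataFrame such as the one returned by read_vcf() can also be expanded with pandas alone. A rough sketch, assuming INFO holds semicolon-separated KEY=VALUE pairs plus bare flags. The helper names parse_info and expand_info and the sample values are hypothetical, and all values are left as strings rather than typed from the VCF header.

```python
import pandas as pd


def parse_info(info):
    """Turn 'DP=10;AF=0.5;DB' into {'DP': '10', 'AF': '0.5', 'DB': True}."""
    fields = {}
    if info in ('.', ''):
        return fields  # '.' means the record has no INFO annotations
    for item in info.split(';'):
        key, sep, value = item.partition('=')
        # Flags such as 'DB' carry no '=', so record them as True.
        fields[key] = value if sep else True
    return fields


def expand_info(df):
    """Append one 'INFO_<key>' column per INFO key, aligned on the index."""
    info_df = pd.DataFrame(df['INFO'].map(parse_info).tolist(), index=df.index)
    return df.join(info_df.add_prefix('INFO_'))


# Made-up records for illustration only.
df = pd.DataFrame({
    'CHROM': ['chr1', 'chr2', 'chr3'],
    'POS': [100, 101, 102],
    'INFO': ['DP=10;AF=0.5', 'DP=20;DB', '.'],
})
expanded = expand_info(df)
print(expanded)
```

Keys missing from a record come out as NaN, so the frame stays rectangular even when records carry different INFO keys.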

@sbslee

sbslee commented May 4, 2021

If anyone's interested, I was looking for a way to do this too and ended up writing the pyvcf submodule:

A quick example of pyvcf.VcfFrame:

data = {
    'CHROM': ['chr1', 'chr2'],
    'POS': [100, 101],
    'ID': ['.', '.'],
    'REF': ['G', 'T'],
    'ALT': ['A', 'C'],
    'QUAL': ['.', '.'],
    'FILTER': ['.', '.'],
    'INFO': ['.', '.'],
    'FORMAT': ['GT', 'GT'],
    'Steven': ['0/1', '1/1']
}
vf = pyvcf.VcfFrame.from_dict([], data)
vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Steven
0  chr1  100  .   G   A    .      .    .     GT    0/1
1  chr2  101  .   T   C    .      .    .     GT    1/1

To read a VCF file into VcfFrame:

vf = pyvcf.VcfFrame.from_file('example.vcf')

@Vicbuz

Vicbuz commented Jun 21, 2021

This was so so useful. Thank you very much @dceoy

@upendrak

It works great. Thanks

@Mohammed-Alfayyadh

Hi,
Did you find a solution for not seeing any result after running the Python script? I am facing the same issue.

@SciNanda

SciNanda commented Nov 7, 2022

This was all I need for now. Thank you very much!! :)

@NajlaAbassi

That was indeed useful! Thank you very much!!
