@dceoy
Last active May 16, 2026 14:44
[Python] Read VCF (variant call format) as pandas.DataFrame
#!/usr/bin/env python
import io

import pandas as pd


def read_vcf(path):
    # Drop the '##' meta-information lines; keep the '#CHROM' header row.
    with open(path, 'r') as f:
        lines = [l for l in f if not l.startswith('##')]
    # Parse the remaining tab-separated text with explicit column types.
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'
    ).rename(columns={'#CHROM': 'CHROM'})
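
A minimal usage sketch, assuming the function is defined as posted above. It writes a tiny two-record VCF to a temporary file and loads it. The file contents and the sample name `SAMPLE1` are made up for illustration and do not come from a real call set.

```python
import io
import os
import tempfile

import pandas as pd


def read_vcf(path):
    # Same logic as the gist: drop '##' meta lines, parse the rest as TSV.
    with open(path, 'r') as f:
        lines = [l for l in f if not l.startswith('##')]
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'
    ).rename(columns={'#CHROM': 'CHROM'})


# Hypothetical example data (assumption, for illustration only).
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n"
    "1\t100\t.\tG\tA\t50\tPASS\tDP=10\tGT\t0/1\n"
    "X\t200\trs1\tT\tC\t.\tPASS\tDP=7;AF=0.5\tGT\t1/1\n"
)

with tempfile.TemporaryDirectory() as tmp:
    vcf_path = os.path.join(tmp, 'example.vcf')
    with open(vcf_path, 'w') as f:
        f.write(EXAMPLE_VCF)
    df = read_vcf(vcf_path)

# The '#CHROM' header is renamed to 'CHROM'; sample columns are kept as-is.
print(df.columns.tolist())
print(df[['CHROM', 'POS', 'REF', 'ALT']])
```

Because `CHROM` is read as a string, chromosome names such as `1` and `X` can share one column without mixed types.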
@YoavEtzioni

Nice. Very useful.

@dharbi

dharbi commented Nov 19, 2018

Really convenient!

@hansonglee

Oh thank you

@pdorsaint

Hi,

Thank you so much for this script! I am trying to run it on a VCF file.
Do you run the script like this: "python read_vcf.py vcf_filename"?

Thanks!
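
As posted, the gist only defines `read_vcf`, so running the file directly would print nothing. A rough sketch of one way to make it runnable from the command line follows. The `main` function, its argument name, and the choice to print the whole table are assumptions for illustration, not part of the original script.

```python
#!/usr/bin/env python
import argparse
import io

import pandas as pd


def read_vcf(path):
    # Same logic as the gist: drop '##' meta lines, parse the rest as TSV.
    with open(path, 'r') as f:
        lines = [l for l in f if not l.startswith('##')]
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'
    ).rename(columns={'#CHROM': 'CHROM'})


def main(argv=None):
    # Hypothetical command-line entry point; not part of the original gist.
    parser = argparse.ArgumentParser(description='Print a VCF file as a table.')
    parser.add_argument('vcf', nargs='?',
                        help='path to an uncompressed VCF file')
    args = parser.parse_args(argv)
    if args.vcf is None:
        # No path given: show the usage line instead of raising an error.
        parser.print_usage()
        return None
    df = read_vcf(args.vcf)
    print(df.to_string(index=False))
    return df


if __name__ == '__main__':
    main()
```

Saved as read_vcf.py, this would be run as "python read_vcf.py vcf_filename". Alternatively, the function can simply be imported and called from an interactive session or a notebook.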

@dceoy

dceoy commented Jul 13, 2019

Author

I developed the pdbio package. Please use it, @pdorsaint.

https://github.com/dceoy/pdbio

This package is a pandas-based data handling tool that can also be used from the command line.

Example of VCF data handling:

$ pdbio vcf2csv --tsv ./test/example.vcf

@DouglasAbrams

DouglasAbrams commented May 7, 2020

Here is a way of doing it with PyVCF that uses all fields of any VCF: https://pyvcf.readthedocs.io/en/v0.4.6/INTRO.html

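import pandas as pd
import vcf  # PyVCF


def read(f):
    # Read every record while the file is open; vars(r) exposes its attributes.
    with open(f) as fh:
        df = pd.DataFrame([vars(r) for r in vcf.Reader(fh)])
    # Expand the per-record INFO dicts into columns and join them by row index.
    out = df.merge(pd.DataFrame(df.INFO.tolist()),
                   left_index=True, right_index=True)
    return out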

Then run read(your_vcf).
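
If installing PyVCF is not an option, a similar INFO expansion can be approximated with pandas alone. The following is a rough sketch, assuming `INFO` is the raw semicolon-separated string column as returned by `read_vcf` in the gist. The helper names `parse_info` and `expand_info` and the `INFO_` column prefix are hypothetical. Values are left as strings, and flag keys without `=` are stored as `True`.

```python
import pandas as pd


def parse_info(info):
    """Turn 'DP=10;AF=0.5;DB' into {'DP': '10', 'AF': '0.5', 'DB': True}."""
    fields = {}
    if info == '.':  # '.' marks an empty INFO field
        return fields
    for item in info.split(';'):
        if not item:
            continue
        key, sep, value = item.partition('=')
        # Keys without '=' are flags; record their presence as True.
        fields[key] = value if sep else True
    return fields


def expand_info(df):
    # One dict per row becomes one row of new columns, aligned on df's index.
    info_df = pd.DataFrame([parse_info(s) for s in df['INFO']], index=df.index)
    return df.join(info_df.add_prefix('INFO_'))


# Made-up example rows (assumption, for illustration only).
df = pd.DataFrame({
    'CHROM': ['1', 'X'],
    'POS': [100, 200],
    'INFO': ['DP=10', 'DP=7;AF=0.5;DB'],
})
expanded = expand_info(df)
print(expanded)
```

Keys missing from a record come out as NaN in that row. Unlike PyVCF, this sketch does not consult the header to convert values to numbers or lists.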

@sbslee

sbslee commented May 4, 2021

If anyone's interested, I was looking for a way to do this too and ended up writing the pyvcf submodule:

A quick example of pyvcf.VcfFrame:

data = {
    'CHROM': ['chr1', 'chr2'],
    'POS': [100, 101],
    'ID': ['.', '.'],
    'REF': ['G', 'T'],
    'ALT': ['A', 'C'],
    'QUAL': ['.', '.'],
    'FILTER': ['.', '.'],
    'INFO': ['.', '.'],
    'FORMAT': ['GT', 'GT'],
    'Steven': ['0/1', '1/1']
}
vf = pyvcf.VcfFrame.from_dict([], data)
vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Steven
0  chr1  100  .   G   A    .      .    .     GT    0/1
1  chr2  101  .   T   C    .      .    .     GT    1/1

To read a VCF file into VcfFrame:

vf = pyvcf.VcfFrame.from_file('example.vcf')

@Vicbuz

Vicbuz commented Jun 21, 2021

This was so so useful. Thank you very much @dceoy

@upendrak

It works great. Thanks

@Mohammed-Alfayyadh

Hi,
Did you find a solution for not seeing any result after running the Python script? I am facing the same issue.

@SciNanda

SciNanda commented Nov 7, 2022

This was all I needed for now. Thank you very much!! :)

@NajlaAbassi

That was indeed useful! Thank you very much!!