Skip to content

Instantly share code, notes, and snippets.

@cth
Created January 13, 2016 14:35
Show Gist options
  • Save cth/9adab9f8fa37ab9df10b to your computer and use it in GitHub Desktop.
Save cth/9adab9f8fa37ab9df10b to your computer and use it in GitHub Desktop.
#!/home/fng514/bin/ruby
#$ -S /home/fng514/bin/ruby
#$ -cwd
# Script to calculate INFO (approx R-squared) and Allele frequency from a VCF file (with dosages).
# Like the ones produced with https://github.com/cth/vcf-misc-tools
# Christian Theil Have, 2016.
# Use as you see fit, but no warranties.
# Usage:
# qsub yuvaraj_info.rb my.vcf my.info
# where my.vcf is input and my.info is output
# you can install these libraries by typing
# gem install bio-samtools
# gem install zlib
require 'bio-samtools'
require 'zlib'
module Variance
def sum(&blk)
map(&blk).inject { |sum, element| sum + element }
end
def mean
(sum.to_f / size.to_f)
end
def variance
m = mean
sum { |i| ( i - m )**2 } / size
end
def std_dev
Math.sqrt(variance)
end
end
Array.send :include, Variance
info_lines = [[ "CHROM", "POS","FREQ", "INFO" ]]
variants_processed=0
File.open(ARGV[0]) do |file|
file.each do |line|
next if line =~ /^#.*/
variants_processed = variants_processed + 1
puts "variants processed: #{variants_processed}" if variants_processed % 10 == 0
vcfline=Bio::DB::Vcf.new(line)
dosages = []
vcfline.samples.each { |_,geno| dosages << geno["DS"].to_f }
p = dosages.mean / 2.0
p_all = 2.0 * p * (1.0 - p)
rsq = "%.3f" % (dosages.variance / p_all)
allele_freq ="%.3f" % (dosages.sum / (2*vcfline.samples.size))
info_lines << [ vcfline.chrom, vcfline.pos, allele_freq, rsq ]
end
end
File.open(ARGV[1], "w") do |file|
info_lines.each do |fields|
file.puts fields.join("\t")
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment