A Julia script to extract regions/individuals from imputed data in VCF format, as produced by vcf-misc-tools, into a dosage table format convenient for association analyses.
The computationally and I/O-heavy parts are submitted to a cluster grid engine via qsub.
To install, type:
git clone https://gist.github.com/cth/a2f14830bc88a668931b0c08a9c82ac7 myworkdir
This depends on my other tools:
- vcf-misc-tools (https://github.com/cth/vcf-misc-tools)
- ClusterSubmitExternal (https://github.com/cth/ClusterSubmitExternal.jl)
It also depends on some Julia packages: DataFrames, GZip and Glob.
These are automatically installed by the script.
The dependencies that are not automatically installed are:
- julia (>= 0.4)
- vcftools (>= 0.1.15)
- ruby (>= 1.9)
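Since these tools are not installed for you, it can help to confirm they are on your PATH before starting. The following is only a sketch and not part of extract_dosage_table: the helper name `missing_tools` is hypothetical, qsub is included because the script submits jobs through it, and `Sys.which` assumes Julia 1.0 or later (it is not available in the older Julia versions the script itself supports). It does not check version numbers.

```julia
# Hypothetical pre-flight check, not part of extract_dosage_table.
# Assumes Julia >= 1.0, where Sys.which is available.

# External programs that are not installed automatically.
required_tools = ["vcftools", "ruby", "qsub"]

# Return the subset of `tools` that cannot be found on the PATH.
missing_tools(tools) = [t for t in tools if Sys.which(t) === nothing]

not_found = missing_tools(required_tools)
if isempty(not_found)
    println("All required tools were found on the PATH.")
else
    println("Not found on the PATH: ", join(not_found, ", "))
end
```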
Edit extract_dosage_table and change the settings at the beginning of the file to fit your needs.
################################################################
### SETTINGS (you may need to change these)
# The directory where the exomchip imputations reside
imputation_directory="/emc/cbmr/san7/fng514/imputation/exomchip"
# A bedfile containing all the regions/snps to extract
# Note that the output files will match the name of this
# file, i.e., they will be called bp_snps.info and bp_snps.dosage
bedfile = "bp_snps.bed"
# List of individual IDs to keep in the final files
individual_ids="health06_particids.txt"
################ END OF SETTINGS ###############################
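Before submitting jobs, it may be worth checking that the three settings point at things that exist, and previewing the output file names. The sketch below is an illustration only: `check_settings` and `output_names` are hypothetical helpers that do not appear in extract_dosage_table, and the naming rule (bed file name with its extension replaced) is inferred from the comment in the settings block rather than taken from the script's code.

```julia
# Hypothetical helpers, not part of extract_dosage_table.

# Same values as in the settings block above.
imputation_directory = "/emc/cbmr/san7/fng514/imputation/exomchip"
bedfile = "bp_snps.bed"
individual_ids = "health06_particids.txt"

# Collect a message for every setting that does not point at an existing path.
function check_settings(imputation_directory, bedfile, individual_ids)
    problems = String[]
    isdir(imputation_directory) || push!(problems, "imputation directory not found: $imputation_directory")
    isfile(bedfile) || push!(problems, "bed file not found: $bedfile")
    isfile(individual_ids) || push!(problems, "individual ID list not found: $individual_ids")
    return problems
end

# Assumed naming rule: replace the bed file's extension with .info and .dosage.
function output_names(bedfile)
    stem = splitext(basename(bedfile))[1]
    return (stem * ".info", stem * ".dosage")
end

for problem in check_settings(imputation_directory, bedfile, individual_ids)
    println("WARNING: ", problem)
end
println("Expected output files: ", join(output_names(bedfile), ", "))
```

Run from the working directory so that the relative paths to the bed file and ID list resolve the same way they would for the script.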
Then, within the working directory (myworkdir above), run:
julia extract_dosage_table
or
nohup julia extract_dosage_table &
if you are not inclined to watch it finish (it may take a while, since the script waits for queued jobs to finish before exiting).