@arq5x
Last active December 17, 2015 19:09
CBW 2013 - Structural Variation Practical Session.
##fileformat=VCFv4.1
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=DUP:TANDEM,Description="Tandem Duplication">
##source=hydra
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 2911212 hydra101 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,277;CIEND=0,234;END=2911877;SVLEN=-666
chr1 2918797 hydra102 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,243;CIEND=0,264;END=2919390;SVLEN=-594
chr1 7569891 hydra103 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,183;CIEND=0,156;END=7571532;SVLEN=-1642
chr1 9595220 hydra105 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,419;CIEND=0,278;END=9597644;SVLEN=-2425
chr1 10482252 hydra106 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,255;CIEND=0,287;END=10483768;SVLEN=-1517
chr1 11051257 hydra441 C <DUP:TANDEM> . PASS SVTYPE=DUP;IMPRECISE;CIPOS=0,271;CIEND=0,308;END=11054309;SVLEN=3053
chr1 12947432 hydra107 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,230;CIEND=0,281;END=12948048;SVLEN=-617
chr1 13419049 hydra108 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,106;CIEND=0,264;END=13419600;SVLEN=-552
chr1 13639775 hydra109 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,195;CIEND=0,125;END=13640419;SVLEN=-645
chr1 16834408 hydra111 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,160;CIEND=0,213;END=16836334;SVLEN=-1927
chr1 17278158 hydra113 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,294;CIEND=0,147;END=17280522;SVLEN=-2365
chr1 20931686 hydra114 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,284;CIEND=0,205;END=20932702;SVLEN=-1017
chr1 26489572 hydra117 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,252;CIEND=0,271;END=26490140;SVLEN=-569
chr1 31574851 hydra119 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,240;CIEND=0,229;END=31577335;SVLEN=-2485
chr1 35101156 hydra120a T T[chr1:35112001[ . PASS SVTYPE=BND;MATEID=hydra120b;IMPRECISE;CIPOS=0,268
chr1 35112001 hydra120b T ]chr1:35101156]T . PASS SVTYPE=BND;MATEID=hydra120a;IMPRECISE;CIPOS=0,208
chr1 36733040 hydra1a C C]chr1:36734508] . PASS SVTYPE=BND;MATEID=hydra1b;IMPRECISE;CIPOS=0,209
chr1 36733606 hydra606a C [chr1:36734500[C . PASS SVTYPE=BND;MATEID=hydra606b;IMPRECISE;CIPOS=0,211
chr1 36734500 hydra606b T [chr1:36733606[T . PASS SVTYPE=BND;MATEID=hydra606a;IMPRECISE;CIPOS=0,142
chr1 36734508 hydra1b G G]chr1:36733040] . PASS SVTYPE=BND;MATEID=hydra1a;IMPRECISE;CIPOS=0,114
chr1 39970103 hydra444 T <DUP:TANDEM> . PASS SVTYPE=DUP;IMPRECISE;CIPOS=0,141;CIEND=0,163;END=39970323;SVLEN=221
chr1 39997894 hydra122 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,315;CIEND=0,205;END=40001270;SVLEN=-3377
chr1 43857216 hydra124 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,197;CIEND=0,201;END=43857833;SVLEN=-618
chr1 44059028 hydra2a A A]chr1:44059661] . PASS SVTYPE=BND;MATEID=hydra2b;IMPRECISE;CIPOS=0,273
chr1 44059661 hydra2b A A]chr1:44059028] . PASS SVTYPE=BND;MATEID=hydra2a;IMPRECISE;CIPOS=0,208
chr1 46207014 hydra126 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,268;CIEND=0,300;END=46207594;SVLEN=-581
chr1 53594954 hydra128 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,174;CIEND=0,320;END=53595597;SVLEN=-644
chr1 56887385 hydra131 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,302;CIEND=0,240;END=56888021;SVLEN=-637
chr1 58743638 hydra134 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,283;CIEND=0,323;END=58744819;SVLEN=-1182
chr1 60048406 hydra135 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,238;CIEND=0,224;END=60049656;SVLEN=-1251
chr1 62390360 hydra136 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,250;CIEND=0,274;END=62390942;SVLEN=-583
chr1 63151557 hydra137 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,251;CIEND=0,252;END=63152152;SVLEN=-596
chr1 64839517 hydra614a G [chr1:64854687[G . PASS SVTYPE=BND;MATEID=hydra614b;IMPRECISE;CIPOS=0,288
chr1 64854687 hydra614b T [chr1:64839517[T . PASS SVTYPE=BND;MATEID=hydra614a;IMPRECISE;CIPOS=0,235
chr1 66263660 hydra140 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,231;CIEND=0,200;END=66264269;SVLEN=-610
chr1 68007709 hydra142 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,269;CIEND=0,378;END=68008814;SVLEN=-1106
chr1 69806757 hydra143 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,208;CIEND=0,222;END=69807317;SVLEN=-561
chr1 69943440 hydra144 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,271;CIEND=0,264;END=69944050;SVLEN=-611
chr1 70912011 hydra446 C <DUP:TANDEM> . PASS SVTYPE=DUP;IMPRECISE;CIPOS=0,175;CIEND=0,133;END=70912279;SVLEN=269
chr1 71741080 hydra147 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,240;CIEND=0,293;END=71741648;SVLEN=-569
chr1 72766054 hydra318a T T[chr1:72811844[ . PASS SVTYPE=BND;MATEID=hydra318b;IMPRECISE;CIPOS=0,281
chr1 72811844 hydra318b G ]chr1:72766054]G . PASS SVTYPE=BND;MATEID=hydra318a;IMPRECISE;CIPOS=0,240
chr1 73753333 hydra148 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,269;CIEND=0,346;END=73753908;SVLEN=-576
chr1 73768566 hydra149 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,234;CIEND=0,378;END=73769141;SVLEN=-576
chr1 77100140 hydra151 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,191;CIEND=0,229;END=77100706;SVLEN=-567
chr1 79401261 hydra152 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,269;CIEND=0,290;END=79401827;SVLEN=-567
chr1 80113878 hydra153 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,303;CIEND=0,327;END=80114512;SVLEN=-635
chr1 81660344 hydra3 G <INV> . PASS SVTYPE=INV;IMPRECISE;CIPOS=189,238;CIEND=14,180;END=81661384;SVLEN=1041
chr1 81687538 hydra156 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,259;CIEND=0,283;END=81688151;SVLEN=-614
chr1 82963020 hydra157 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,164;CIEND=0,173;END=82963615;SVLEN=-596
chr1 83125748 hydra159 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,192;CIEND=0,258;END=83127572;SVLEN=-1825
chr1 84517670 hydra160 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,256;CIEND=0,232;END=84524633;SVLEN=-6964
chr1 86740758 hydra162 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,229;CIEND=0,217;END=86741691;SVLEN=-934
chr1 89475727 hydra163 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,209;CIEND=0,303;END=89478617;SVLEN=-2891
chr1 91913855 hydra164 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,248;CIEND=0,446;END=91914548;SVLEN=-694
chr1 92231830 hydra166 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,237;CIEND=0,273;END=92233334;SVLEN=-1505
chr1 94287984 hydra167 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,398;CIEND=0,262;END=94291248;SVLEN=-3265
chr1 95690178 hydra168 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,336;CIEND=0,248;END=95690821;SVLEN=-644
chr1 102892510 hydra617a C [chr1:102970097[C . PASS SVTYPE=BND;MATEID=hydra617b;IMPRECISE;CIPOS=0,164
chr1 102970097 hydra617b T [chr1:102892510[T . PASS SVTYPE=BND;MATEID=hydra617a;IMPRECISE;CIPOS=0,125
chr1 105218411 hydra172 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,235;CIEND=0,309;END=105219463;SVLEN=-1053
chr1 105866500 hydra174 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,243;CIEND=0,285;END=105867047;SVLEN=-548
chr1 108402754 hydra175 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,225;CIEND=0,228;END=108405488;SVLEN=-2735
chr1 108733113 hydra176 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,220;CIEND=0,187;END=108737264;SVLEN=-4152
chr1 109572968 hydra178 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,271;CIEND=0,253;END=109575172;SVLEN=-2205
chr1 110186899 hydra180 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,285;CIEND=0,285;END=110191392;SVLEN=-4494
chr1 114039524 hydra182 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,317;CIEND=0,181;END=114045927;SVLEN=-6404
chr1 116229156 hydra183 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,327;CIEND=0,245;END=116232835;SVLEN=-3680
chr1 142567236 hydra226 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,184;CIEND=0,145;END=142568515;SVLEN=-1280
chr1 142726285 hydra9a T T]chr1:142812429] . PASS SVTYPE=BND;MATEID=hydra9b;IMPRECISE;CIPOS=0,239
chr1 142727210 hydra227 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,222;CIEND=0,267;END=142728082;SVLEN=-873
chr1 142812429 hydra9b A A]chr1:142726285] . PASS SVTYPE=BND;MATEID=hydra9a;IMPRECISE;CIPOS=0,255
chr1 142946555 hydra5a T T]chr1:142954649] . PASS SVTYPE=BND;MATEID=hydra5b;IMPRECISE;CIPOS=0,206
chr1 142954649 hydra5b A A]chr1:142946555] . PASS SVTYPE=BND;MATEID=hydra5a;IMPRECISE;CIPOS=0,281
chr1 143222801 hydra229 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,164;CIEND=0,191;END=143223953;SVLEN=-1153
chr1 143343018 hydra231a G G[chr1:143359390[ . PASS SVTYPE=BND;MATEID=hydra231b;IMPRECISE;CIPOS=0,171
chr1 143359390 hydra231b C ]chr1:143343018]C . PASS SVTYPE=BND;MATEID=hydra231a;IMPRECISE;CIPOS=0,212
chr1 143541917 hydra232 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,187;CIEND=0,295;END=143543633;SVLEN=-1717
chr1 144893951 hydra233 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,208;CIEND=0,168;END=144895221;SVLEN=-1271
chr1 144954318 hydra234 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,291;CIEND=0,225;END=144955359;SVLEN=-1042
chr1 145026454 hydra235 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,285;CIEND=0,279;END=145027048;SVLEN=-595
chr1 145092633 hydra236 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,326;CIEND=0,389;END=145097079;SVLEN=-4447
chr1 145124549 hydra325a T T[chr1:145183396[ . PASS SVTYPE=BND;MATEID=hydra325b;IMPRECISE;CIPOS=0,248
chr1 145183396 hydra325b T ]chr1:145124549]T . PASS SVTYPE=BND;MATEID=hydra325a;IMPRECISE;CIPOS=0,290
chr1 145294684 hydra238 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,215;CIEND=0,176;END=145301190;SVLEN=-6507
chr1 147752835 hydra616a A [chr1:147775519[A . PASS SVTYPE=BND;MATEID=hydra616b;IMPRECISE;CIPOS=0,148
chr1 147775519 hydra616b T [chr1:147752835[T . PASS SVTYPE=BND;MATEID=hydra616a;IMPRECISE;CIPOS=0,299
chr1 148222635 hydra239a A A[chr1:148243292[ . PASS SVTYPE=BND;MATEID=hydra239b;IMPRECISE;CIPOS=0,305
chr1 148243292 hydra239b C ]chr1:148222635]C . PASS SVTYPE=BND;MATEID=hydra239a;IMPRECISE;CIPOS=0,300
chr1 149328727 hydra241 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,202;CIEND=0,234;END=149332775;SVLEN=-4049
chr1 150691177 hydra242 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,276;CIEND=0,286;END=150691781;SVLEN=-605
chr1 152246235 hydra615a A [chr1:152267416[A . PASS SVTYPE=BND;MATEID=hydra615b;IMPRECISE;CIPOS=0,206
chr1 152267416 hydra615b T [chr1:152246235[T . PASS SVTYPE=BND;MATEID=hydra615a;IMPRECISE;CIPOS=0,203
chr1 152555325 hydra247a A A[chr1:152587736[ . PASS SVTYPE=BND;MATEID=hydra247b;IMPRECISE;CIPOS=0,216
chr1 152587736 hydra247b G ]chr1:152555325]G . PASS SVTYPE=BND;MATEID=hydra247a;IMPRECISE;CIPOS=0,229
chr1 153043768 hydra490a A ]chr1:153066416]A . PASS SVTYPE=BND;MATEID=hydra490b;IMPRECISE;CIPOS=0,170
chr1 153066416 hydra490b T T[chr1:153043768[ . PASS SVTYPE=BND;MATEID=hydra490a;IMPRECISE;CIPOS=0,186
chr1 158858186 hydra329a T T[chr1:158924985[ . PASS SVTYPE=BND;MATEID=hydra329b;IMPRECISE;CIPOS=0,155
chr1 158867412 hydra250 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,117;CIEND=0,169;END=158870014;SVLEN=-2603
chr1 158924985 hydra329b A ]chr1:158858186]A . PASS SVTYPE=BND;MATEID=hydra329a;IMPRECISE;CIPOS=0,145
chr1 158961128 hydra251 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,210;CIEND=0,230;END=158966209;SVLEN=-5082
chr1 160934757 hydra253 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,272;CIEND=0,296;END=160935333;SVLEN=-577
chr1 162052379 hydra254 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,274;CIEND=0,385;END=162052980;SVLEN=-602
chr1 180292713 hydra258 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,254;CIEND=0,214;END=180293324;SVLEN=-612
chr1 180749480 hydra259 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,295;CIEND=0,263;END=180755387;SVLEN=-5908
chr1 181043899 hydra610a C [chr1:181044117[C . PASS SVTYPE=BND;MATEID=hydra610b;IMPRECISE;CIPOS=0,171
chr1 181044117 hydra610b A [chr1:181043899[A . PASS SVTYPE=BND;MATEID=hydra610a;IMPRECISE;CIPOS=0,267
chr1 184936807 hydra261 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,125;CIEND=0,204;END=184937355;SVLEN=-549
chr1 185009475 hydra262 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,337;CIEND=0,243;END=185011738;SVLEN=-2264
chr1 185372856 hydra263 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,245;CIEND=0,232;END=185373440;SVLEN=-585
chr1 187464546 hydra6a T T]chr1:187466517] . PASS SVTYPE=BND;MATEID=hydra6b;IMPRECISE;CIPOS=0,289
chr1 187466485 hydra611a A [chr1:187466747[A . PASS SVTYPE=BND;MATEID=hydra611b;IMPRECISE;CIPOS=0,178
chr1 187466517 hydra6b A A]chr1:187464546] . PASS SVTYPE=BND;MATEID=hydra6a;IMPRECISE;CIPOS=0,207
chr1 187466747 hydra611b T [chr1:187466485[T . PASS SVTYPE=BND;MATEID=hydra611a;IMPRECISE;CIPOS=0,194
chr1 187715822 hydra264 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,290;CIEND=0,275;END=187722535;SVLEN=-6714
chr1 188539127 hydra265 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,326;CIEND=0,217;END=188540225;SVLEN=-1099
chr1 189704248 hydra334a G G[chr1:189783351[ . PASS SVTYPE=BND;MATEID=hydra334b;IMPRECISE;CIPOS=0,262
chr1 189783351 hydra334b T ]chr1:189704248]T . PASS SVTYPE=BND;MATEID=hydra334a;IMPRECISE;CIPOS=0,252
chr1 197756804 hydra7 A <INV> . PASS SVTYPE=INV;IMPRECISE;CIPOS=301,226;CIEND=263,7;END=197757996;SVLEN=1193
chr1 205178265 hydra8a T T]chr1:205178637] . PASS SVTYPE=BND;MATEID=hydra8b;IMPRECISE;CIPOS=0,288
chr1 205178637 hydra8b A A]chr1:205178265] . PASS SVTYPE=BND;MATEID=hydra8a;IMPRECISE;CIPOS=0,140
chr1 207292123 hydra271 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,239;CIEND=0,175;END=207293189;SVLEN=-1067
chr1 207541690 hydra272 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,341;CIEND=0,340;END=207545567;SVLEN=-3878
chr1 211964890 hydra273 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,173;CIEND=0,216;END=211965839;SVLEN=-950
chr1 212470861 hydra274 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,299;CIEND=0,281;END=212472617;SVLEN=-1757
chr1 214769868 hydra275 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,206;CIEND=0,208;END=214770468;SVLEN=-601
chr1 217753826 hydra277 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,269;CIEND=0,293;END=217754392;SVLEN=-567
chr1 222373893 hydra282 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,250;CIEND=0,194;END=222380550;SVLEN=-6658
chr1 224392557 hydra284 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,261;CIEND=0,274;END=224393129;SVLEN=-573
chr1 225344165 hydra285 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,229;CIEND=0,217;END=225344725;SVLEN=-561
chr1 228023782 hydra287 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,324;CIEND=0,259;END=228024406;SVLEN=-625
chr1 229373140 hydra290 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,258;CIEND=0,193;END=229373782;SVLEN=-643
chr1 229812306 hydra291 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,236;CIEND=0,271;END=229820841;SVLEN=-8536
chr1 232775257 hydra294 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,273;CIEND=0,255;END=232775849;SVLEN=-593
chr1 233961984 hydra296 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,212;CIEND=0,192;END=233963591;SVLEN=-1608
chr1 234318405 hydra297 G <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,240;CIEND=0,206;END=234319744;SVLEN=-1340
chr1 234586930 hydra298 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,268;CIEND=0,236;END=234587520;SVLEN=-591
chr1 236054356 hydra300 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,226;CIEND=0,299;END=236054941;SVLEN=-586
chr1 236594635 hydra303 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,296;CIEND=0,253;END=236595267;SVLEN=-633
chr1 240640959 hydra304 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,272;CIEND=0,267;END=240641583;SVLEN=-625
chr1 241149570 hydra305 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,302;CIEND=0,262;END=241150178;SVLEN=-609
chr1 241360225 hydra306 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,317;CIEND=0,318;END=241360862;SVLEN=-638
chr1 241585932 hydra307 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,236;CIEND=0,265;END=241586491;SVLEN=-560
chr1 243782473 hydra309 A <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,254;CIEND=0,172;END=243783843;SVLEN=-1371
chr1 247027629 hydra314 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,208;CIEND=0,215;END=247028198;SVLEN=-570
chr1 247850181 hydra315 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,293;CIEND=0,301;END=247856502;SVLEN=-6322
  1. Characterize the fragment size distribution.

Before we can decide which alignments are discordant, we must characterize each sample's fragment size distribution (its mean and standard deviation)::

$ pwd
/home/ubuntu/CourseData/HT_data/CEU_trio

$ samtools view bam/NA12878_chr1.bam | \
      pairend_distro.pl -rl 101 -X 6 -N 100000 -o discordants/NA12878.hist
# mean:292.582794172058  stdev:58.4585893394333

$ samtools view bam/NA12891_chr1.bam | \
      pairend_distro.pl -rl 101 -X 6 -N 100000 -o discordants/NA12891.hist
# mean:294.551264487355  stdev:60.6832604136104

$ samtools view bam/NA12892_chr1.bam | \
      pairend_distro.pl -rl 101 -X 6 -N 100000 -o discordants/NA12892.hist
# mean:292.632233677663  stdev:59.5918212222082
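
As a rough illustration of what "characterizing the distribution" means, the sketch below computes the mean and standard deviation of fragment sizes from SAM records, using the TLEN field (column 9). It is a simplified stand-in, not the logic of pairend_distro.pl: it does not reproduce that script's -X or -N handling, and the function name and toy records are hypothetical.

```python
import statistics

def fragment_size_stats(sam_lines):
    """Return (mean, stdev) of positive template lengths (SAM column 9)."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        tlen = int(fields[8])         # TLEN: signed observed template length
        if tlen > 0:                  # count each pair once (leftmost mate)
            sizes.append(tlen)
    return statistics.mean(sizes), statistics.pstdev(sizes)

# Hypothetical, hand-made SAM records (not taken from the course BAM files).
toy_sam = [
    "r1\t99\tchr1\t100\t60\t101M\t=\t290\t291\t*\t*",
    "r1\t147\tchr1\t290\t60\t101M\t=\t100\t-291\t*\t*",
    "r2\t99\tchr1\t500\t60\t101M\t=\t694\t295\t*\t*",
    "r2\t147\tchr1\t694\t60\t101M\t=\t500\t-295\t*\t*",
]
mean, sd = fragment_size_stats(toy_sam)
print(f"mean:{mean} stdev:{sd}")
```

On real data the lines would come from ``samtools view``, and extreme outliers would normally be excluded before computing the statistics.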
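  2. Sort the BAM files by read name.

Before we can extract discordant alignments, we must sort the BAM files by query name so that the pairDiscordants.py script sees the alignments for both ends of each pair grouped together. The commands below use the legacy samtools 0.1.x syntax, in which the last argument is an output prefix (producing NA12878_chr1.querysort.bam, and so on); samtools 1.x expects -o <output.bam> instead::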

$ cd /home/ubuntu/CourseData/HT_data/CEU_trio/bam

$ samtools sort -n -@ 4 NA12878_chr1.bam NA12878_chr1.querysort 

$ samtools sort -n -@ 4 NA12891_chr1.bam NA12891_chr1.querysort 

$ samtools sort -n -@ 4 NA12892_chr1.bam NA12892_chr1.querysort 
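
To see why name sorting matters, here is a minimal sketch that groups consecutive records by read name. The tuples and the helper name are hypothetical; this is not how pairDiscordants.py is implemented, only an illustration of the grouping it relies on.

```python
from itertools import groupby

def pairs_by_name(records):
    """Yield (name, [records]) for consecutive records sharing a read name."""
    for name, group in groupby(records, key=lambda rec: rec[0]):
        yield name, list(group)

# Hypothetical alignments: (read name, chromosome, start position).
name_sorted = [
    ("r1", "chr1", 100), ("r1", "chr1", 900),
    ("r2", "chr1", 300), ("r2", "chr1", 520),
]
# The same records in coordinate order, as in a position-sorted BAM.
coord_sorted = sorted(name_sorted, key=lambda rec: rec[2])

grouped_by_name = list(pairs_by_name(name_sorted))
grouped_by_coord = list(pairs_by_name(coord_sorted))

print(len(grouped_by_name), "groups when name-sorted")
print(len(grouped_by_coord), "groups when coordinate-sorted")
```

With name-sorted input every group holds both mates; with coordinate-sorted input the mates of ``r1`` end up in separate groups, so a streaming script could not pair them.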
  3. Extract discordant alignments (~25 minutes).

Now that we understand the shape of each sample's fragment size distribution, we can extract discordant alignments based on the mean and standard deviation of the distributions. We will call a paired-end alignment "too big" (i.e., suggesting a deletion) if its fragment size is 6 or more standard deviations above the mean. The -z parameter is the largest span at which a +/- pair is still considered "concordant"; each value below is that sample's mean + 6*sd, rounded to the nearest integer::

$ cd /home/ubuntu/CourseData/HT_data/CEU_trio/

$ bedtools bamtobed -i bam/NA12878_chr1.querysort.bam -tag NM | \
      pairDiscordants.py -i stdin -m hydra -z 643 \
      > discordants/NA12878.discordant.bedpe
      
$ bedtools bamtobed -i bam/NA12891_chr1.querysort.bam -tag NM | \
      pairDiscordants.py -i stdin -m hydra -z 659 \
      > discordants/NA12891.discordant.bedpe
      
$ bedtools bamtobed -i bam/NA12892_chr1.querysort.bam -tag NM | \
      pairDiscordants.py -i stdin -m hydra -z 650 \
      > discordants/NA12892.discordant.bedpe
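
The -z values used above can be checked with a few lines of arithmetic. The means and standard deviations are the ones reported in step 1; the variable names are only illustrative.

```python
# (mean, stdev) per sample, copied from the pairend_distro.pl output above.
stats = {
    "NA12878": (292.582794172058, 58.4585893394333),
    "NA12891": (294.551264487355, 60.6832604136104),
    "NA12892": (292.632233677663, 59.5918212222082),
}

N_SD = 6  # number of standard deviations above the mean

# Cutoff = mean + 6 * sd, rounded to the nearest integer.
cutoffs = {
    sample: round(mean + N_SD * sd)
    for sample, (mean, sd) in stats.items()
}

for sample, cutoff in cutoffs.items():
    print(sample, cutoff)
```

This gives 643, 659 and 650, matching the -z arguments passed to pairDiscordants.py.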
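  4. Run HYDRA.

We can now run HYDRA. The -ms 4 option requires at least 4 discordant alignments to support a putative event before it is called. The -mld and -mno options set the maximum length difference and the maximum non-overlap allowed between the mappings that are clustered into one call::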

$ cd /home/ubuntu/CourseData/HT_data/CEU_trio/

$ hydra -in discordants/NA12878.discordant.bedpe -out hydra/NA12878.hydra.svs -ms 4 -mld 348 -mno 989
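  5. Remove artifacts.

Eliminate variants that overlap centromeres, assembly gaps, simple repeats, or microsatellites, and keep only calls whose outer span (end of the second breakpoint minus start of the first, i.e. $6-$2) is below 100 kb::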

$ bedtools pairtobed -a hydra/NA12878.hydra.svs.final -b annotations/centromeres.hg19.5Mbslop.bed -type neither | \
      bedtools pairtobed -a - -b annotations/gaps.hg19.bed -type neither | \
          bedtools pairtobed -a - -b annotations/simplerepeats.hg19.bed -type neither | \
              bedtools pairtobed -a - -b annotations/microsatellites.hg19.bed -type neither | \
                  awk '$6-$2 < 100000' \
> hydra/NA12878.hydra.svs.final.filtered
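
The final ``awk`` step can be read as the following small filter. This is a sketch that mirrors only the size test, under the assumption that columns 2 and 6 are start1 and end2 as in standard BEDPE; the records shown are made up.

```python
MAX_SPAN = 100000  # same threshold as the awk expression

def keep_record(bedpe_line, max_span=MAX_SPAN):
    """Return True if end2 - start1 (awk's $6 - $2) is below max_span."""
    fields = bedpe_line.split()   # whitespace split, like awk's default
    start1 = int(fields[1])       # awk $2
    end2 = int(fields[5])         # awk $6
    return end2 - start1 < max_span

# Hypothetical BEDPE records: chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2
small_call = "chr1\t1000\t1200\tchr1\t5000\t5300\ttoy_small\t4\t+\t-"
large_call = "chr1\t1000\t1200\tchr1\t250000\t250300\ttoy_large\t4\t+\t-"

kept = [rec for rec in (small_call, large_call) if keep_record(rec)]
print(len(kept), "record(s) kept")
```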
  6. Basic analyses.

What fraction of the SVs overlap exons?

What fraction of the deletions were observed by the 1000 Genomes Project? What is your expectation?

What is the size distribution of the deletions?
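
One possible way to approach the last question is to read the SVLEN values of the deletion records from the VCF shown at the top of this gist. The sketch below assumes whitespace-separated columns, a single integer SVLEN per record, and uses four records copied from that file; the function name is hypothetical.

```python
import statistics

def deletion_sizes(vcf_lines):
    """Return absolute SVLEN values for records whose SVTYPE is DEL."""
    sizes = []
    for line in vcf_lines:
        if line.startswith("#"):          # skip header lines
            continue
        fields = line.split()
        info = {}
        for item in fields[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value             # flags such as IMPRECISE map to ""
        if info.get("SVTYPE") == "DEL":
            sizes.append(abs(int(info["SVLEN"])))
    return sizes

# Records copied from the VCF above (three deletions, one tandem duplication).
records = [
    "chr1 2911212 hydra101 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,277;CIEND=0,234;END=2911877;SVLEN=-666",
    "chr1 2918797 hydra102 T <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,243;CIEND=0,264;END=2919390;SVLEN=-594",
    "chr1 7569891 hydra103 C <DEL> . PASS SVTYPE=DEL;IMPRECISE;CIPOS=0,183;CIEND=0,156;END=7571532;SVLEN=-1642",
    "chr1 11051257 hydra441 C <DUP:TANDEM> . PASS SVTYPE=DUP;IMPRECISE;CIPOS=0,271;CIEND=0,308;END=11054309;SVLEN=3053",
]

sizes = deletion_sizes(records)
summary = (min(sizes), statistics.median(sizes), max(sizes))
print("min/median/max:", summary)
```

Running the same function over the full VCF would give the values needed for a histogram of deletion sizes.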

  7. Convert the HYDRA calls to VCF.

Viewers such as IGV and Savant do not support the BEDPE format that Hydra produces. Therefore we must use a script to convert the BEDPE output to VCF format::

# convert BEDPE to VCF; the next command expects the result at
# hydra/NA12878.hydra.svs.final.vcf
$ hydra_to_vcf.py hydra/NA12878.hydra.svs.final.filtered annotations/hg19.2bit

# sort the VCF by chromosome and start position, keeping the header lines first
$ (grep '^#' hydra/NA12878.hydra.svs.final.vcf; grep -v '^#' hydra/NA12878.hydra.svs.final.vcf | sort -k1,1 -k2,2n) \
  > hydra/NA12878.hydra.svs.final.sorted.vcf
  
# bgzip and tabix for IGV.
$ bgzip hydra/NA12878.hydra.svs.final.sorted.vcf
$ tabix -p vcf hydra/NA12878.hydra.svs.final.sorted.vcf.gz
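
The sorting step above (header lines first, then records ordered by chromosome and numeric position) can be sketched as follows. This is an illustrative equivalent of the ``grep``/``sort`` pipeline, not a replacement for it; the function name is hypothetical and the records are abbreviated from the VCF at the top of this gist.

```python
def sort_vcf_lines(lines):
    """Return header lines unchanged, followed by records sorted by (CHROM, POS)."""
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if not line.startswith("#")]
    # Chromosome compared as text, position compared as an integer,
    # mirroring `sort -k1,1 -k2,2n`.
    body.sort(key=lambda line: (line.split()[0], int(line.split()[1])))
    return header + body

unsorted_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM POS ID REF ALT QUAL FILTER INFO",
    "chr1 36734500 hydra606b T [chr1:36733606[T . PASS SVTYPE=BND",
    "chr1 2911212 hydra101 T <DEL> . PASS SVTYPE=DEL",
    "chr1 36733606 hydra606a C [chr1:36734500[C . PASS SVTYPE=BND",
]

sorted_vcf = sort_vcf_lines(unsorted_vcf)
positions = [int(line.split()[1]) for line in sorted_vcf if not line.startswith("#")]
print(positions)
```

A position-sorted file is what ``tabix`` requires in order to build its index.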
  8. Visualize the SVs along with the BAM alignments using IGV.

First, we need to convert the SV calls from BEDPE format to BED12 format so that IGV can display them::

$ bedpeToBed12.py -i hydra/NA12878.hydra.svs.final.filtered -n NA12878_SVS > hydra/NA12878.hydra.svs.final.filtered.bed
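
For intuition, one way an intrachromosomal BEDPE record can be expressed in BED12 is as a single feature with two blocks, one per breakpoint interval. The sketch below is an assumption about the general idea, not the actual behaviour of bedpeToBed12.py; the function name, the fixed "+" strand, the colour value, and the toy record are all hypothetical.

```python
def bedpe_to_bed12(bedpe_line, rgb="0,0,0"):
    """Convert one same-chromosome BEDPE record into a two-block BED12 line."""
    f = bedpe_line.split()
    chrom1, start1, end1 = f[0], int(f[1]), int(f[2])
    chrom2, start2, end2 = f[3], int(f[4]), int(f[5])
    name, score = f[6], f[7]
    if chrom1 != chrom2 or end1 > start2:
        raise ValueError("sketch handles only ordered, same-chromosome records")
    block_sizes = f"{end1 - start1},{end2 - start2}"
    block_starts = f"0,{start2 - start1}"   # offsets relative to chromStart
    return "\t".join([
        chrom1, str(start1), str(end2),     # chrom, chromStart, chromEnd
        name, score, "+",                   # name, score, strand (placeholder)
        str(start1), str(end2),             # thickStart, thickEnd
        rgb, "2", block_sizes, block_starts,
    ])

# Hypothetical BEDPE record.
toy_bedpe = "chr1\t1000\t1200\tchr1\t5000\t5300\ttoy_sv\t4\t+\t-"
bed12 = bedpe_to_bed12(toy_bedpe)
print(bed12)
```

In this layout the two blocks mark the breakpoint intervals and the thin line that IGV draws between them spans the affected region.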

TODO

  1. Script to convert Hydra to VCF for Marc Fiume (Use Python script from @chapmanb -- requires bx-python)