Created May 25, 2016 05:31
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "First, make two files: one listing the names of the paired-end read files, the other listing the k-mer sizes." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 6, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "C200012.fq.uniq.trim.good_1.dc_clean.fq.sync\r\n", | |
| "C200012.fq.uniq.trim.good_2.dc_clean.fq.sync\r\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!cat C200012.listoffiles.txt" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "32\r\n", | |
| "16\r\n", | |
| "8\r\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!cat listofkmers.txt" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Count k-mers using multi-dsk. This does not compile with clang on OS X, so I use g++-5 from Homebrew and add ```-DOSX``` to the CFLAGS in the makefile." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "File C200012.listoffiles.txt starts with character \"C\", hence is interpreted as a list of file names\n", | |
| "Reading 2 read files\n", | |
| "Reestimating partitions sizes and number of passes based on 2-mers \n", | |
| "no of partitions before 6 and after 5 passes 2 \n", | |
| "Sequentially counting ~1040 MB of kmers with 3 partition(s) and 2 passes using 1 thread(s), ~2048 MB of memory and ~1024 MB of disk space\n", | |
| "| First step: Converting input file into Binary format |\n", | |
| "[----------------------------------------------------------------------------------------------------]\n", | |
| "| Counting kmers |\n", | |
| "Storing k-mers in partition files between 0 and 2 \n", | |
| "31 % elapsed: 0 min 21 sec estimated remaining: 0 min 46 sec Storing k-mers in partition files between 3 and 5 \n", | |
| "100 % elapsed: 0 min 31 sec estimated remaining: 0 min 0 sec \n", | |
| "\n", | |
| "Saved 397982 solid kmers\n", | |
| "-------------------Counted kmers time Wallclock 31.6461 s\n", | |
| "\n", | |
| "------------------ Counted kmers and kept those with abundance >=1, \n", | |
| "-------------------Total time Wallclock 33.3587 s\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!dsk-1.5655/multi-dsk C200012.listoffiles.txt listofkmers.txt -m 2048 -d 1024" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This generates one file per k-mer size, e.g. ```C200012.listoffiles.solid_kmers_binary.8```. Decompress the output of multi-dsk; the k = 32 file is used here." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "!python dsk-1.5655/parse_results.py C200012.listoffiles.solid_kmers_binary.32 > C200012.32" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Convert each FASTQ file to FASTA, then concatenate them into a single FASTA file of all reads." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 15, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "C200012.fasta.uniq.trim.good_1.dc_clean.fq.sync\n", | |
| "Fasta file written in C200012.fasta.uniq.trim.good_1.dc_clean.fq.sync\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!perl fastqtofasta.pl C200012.fq.uniq.trim.good_1.dc_clean.fq.sync" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 16, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "C200012.fasta.uniq.trim.good_2.dc_clean.fq.sync\n", | |
| "Fasta file written in C200012.fasta.uniq.trim.good_2.dc_clean.fq.sync\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!perl fastqtofasta.pl C200012.fq.uniq.trim.good_2.dc_clean.fq.sync" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 17, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "!cat C200012.fasta.uniq.trim.good_1.dc_clean.fq.sync C200012.fasta.uniq.trim.good_2.dc_clean.fq.sync > C200012.fasta" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Generate the de Bruijn graph. The output is a graph file (```*.dbg```) that contains pairs of k-mers in the graph." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 19, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Could not remove C200012.dbg \n", | |
| "Counting kmer counts from the kmer file\n", | |
| "Kmer counts loaded\n", | |
| "Creating graph \n", | |
| "processed 1000 reads in 1464073940\n", | |
| "processed 2000 reads in 1464073942\n", | |
| "processed 3000 reads in 1464073943\n", | |
| "processed 4000 reads in 1464073944\n", | |
| "processed 5000 reads in 1464073945\n", | |
| "processed 6000 reads in 1464073946\n", | |
| "processed 7000 reads in 1464073947\n", | |
| "processed 8000 reads in 1464073949\n", | |
| "processed 9000 reads in 1464073950\n", | |
| "processed 10000 reads in 1464073951\n", | |
| "processed 11000 reads in 1464073952\n", | |
| "processed 12000 reads in 1464073954\n", | |
| "processed 13000 reads in 1464073956\n", | |
| "processed 14000 reads in 1464073957\n", | |
| "processed 15000 reads in 1464073958\n", | |
| "processed 16000 reads in 1464073959\n", | |
| "processed 17000 reads in 1464073961\n", | |
| "processed 18000 reads in 1464073962\n", | |
| "processed 19000 reads in 1464073963\n", | |
| "processed 20000 reads in 1464073965\n", | |
| "processed 21000 reads in 1464073966\n", | |
| "processed 22000 reads in 1464073967\n", | |
| "processed 23000 reads in 1464073969\n", | |
| "processed 24000 reads in 1464073970\n", | |
| "processed 25000 reads in 1464073972\n", | |
| "processed 26000 reads in 1464073973\n", | |
| "processed 27000 reads in 1464073974\n", | |
| "processed 28000 reads in 1464073975\n", | |
| "processed 29000 reads in 1464073977\n", | |
| "processed 30000 reads in 1464073978\n", | |
| "processed 31000 reads in 1464073980\n", | |
| "processed 32000 reads in 1464073982\n", | |
| "processed 33000 reads in 1464073983\n", | |
| "processed 34000 reads in 1464073984\n", | |
| "processed 35000 reads in 1464073986\n", | |
| "processed 36000 reads in 1464073987\n", | |
| "processed 37000 reads in 1464073989\n", | |
| "processed 38000 reads in 1464073990\n", | |
| "processed 39000 reads in 1464073991\n", | |
| "processed 40000 reads in 1464073992\n", | |
| "processed 41000 reads in 1464073994\n", | |
| "processed 42000 reads in 1464073995\n", | |
| "processed 43000 reads in 1464073996\n", | |
| "processed 44000 reads in 1464073998\n", | |
| "processed 45000 reads in 1464073999\n", | |
| "processed 46000 reads in 1464074000\n", | |
| "processed 47000 reads in 1464074002\n", | |
| "processed 48000 reads in 1464074003\n", | |
| "processed 49000 reads in 1464074004\n", | |
| "processed 50000 reads in 1464074005\n", | |
| "processed 51000 reads in 1464074007\n", | |
| "processed 52000 reads in 1464074008\n", | |
| "processed 53000 reads in 1464074009\n", | |
| "processed 54000 reads in 1464074011\n", | |
| "processed 55000 reads in 1464074012\n", | |
| "processed 56000 reads in 1464074013\n", | |
| "processed 57000 reads in 1464074014\n", | |
| "processed 58000 reads in 1464074016\n", | |
| "processed 59000 reads in 1464074017\n", | |
| "processed 60000 reads in 1464074018\n", | |
| "processed 61000 reads in 1464074019\n", | |
| "processed 62000 reads in 1464074021\n", | |
| "processed 63000 reads in 1464074022\n", | |
| "processed 64000 reads in 1464074023\n", | |
| "processed 65000 reads in 1464074024\n", | |
| "processed 66000 reads in 1464074026\n", | |
| "processed 67000 reads in 1464074027\n", | |
| "processed 68000 reads in 1464074028\n", | |
| "processed 69000 reads in 1464074029\n", | |
| "processed 70000 reads in 1464074030\n", | |
| "processed 71000 reads in 1464074032\n", | |
| "processed 72000 reads in 1464074033\n", | |
| "processed 73000 reads in 1464074034\n", | |
| "processed 74000 reads in 1464074035\n", | |
| "processed 75000 reads in 1464074037\n", | |
| "processed 76000 reads in 1464074038\n", | |
| "processed 77000 reads in 1464074039\n", | |
| "processed 78000 reads in 1464074040\n", | |
| "processed 79000 reads in 1464074042\n", | |
| "processed 80000 reads in 1464074043\n", | |
| "processed 81000 reads in 1464074044\n", | |
| "processed 82000 reads in 1464074045\n", | |
| "processed 83000 reads in 1464074046\n", | |
| "processed 84000 reads in 1464074047\n", | |
| "processed 85000 reads in 1464074049\n", | |
| "processed 86000 reads in 1464074050\n", | |
| "processed 87000 reads in 1464074051\n", | |
| "processed 88000 reads in 1464074052\n", | |
| "processed 89000 reads in 1464074053\n", | |
| "processed 90000 reads in 1464074055\n", | |
| "processed 91000 reads in 1464074056\n", | |
| "processed 92000 reads in 1464074057\n", | |
| "processed 93000 reads in 1464074058\n", | |
| "processed 94000 reads in 1464074060\n", | |
| "processed 95000 reads in 1464074061\n", | |
| "processed 96000 reads in 1464074062\n", | |
| "processed 97000 reads in 1464074063\n", | |
| "processed 98000 reads in 1464074065\n", | |
| "processed 99000 reads in 1464074066\n", | |
| "processed 100000 reads in 1464074067\n", | |
| "processed 101000 reads in 1464074068\n", | |
| "processed 102000 reads in 1464074069\n", | |
| "processed 103000 reads in 1464074070\n", | |
| "processed 104000 reads in 1464074072\n", | |
| "processed 105000 reads in 1464074073\n", | |
| "processed 106000 reads in 1464074074\n", | |
| "processed 107000 reads in 1464074075\n", | |
| "processed 108000 reads in 1464074076\n", | |
| "processed 109000 reads in 1464074078\n", | |
| "processed 110000 reads in 1464074079\n", | |
| "processed 111000 reads in 1464074080\n", | |
| "processed 112000 reads in 1464074081\n", | |
| "processed 113000 reads in 1464074083\n", | |
| "processed 114000 reads in 1464074084\n", | |
| "processed 115000 reads in 1464074085\n", | |
| "processed 116000 reads in 1464074086\n", | |
| "processed 117000 reads in 1464074087\n", | |
| "processed 118000 reads in 1464074089\n", | |
| "processed 119000 reads in 1464074090\n", | |
| "processed 120000 reads in 1464074091\n", | |
| "processed 121000 reads in 1464074092\n", | |
| "processed 122000 reads in 1464074094\n", | |
| "processed 123000 reads in 1464074095\n", | |
| "processed 124000 reads in 1464074096\n", | |
| "processed 125000 reads in 1464074097\n", | |
| "processed 126000 reads in 1464074098\n", | |
| "processed 127000 reads in 1464074100\n", | |
| "processed 128000 reads in 1464074101\n", | |
| "processed 129000 reads in 1464074102\n", | |
| "processed 130000 reads in 1464074103\n", | |
| "processed 131000 reads in 1464074104\n", | |
| "processed 132000 reads in 1464074106\n", | |
| "processed 133000 reads in 1464074107\n", | |
| "processed 134000 reads in 1464074108\n", | |
| "processed 135000 reads in 1464074109\n", | |
| "processed 136000 reads in 1464074110\n", | |
| "processed 137000 reads in 1464074112\n", | |
| "processed 138000 reads in 1464074113\n", | |
| "processed 139000 reads in 1464074114\n", | |
| "processed 140000 reads in 1464074115\n", | |
| "processed 141000 reads in 1464074116\n", | |
| "processed 142000 reads in 1464074118\n", | |
| "processed 143000 reads in 1464074119\n", | |
| "processed 144000 reads in 1464074120\n", | |
| "processed 145000 reads in 1464074121\n", | |
| "processed 146000 reads in 1464074122\n", | |
| "processed 147000 reads in 1464074123\n", | |
| "processed 148000 reads in 1464074125\n", | |
| "processed 149000 reads in 1464074126\n", | |
| "processed 150000 reads in 1464074127\n", | |
| "processed 151000 reads in 1464074128\n", | |
| "processed 152000 reads in 1464074129\n", | |
| "processed 153000 reads in 1464074131\n", | |
| "processed 154000 reads in 1464074132\n", | |
| "processed 155000 reads in 1464074133\n", | |
| "processed 156000 reads in 1464074134\n", | |
| "processed 157000 reads in 1464074135\n", | |
| "processed 158000 reads in 1464074137\n", | |
| "processed 159000 reads in 1464074138\n", | |
| "processed 160000 reads in 1464074139\n", | |
| "processed 161000 reads in 1464074140\n", | |
| "processed 162000 reads in 1464074141\n", | |
| "processed 163000 reads in 1464074143\n", | |
| "processed 164000 reads in 1464074144\n", | |
| "processed 165000 reads in 1464074145\n", | |
| "processed 166000 reads in 1464074146\n", | |
| "processed 167000 reads in 1464074147\n", | |
| "processed 168000 reads in 1464074149\n", | |
| "processed 169000 reads in 1464074150\n", | |
| "processed 170000 reads in 1464074151\n", | |
| "processed 171000 reads in 1464074152\n", | |
| "processed 172000 reads in 1464074153\n", | |
| "processed 173000 reads in 1464074155\n", | |
| "processed 174000 reads in 1464074156\n", | |
| "processed 175000 reads in 1464074157\n", | |
| "processed 176000 reads in 1464074158\n", | |
| "processed 177000 reads in 1464074159\n", | |
| "processed 178000 reads in 1464074161\n", | |
| "processed 179000 reads in 1464074162\n", | |
| "processed 180000 reads in 1464074163\n", | |
| "processed 181000 reads in 1464074164\n", | |
| "processed 182000 reads in 1464074165\n", | |
| "processed 183000 reads in 1464074167\n", | |
| "processed 184000 reads in 1464074168\n", | |
| "processed 185000 reads in 1464074169\n", | |
| "processed 186000 reads in 1464074170\n", | |
| "processed 187000 reads in 1464074171\n", | |
| "processed 188000 reads in 1464074172\n", | |
| "processed 189000 reads in 1464074174\n", | |
| "processed 190000 reads in 1464074175\n", | |
| "processed 191000 reads in 1464074176\n", | |
| "processed 192000 reads in 1464074177\n", | |
| "processed 193000 reads in 1464074178\n", | |
| "processed 194000 reads in 1464074179\n", | |
| "processed 195000 reads in 1464074181\n", | |
| "processed 196000 reads in 1464074182\n", | |
| "processed 197000 reads in 1464074183\n", | |
| "processed 198000 reads in 1464074184\n", | |
| "processed 199000 reads in 1464074186\n", | |
| "processed 200000 reads in 1464074187\n", | |
| "processed 201000 reads in 1464074188\n", | |
| "processed 202000 reads in 1464074189\n", | |
| "processed 203000 reads in 1464074190\n", | |
| "processed 204000 reads in 1464074192\n", | |
| "processed 205000 reads in 1464074193\n", | |
| "processed 206000 reads in 1464074194\n", | |
| "processed 207000 reads in 1464074195\n", | |
| "processed 208000 reads in 1464074196\n", | |
| "processed 209000 reads in 1464074198\n", | |
| "processed 210000 reads in 1464074199\n", | |
| "processed 211000 reads in 1464074200\n", | |
| "processed 212000 reads in 1464074201\n", | |
| "processed 213000 reads in 1464074202\n", | |
| "processed 214000 reads in 1464074204\n", | |
| "processed 215000 reads in 1464074205\n", | |
| "processed 216000 reads in 1464074206\n", | |
| "processed 217000 reads in 1464074207\n", | |
| "processed 218000 reads in 1464074208\n", | |
| "processed 219000 reads in 1464074210\n", | |
| "processed 220000 reads in 1464074211\n", | |
| "processed 221000 reads in 1464074212\n", | |
| "processed 222000 reads in 1464074213\n", | |
| "processed 223000 reads in 1464074214\n", | |
| "processed 224000 reads in 1464074216\n", | |
| "processed 225000 reads in 1464074217\n", | |
| "processed 226000 reads in 1464074218\n", | |
| "processed 227000 reads in 1464074219\n", | |
| "processed 228000 reads in 1464074220\n", | |
| "processed 229000 reads in 1464074221\n", | |
| "processed 230000 reads in 1464074223\n", | |
| "processed 231000 reads in 1464074224\n", | |
| "processed 232000 reads in 1464074225\n", | |
| "processed 233000 reads in 1464074226\n", | |
| "processed 234000 reads in 1464074227\n", | |
| "processed 235000 reads in 1464074228\n", | |
| "processed 236000 reads in 1464074229\n", | |
| "processed 237000 reads in 1464074231\n", | |
| "processed 238000 reads in 1464074232\n", | |
| "processed 239000 reads in 1464074233\n", | |
| "processed 240000 reads in 1464074234\n", | |
| "processed 241000 reads in 1464074235\n", | |
| "processed 242000 reads in 1464074236\n", | |
| "processed 243000 reads in 1464074238\n", | |
| "processed 244000 reads in 1464074239\n", | |
| "processed 245000 reads in 1464074240\n", | |
| "processed 246000 reads in 1464074241\n", | |
| "processed 247000 reads in 1464074242\n", | |
| "processed 248000 reads in 1464074243\n", | |
| "processed 249000 reads in 1464074245\n", | |
| "processed 250000 reads in 1464074246\n", | |
| "processed 251000 reads in 1464074247\n", | |
| "processed 252000 reads in 1464074248\n", | |
| "processed 253000 reads in 1464074249\n", | |
| "processed 254000 reads in 1464074251\n", | |
| "processed 255000 reads in 1464074252\n", | |
| "processed 256000 reads in 1464074253\n", | |
| "processed 257000 reads in 1464074254\n", | |
| "processed 258000 reads in 1464074255\n", | |
| "processed 259000 reads in 1464074257\n", | |
| "processed 260000 reads in 1464074258\n", | |
| "processed 261000 reads in 1464074259\n", | |
| "processed 262000 reads in 1464074260\n", | |
| "processed 263000 reads in 1464074261\n", | |
| "processed 264000 reads in 1464074263\n", | |
| "processed 265000 reads in 1464074264\n", | |
| "processed 266000 reads in 1464074265\n", | |
| "processed 267000 reads in 1464074266\n", | |
| "processed 268000 reads in 1464074267\n", | |
| "processed 269000 reads in 1464074268\n", | |
| "processed 270000 reads in 1464074270\n", | |
| "processed 271000 reads in 1464074271\n", | |
| "processed 272000 reads in 1464074272\n", | |
| "processed 273000 reads in 1464074273\n", | |
| "processed 274000 reads in 1464074274\n", | |
| "processed 275000 reads in 1464074276\n", | |
| "processed 276000 reads in 1464074277\n", | |
| "processed 277000 reads in 1464074278\n", | |
| "processed 278000 reads in 1464074279\n", | |
| "processed 279000 reads in 1464074280\n", | |
| "processed 280000 reads in 1464074281\n", | |
| "processed 281000 reads in 1464074283\n", | |
| "processed 282000 reads in 1464074284\n", | |
| "processed 283000 reads in 1464074285\n", | |
| "processed 284000 reads in 1464074286\n", | |
| "processed 285000 reads in 1464074287\n", | |
| "processed 286000 reads in 1464074288\n", | |
| "processed 287000 reads in 1464074290\n", | |
| "processed 288000 reads in 1464074291\n", | |
| "processed 289000 reads in 1464074292\n", | |
| "processed 290000 reads in 1464074293\n", | |
| "processed 291000 reads in 1464074294\n", | |
| "processed 292000 reads in 1464074295\n", | |
| "processed 293000 reads in 1464074296\n", | |
| "processed 294000 reads in 1464074298\n", | |
| "processed 295000 reads in 1464074299\n", | |
| "processed 296000 reads in 1464074300\n", | |
| "processed 297000 reads in 1464074301\n", | |
| "processed 298000 reads in 1464074302\n", | |
| "processed 299000 reads in 1464074304\n", | |
| "processed 300000 reads in 1464074305\n", | |
| "processed 301000 reads in 1464074306\n", | |
| "processed 302000 reads in 1464074307\n", | |
| "processed 303000 reads in 1464074309\n", | |
| "processed 304000 reads in 1464074310\n", | |
| "processed 305000 reads in 1464074311\n", | |
| "processed 306000 reads in 1464074312\n", | |
| "processed 307000 reads in 1464074313\n", | |
| "processed 308000 reads in 1464074314\n", | |
| "processed 309000 reads in 1464074316\n", | |
| "processed 310000 reads in 1464074317\n", | |
| "processed 311000 reads in 1464074318\n", | |
| "processed 312000 reads in 1464074319\n", | |
| "processed 313000 reads in 1464074320\n", | |
| "processed 314000 reads in 1464074322\n", | |
| "processed 315000 reads in 1464074323\n", | |
| "processed 316000 reads in 1464074324\n", | |
| "processed 317000 reads in 1464074325\n", | |
| "processed 318000 reads in 1464074326\n", | |
| "processed 319000 reads in 1464074327\n", | |
| "processed 320000 reads in 1464074329\n", | |
| "processed 321000 reads in 1464074330\n", | |
| "processed 322000 reads in 1464074331\n", | |
| "processed 323000 reads in 1464074332\n", | |
| "processed 324000 reads in 1464074334\n", | |
| "processed 325000 reads in 1464074335\n", | |
| "processed 326000 reads in 1464074336\n", | |
| "processed 327000 reads in 1464074337\n", | |
| "processed 328000 reads in 1464074338\n", | |
| "processed 329000 reads in 1464074339\n", | |
| "Graph done \n", | |
| "Number of vertices 8070 number of edges 5984\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!perl construct_graph.pl C200012.fasta C200012.32 10 C200012.dbg \"s\"" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Create the paired set using the paired reads. This step takes as input the two paired files (file1.fastq and file2.fastq), the k-mer counts file (file1.kvalue), and a threshold for ignoring erroneous k-mers. The output is a paired set (```*.ps```) file." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 21, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "5880 is the total no. of keys in the kmerhash \n", | |
| "Bio::SeqIO::fastq=HASH(0x7fc4e92f19c8) Bio::SeqIO::fastq=HASH(0x7fc4e90ab2e8) files \n", | |
| "Processed 1000 in 1464076776 \n", | |
| "Processed 2000 in 1464076874 \n", | |
| "Processed 3000 in 1464076969 \n", | |
| "Processed 4000 in 1464077061 \n", | |
| "Processed 5000 in 1464077160 \n", | |
| "Processed 6000 in 1464077251 \n", | |
| "Processed 7000 in 1464077344 \n", | |
| "Processed 8000 in 1464077441 \n", | |
| "Processed 9000 in 1464077538 \n", | |
| "Processed 10000 in 1464077638 \n", | |
| "Processed 11000 in 1464077728 \n", | |
| "Processed 12000 in 1464077818 \n", | |
| "Processed 13000 in 1464077909 \n", | |
| "Processed 14000 in 1464078000 \n", | |
| "Processed 15000 in 1464078091 \n", | |
| "Processed 16000 in 1464078182 \n", | |
| "Processed 17000 in 1464078272 \n", | |
| "Processed 18000 in 1464078373 \n", | |
| "Processed 19000 in 1464078471 \n", | |
| "Processed 20000 in 1464078565 \n", | |
| "Processed 21000 in 1464078654 \n", | |
| "Processed 22000 in 1464078745 \n", | |
| "Processed 23000 in 1464078835 \n", | |
| "Processed 24000 in 1464078924 \n", | |
| "Processed 25000 in 1464079016 \n", | |
| "Processed 26000 in 1464079107 \n", | |
| "Processed 27000 in 1464079197 \n", | |
| "Processed 28000 in 1464079289 \n", | |
| "Processed 29000 in 1464079391 \n", | |
| "Processed 30000 in 1464079490 \n", | |
| "Processed 31000 in 1464079579 \n", | |
| "Processed 32000 in 1464079664 \n", | |
| "Processed 33000 in 1464079746 \n", | |
| "Processed 34000 in 1464079827 \n", | |
| "Processed 35000 in 1464079908 \n", | |
| "Processed 36000 in 1464079990 \n", | |
| "Processed 37000 in 1464080071 \n", | |
| "Processed 38000 in 1464080153 \n", | |
| "Processed 39000 in 1464080236 \n", | |
| "Processed 40000 in 1464080317 \n", | |
| "Processed 41000 in 1464080398 \n", | |
| "Processed 42000 in 1464080480 \n", | |
| "Processed 43000 in 1464080564 \n", | |
| "Processed 44000 in 1464080651 \n", | |
| "Processed 45000 in 1464080733 \n", | |
| "Processed 46000 in 1464080817 \n", | |
| "Processed 47000 in 1464080901 \n", | |
| "Processed 48000 in 1464080983 \n", | |
| "Processed 49000 in 1464081066 \n", | |
| "Processed 50000 in 1464081149 \n", | |
| "Processed 51000 in 1464081234 \n", | |
| "Processed 52000 in 1464081321 \n", | |
| "Processed 53000 in 1464081404 \n", | |
| "Processed 54000 in 1464081488 \n", | |
| "Processed 55000 in 1464081571 \n", | |
| "Processed 56000 in 1464081655 \n", | |
| "Processed 57000 in 1464081740 \n", | |
| "Processed 58000 in 1464081826 \n", | |
| "Processed 59000 in 1464081909 \n", | |
| "Processed 60000 in 1464081992 \n", | |
| "Processed 61000 in 1464082076 \n", | |
| "Processed 62000 in 1464082158 \n", | |
| "Processed 63000 in 1464082239 \n", | |
| "Processed 64000 in 1464082320 \n", | |
| "Processed 65000 in 1464082400 \n", | |
| "Processed 66000 in 1464082480 \n", | |
| "Processed 67000 in 1464082560 \n", | |
| "Processed 68000 in 1464082639 \n", | |
| "Processed 69000 in 1464082729 \n", | |
| "Processed 70000 in 1464082827 \n", | |
| "Processed 71000 in 1464082912 \n", | |
| "Processed 72000 in 1464082995 \n", | |
| "Processed 73000 in 1464083080 \n", | |
| "Processed 74000 in 1464083167 \n", | |
| "Processed 75000 in 1464083251 \n", | |
| "Processed 76000 in 1464083335 \n", | |
| "Processed 77000 in 1464083421 \n", | |
| "Processed 78000 in 1464083507 \n", | |
| "Processed 79000 in 1464083588 \n", | |
| "Processed 80000 in 1464083672 \n", | |
| "Processed 81000 in 1464083757 \n", | |
| "Processed 82000 in 1464083842 \n", | |
| "Processed 83000 in 1464083926 \n", | |
| "Processed 84000 in 1464084010 \n", | |
| "Processed 85000 in 1464084094 \n", | |
| "Processed 86000 in 1464084178 \n", | |
| "Processed 87000 in 1464084265 \n", | |
| "Processed 88000 in 1464084347 \n", | |
| "Processed 89000 in 1464084429 \n", | |
| "Processed 90000 in 1464084509 \n", | |
| "Processed 91000 in 1464084591 \n", | |
| "Processed 92000 in 1464084674 \n", | |
| "Processed 93000 in 1464084756 \n", | |
| "Processed 94000 in 1464084839 \n", | |
| "Processed 95000 in 1464084919 \n", | |
| "Processed 96000 in 1464085001 \n", | |
| "Processed 97000 in 1464085084 \n", | |
| "Processed 98000 in 1464085165 \n", | |
| "Processed 99000 in 1464085247 \n", | |
| "Processed 100000 in 1464085330 \n", | |
| "Processed 101000 in 1464085412 \n", | |
| "Processed 102000 in 1464085493 \n", | |
| "Processed 103000 in 1464085574 \n", | |
| "Processed 104000 in 1464085656 \n", | |
| "Processed 105000 in 1464085739 \n", | |
| "Processed 106000 in 1464085822 \n", | |
| "Processed 107000 in 1464085903 \n", | |
| "Processed 108000 in 1464085984 \n", | |
| "Processed 109000 in 1464086067 \n", | |
| "Processed 110000 in 1464086149 \n", | |
| "Processed 111000 in 1464086229 \n", | |
| "Processed 112000 in 1464086312 \n", | |
| "Processed 113000 in 1464086394 \n", | |
| "Processed 114000 in 1464086477 \n", | |
| "Processed 115000 in 1464086560 \n", | |
| "Processed 116000 in 1464086642 \n", | |
| "Processed 117000 in 1464086724 \n", | |
| "Processed 118000 in 1464086806 \n", | |
| "Processed 119000 in 1464086889 \n", | |
| "Processed 120000 in 1464086971 \n", | |
| "Processed 121000 in 1464087052 \n", | |
| "Processed 122000 in 1464087135 \n", | |
| "Processed 123000 in 1464087218 \n", | |
| "Processed 124000 in 1464087302 \n", | |
| "Processed 125000 in 1464087386 \n", | |
| "Processed 126000 in 1464087470 \n", | |
| "Processed 127000 in 1464087559 \n", | |
| "Processed 128000 in 1464087667 \n", | |
| "Processed 129000 in 1464087756 \n", | |
| "Processed 130000 in 1464087840 \n", | |
| "Processed 131000 in 1464087923 \n", | |
| "Processed 132000 in 1464088004 \n", | |
| "Processed 133000 in 1464088087 \n", | |
| "Processed 134000 in 1464088170 \n", | |
| "Processed 135000 in 1464088255 \n", | |
| "Processed 136000 in 1464088339 \n", | |
| "Processed 137000 in 1464088422 \n", | |
| "Processed 138000 in 1464088505 \n", | |
| "Processed 139000 in 1464088592 \n", | |
| "Processed 140000 in 1464088676 \n", | |
| "Processed 141000 in 1464088760 \n", | |
| "Processed 142000 in 1464088842 \n", | |
| "Processed 143000 in 1464088922 \n", | |
| "Processed 144000 in 1464089008 \n", | |
| "Processed 145000 in 1464089090 \n", | |
| "Processed 146000 in 1464089174 \n", | |
| "Processed 147000 in 1464089265 \n", | |
| "Processed 148000 in 1464089355 \n", | |
| "Processed 149000 in 1464089449 \n", | |
| "Processed 150000 in 1464089542 \n", | |
| "Processed 151000 in 1464089636 \n", | |
| "Processed 152000 in 1464089730 \n", | |
| "Processed 153000 in 1464089825 \n", | |
| "Processed 154000 in 1464089920 \n", | |
| "Processed 155000 in 1464090013 \n", | |
| "Processed 156000 in 1464090105 \n", | |
| "Processed 157000 in 1464090201 \n", | |
| "Processed 158000 in 1464090294 \n", | |
| "Processed 159000 in 1464090390 \n", | |
| "Processed 160000 in 1464090485 \n", | |
| "Processed 161000 in 1464090579 \n", | |
| "Processed 162000 in 1464090674 \n", | |
| "Processed 163000 in 1464090767 \n", | |
| "Processed 164000 in 1464090861 \n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!perl construct_paired_without_bloom.pl -file1 C200012.fq.uniq.trim.good_1.dc_clean.fq.sync -file2 C200012.fq.uniq.trim.good_2.dc_clean.fq.sync -paired -kmerfile C200012.32 -thresh 10 -wr C200012.ps" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Run the VIPRA algorithm. It takes the inputs generated above, plus the average insert size (```-IS```), a threshold (```-thresh```), and a value for M (```-fact```), which determines the number of paths to generate per vertex." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 22, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "!perl dg_cover.pl -graph C200012.dbg -kmer C200012.32 -paired C200012.ps -fact 3 -thresh 10 -IS 200 > C200012.dgc" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 24, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "!perl process_dg.pl C200012.dgc > C200012.pathfas" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 25, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "!perl get_paths_dgcover.pl -f C200012.dgc -w C200012.paths" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The step below does not work: it fails with an illegal division by zero." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 32, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Illegal division by zero at likelihood_singles_wrapper.pl line 361.\r\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!perl likelihood_singles_wrapper.pl -condgraph C200012.dgc -compset C200012.pathfas -pathsfile C200012.paths -back -slow -gl 7500 > C200012.mle" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.5.1" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |
Hi @sdwfrost,
The input files for likelihood_singles_wrapper.pl are two temporary files generated by dg_cover.pl.
They are named *.cond.graph (for the -condgraph option)
and *comp.txt (for the -compset option). Passing other files instead may be why it is not working. If you don't mind sharing the files with me, I'd be happy to look at it.
Cheers
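To make the suggestion above concrete, here is a minimal Python sketch that looks for the two temporary files by the patterns mentioned (`*.cond.graph` and `*comp.txt`) and assembles the `likelihood_singles_wrapper.pl` command from the notebook with them. The helper names `find_one` and `build_likelihood_command`, the assumption that exactly one file matches each pattern, and the assumption that the files sit in the working directory are all mine and have not been checked against VIPRA itself.

```python
import glob
import os
import shlex


def find_one(pattern, directory="."):
    """Return the single file in `directory` matching `pattern`, or raise."""
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    if len(matches) != 1:
        raise FileNotFoundError(
            "expected exactly one match for %r, found %d" % (pattern, len(matches))
        )
    return matches[0]


def build_likelihood_command(directory=".", paths_file="C200012.paths", gl=7500):
    """Assemble the wrapper command using the temporary files from dg_cover.pl.

    The flags mirror the notebook cell; only the -condgraph and -compset
    arguments are swapped for the files matched by the two patterns.
    """
    cond_graph = find_one("*.cond.graph", directory)  # for -condgraph
    comp_set = find_one("*comp.txt", directory)       # for -compset
    return [
        "perl", "likelihood_singles_wrapper.pl",
        "-condgraph", cond_graph,
        "-compset", comp_set,
        "-pathsfile", paths_file,
        "-back", "-slow",
        "-gl", str(gl),
    ]


if __name__ == "__main__":
    try:
        # Print the command rather than running it, so it can be inspected first.
        print(" ".join(shlex.quote(arg) for arg in build_likelihood_command()))
    except FileNotFoundError as err:
        print("Cannot build command:", err)
```

The sketch only prints the command; in the notebook it could be pasted into a `!` cell with the output redirected to `C200012.mle` as before.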