David Rio drio

fprintf(stderr, "DRD>> name: %s", a->readName);
fprintf(stderr, "DRD>> length: %d", a->readNameLength);
fprintf(stderr, "DRD>> space: %d", a->space);
fprintf(stderr, "DRD>> %d", a->numEnds);
$ cat go.sh | ruby -pe 'gsub!(%r{/stornext/snfs1/next-gen/solid/hgsc.solid.pipeline/hgsc.bfast.pipe}, "/stornext/snfs1/next-gen/drio-scratch/working.copies/hgsc.bfast.pipe")' > go2.sh
drio / lsf script
Created May 13, 2010 21:09
lsf script mode
$ cat test.lsf
#BSUB -o output.txt
#BSUB -e error.txt
#BSUB -q normal
touch ./great.txt
drio / gist:402918
Created May 16, 2010 15:03
dnaa patch
Subject: [PATCH] There may be cases in PE/MP data where the second read is not available.
We have to consider those reads (singletons) as read1 not read2.
---
dqc/dqc_postalignqc.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/dqc/dqc_postalignqc.c b/dqc/dqc_postalignqc.c
index d3a4881..250b4d1 100644
--- a/dqc/dqc_postalignqc.c
DRD>> matching one end: RN : T30100230100230100230100230100101002301002301000000
DRD>> matching one end: #Entries: 1
DRD>> i: 0 referencePositions[ctr]: -1923810816 m->positions[i]: 80
DRD>> matching one end: RN : T3100230100230100230100023
DRD>> matching one end: #Entries: 27
DRD>> i: 0 referencePositions[ctr]: -1923810816 m->positions[i]: -1272874568
bfast: Align.c:235: AlignRGMatchesOneEnd: Assertion `readStartInsertionLengths[ctr] + readEndInsertionLengths[ctr] <= readLength' failed.
./run_bfast.sh: line 48: 16959 Aborted (core dumped) $bbin localalign -U $space -t -f $ref -n1 -m $seed.bmf > $seed.baf
drio / gist:405196
Created May 18, 2010 16:32
bfast script
$ cat run_bfast.sh
#!/bin/bash
#
set -e
#set -x
dist=`pwd`
#fq="$dist/reads/ecoli.reads.fastq"
fq="$dist/reads/reads.problem_zlib.fastq"
ref_h="/stornext/snfs4/next-gen/solid/bf.references/h/hsap.36.1.hg18/hsap_36.1_hg18.fa"
BOOM!: read name: 429_1207_1471
[bns_coor_pac2real] bug! Coordinate is longer than sequence (4294967294>=3080436051). Abort!
./run_bfast.sh: line 42: 27646 Aborted (core dumped) $bbin bwaaln -c -t8 $ref $fq > $seed.bmf
[bwa_aln_core] write to the disk...
>> read name: 429_1207_1471 (p->aln[j].a= 0)
>> bwt[1]->seq_len: 3080436051
>> bwt_sa(bwt[1]): 3080436002
>> p->len: 51
[bns_coor_pac2real] bug! Coordinate is longer than sequence (4294967294>=3080436051). Abort!
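The numbers in the debug output above are consistent with an unsigned 32-bit wraparound: 3080436051 - (3080436002 + 51) = -2, and -2 stored in 32 unsigned bits is 4294967294. The sketch below only reproduces that arithmetic. It assumes the coordinate is computed as seq_len - (sa + len), which the output suggests but does not confirm, and the method name is hypothetical.

```ruby
# Hypothetical helper: emulate an unsigned 32-bit subtraction in Ruby.
# Assumption: the coordinate is seq_len - (sa_pos + read_len); this formula
# is inferred from the logged values, not taken from the aligner's source.
def reverse_strand_coordinate_u32(seq_len, sa_pos, read_len)
  # Masking with 0xFFFFFFFF keeps the low 32 bits, as a uint32 would.
  (seq_len - (sa_pos + read_len)) & 0xFFFFFFFF
end

# Values taken from the log lines above.
coordinate = reverse_strand_coordinate_u32(3_080_436_051, 3_080_436_002, 51)
puts coordinate
```

If this assumption holds, the read ends 2 bases past the end of the sequence, so the subtraction goes negative before it is stored as an unsigned value.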
For that step (match), the software first reads a binary version of the reference
genome into memory:
/stornext/snfs4/next-gen/solid/bf.references/h/hsap.36.1.hg18/hsap_36.1_hg18.fa.nt.brg
It then splits the input data (reads from stornext) into 8 tmp files (/space1/tmp).
Then, for each of the indexes (13G files located in
/stornext/snfs4/next-gen/solid/bf.references/h/hsap.36.1.hg18/hsap_36.1_hg18.fa.cs.*.bif),
it loads one index at a time and spawns 8 threads that process the data from the tmp files (8 files).
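The control flow described above can be sketched roughly as follows. This is not BFAST code: the method names, the in-memory chunks standing in for the tmp files, and the search block are all hypothetical, and the sketch assumes one thread per chunk.

```ruby
# Hypothetical sketch of the match step's control flow only.
# Chunks held in memory stand in for the 8 tmp files.
def split_into_chunks(reads, n_chunks)
  chunks = Array.new(n_chunks) { [] }
  # Deal the reads out round-robin so the chunks are roughly equal in size.
  reads.each_with_index { |read, i| chunks[i % n_chunks] << read }
  chunks
end

# For each index in turn, start one thread per chunk and collect the hits.
# The block receives (index_name, read) and returns a hit or nil.
def match_all(reads, index_names, n_threads = 8, &search)
  chunks = split_into_chunks(reads, n_threads)
  index_names.flat_map do |index_name|
    # Only one index is "loaded" at a time, as in the description.
    threads = chunks.map do |chunk|
      Thread.new { chunk.map { |read| search.call(index_name, read) }.compact }
    end
    # Thread#value waits for the thread and returns its block's result.
    threads.flat_map(&:value)
  end
end

hits = match_all(%w[r1 r2 r3], %w[idx1 idx2], 2) { |index, read| "#{index}:#{read}" }
puts hits.sort
```

Because every index is loaded once and then searched by all threads, the time spent loading indexes is paid once per index regardless of the thread count.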
#!/usr/bin/env ruby19
#
# Total time loading the reference genome: 0 hour, 3 minutes and 2 seconds.
# Total time loading and deleting indexes: 6 hour, 28 minutes and 36 seconds.
# Total time searching indexes: 1 hour, 30 minutes and 2 seconds.
# Total time merging and writing output: 0 hour, 12 minutes and 25 seconds.
# Total time elapsed: 8 hours, 20 minutes and 12 seconds.
#
require 'find'
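The rest of this script is not shown here. As a small illustration of working with the timing lines quoted in the header comment, the sketch below converts one such line into seconds. The method name and the regular expression are assumptions for illustration, not the script's actual code.

```ruby
# Hypothetical helper: convert a "Total time ..." line into seconds.
# Accepts both "hour" and "hours" (and likewise for minutes and seconds),
# since the quoted lines use "0 hour" and "8 hours".
def duration_in_seconds(line)
  match = line.match(/(\d+) hours?, (\d+) minutes? and (\d+) seconds?/)
  return nil unless match

  hours, minutes, seconds = match.captures.map(&:to_i)
  hours * 3600 + minutes * 60 + seconds
end

line = "# Total time loading and deleting indexes: 6 hour, 28 minutes and 36 seconds."
puts duration_in_seconds(line)
```

Lines that do not contain a duration in this shape return nil, so they can be skipped when scanning a log.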