## Access date: 2017-01-04
$ svn log -q |grep 'z.li'
r87229 | z.li | 2014-03-08 12:13:28 +0800 (Sat, 08 Mar 2014)
r85532 | z.li | 2014-01-15 13:00:43 +0800 (Wed, 15 Jan 2014)
r85454 | z.li | 2014-01-12 13:57:16 +0800 (Sun, 12 Jan 2014)
r72578 | z.li | 2013-01-16 15:11:20 +0800 (Wed, 16 Jan 2013)
r72577 | z.li | 2013-01-16 13:43:05 +0800 (Wed, 16 Jan 2013)
r69358 | z.li | 2012-09-12 15:02:36 +0800 (Wed, 12 Sep 2012)
r68440 | z.li | 2012-08-14 15:56:16 +0800 (Tue, 14 Aug 2012)
r68439 | z.li | 2012-08-14 15:42:45 +0800 (Tue, 14 Aug 2012)
https://github.com/Bioconductor-mirror/CorMut/blob/release-3.4/R/seqFormat.R#L8
seq_f01=as.list(sapply(allSeq,function(x)c2s(s2c(x)[ref01!="-"])))
Use gsub
will significantly speedup this operation.
Deleting -
will convert the sequences from aligned
to un-aligned
, this is critical, as the package require aligned sequences, https://github.com/Bioconductor-mirror/CorMut/blob/release-3.4/R/seqFormat.R#L2: format=c("clustal","fasta","mase","phylip","msf")
, and calculate differences based on codon (codon position will no more aligned after simply deleting -
).
Besides with previous issus, each codon was calcuated 3 times.
The count of different codon will be real_diff * 3
.
https://github.com/Bioconductor-mirror/CorMut/blob/master/R/CorMut-internal.R#L126-L129
randomly choose a base to substitute ambiguous site
https://github.com/Bioconductor-mirror/CorMut/blob/release-3.4/R/kaksAA.R#L67-L77:
LOD=0
Nnum=NS+NY
for(ii in NYs:Nnum){
per_item=choose(Nnum,ii)*(q^ii)*((1-q)^(Nnum-ii))
#avoid the appear of NaN
if(per_item!="NaN"){
LOD=per_item+LOD
}
#print(choose(dim(mat)[2],i)*(q^i)*((1-q)^(dim(mat)[2]-i)))
}
LOD=-log10(LOD)
there will be precision issue for:
per_item=choose(Nnum,ii)*(q^ii)*((1-q)^(Nnum-ii))
and NaN
will be generated very often.
the author just throw them away by:
if(per_item!="NaN"){
LOD=per_item+LOD
}
He should use pbinom
to calculate the p-value!!!