(1) If we use Celera Assembler's bogart unitigger, it generates a file called "bests.edges" in "4-unitigger", one of its output directories.
(2) I wrote a simple script that converts the edge list to GML. The script can be downloaded from https://github.com/PacificBiosciences/HBAR-DTK/blob/master/src/CA_best_edge_to_GML.py
(3) Load the graph into Gephi (https://gephi.org/).
(4) I typically apply the following sequence of layout algorithms in Gephi to get a good layout:
1) "YifanHu's Multilevel" to get a rough layout. The output usually captures the large-scale structure and untangles the graph reasonably well, so I can start to see features of the assembly overlap graph (or the string graph).
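The conversion in step (2) can be sketched roughly as follows. This is not the linked script: it assumes a simplified, hypothetical input with one `source target` pair per line, which is not necessarily the real column layout of the bogart output, and the function name `edges_to_gml` is made up for illustration.

```python
def edges_to_gml(edge_lines):
    """Convert 'source target' lines (hypothetical layout) into a GML string."""
    node_ids = {}   # node name -> integer id, in order of first appearance
    edges = []      # list of (source_id, target_id)
    for line in edge_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        source, target = line.split()[:2]
        for name in (source, target):
            node_ids.setdefault(name, len(node_ids))
        edges.append((node_ids[source], node_ids[target]))

    out = ["graph [", "  directed 1"]
    for name, idx in node_ids.items():
        out.append(f'  node [ id {idx} label "{name}" ]')
    for s, t in edges:
        out.append(f"  edge [ source {s} target {t} ]")
    out.append("]")
    return "\n".join(out)


if __name__ == "__main__":
    print(edges_to_gml(["# toy edge list", "read1 read2", "read2 read3"]))
```

The resulting text can be saved with a `.gml` extension and opened in Gephi as in step (3).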
Some thoughts on Heng Li's proposal for an assembly graph format: http://lh3.github.io/2014/07/19/a-proposal-of-the-grapical-fragment-assembly-format/
Some quick comments.
Is this format trying to represent the raw overlaps, the final assembly graph, or both?
It seems to me that it is more suitable for the first. In my work on representing diploid genome assemblies, I had to do multiple levels of graph reduction, starting from the initial string/overlap graph, to simplify the problem. If we are looking at a more reduced assembly, we might have to deal with edges corresponding to unitigs that share the same in and out nodes. In this format, such bubble paths (where the difference between them is bigger than a small indel) will be in different rows, so the behavior of edges with the same in and out nodes should be defined. What I did for the diploid work was to assign a uid to each edge.
Also, I do think the final assembly should avoid bidirectional edges. They should be resolved by the assembler. From a pragmatic point of view, it will confuse a lot of bi
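As a rough illustration of the uid idea, the sketch below stores each unitig edge under its own uid, so two bubble paths with the same in and out nodes stay distinct. The class name `UnitigGraph` and its methods are hypothetical and are not taken from any existing assembler or from the proposed format.

```python
from collections import defaultdict
from itertools import count


class UnitigGraph:
    """Toy multigraph in which every edge carries a unique id (uid)."""

    def __init__(self):
        self._next_uid = count()
        self.edges = {}                        # uid -> (in_node, out_node, sequence)
        self.by_endpoints = defaultdict(list)  # (in_node, out_node) -> [uid, ...]

    def add_edge(self, in_node, out_node, sequence):
        uid = next(self._next_uid)
        self.edges[uid] = (in_node, out_node, sequence)
        self.by_endpoints[(in_node, out_node)].append(uid)
        return uid

    def bubbles(self):
        """Return endpoint pairs connected by more than one edge."""
        return {k: v for k, v in self.by_endpoints.items() if len(v) > 1}


if __name__ == "__main__":
    g = UnitigGraph()
    g.add_edge("n1", "n2", "ACGT")
    g.add_edge("n1", "n2", "ACGGGT")  # alternative path between the same nodes
    g.add_edge("n2", "n3", "TTGA")
    print(g.bubbles())
```

Keying on the uid rather than on the (in, out) pair means a row-per-edge file can list both haplotype paths without one silently overwriting the other.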
I currently work for PacBio as a bioinformatician developing methods to handle single-molecule data and genome assembly properly.
Recently, I feel I am so lacking in vision. I have spent most of my time helping to develop methods in the hope that they will be useful for the scientific community using PacBio data. While we were developing those methods, as far as I could tell, many of those ONT fans had zero vision about them. We openly released those methods so that the scientific community could understand the value of PacBio and PacBio-like data. We naively assumed ONT would generate great data with raw single-molecule read accuracy > 96%, as Clive presented at AGBT 2012. If so, those ONT fans would not need any of the methods we had developed. After a while, we found out that some of the visionary ONT fans were finally "inspired" to use some of our methods to process ONT data and publish papers showing value that some of those fans had questioned before. Wit
```
$ cat Dockerfile
FROM ubuntu
RUN apt-get update -qq && \
    apt-get install -qqy tar gzip curl jq && \
    apt-get install -qqy python python-pip vim-tiny less git
RUN pip install httpie requests
RUN apt-get install -qqy curl
RUN mkdir /build && cd /build && curl -s https://nim-lang.org/download/nim-0.17.2.tar.xz > nim-0.17.2.tar.xz && \
    tar xvf nim-0.17.2.tar.xz && \
```
from os.path import abspath, expanduser
from io import StringIO
import contextlib
import gzip
import re
import subprocess
##
# Utility functions for FastaReader
##
+ bash getStats.sh chm13_pergrine_p_ctg_cns.fasta
Checking attempted with maximum distance from contig end of 1000 bp..Done
******************* BAC SUMMARY ******************
TOTAL : 341
BP : 51532183
************** Statistics for: chm13_p_ctg_cns.fasta ****************
BACs closed: 321 (94.1349); BACs attempted: 333 %good = 96.3964; BASES 48527269 (94.1689)
Median: 99.9878
MedianQV: 39.1364
Mean: 99.94864
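The numbers in this summary appear to be related by simple arithmetic. The sketch below assumes that MedianQV is the Phred-scaled version of the median identity and that the percentages are plain ratios of the counts shown. This is an inference from the printed values, not something taken from getStats.sh, and the helper name `identity_to_qv` is made up.

```python
import math


def identity_to_qv(identity_percent):
    """Phred-scale an identity percentage: QV = -10 * log10(error rate)."""
    error_rate = 1.0 - identity_percent / 100.0
    return -10.0 * math.log10(error_rate)


median_qv = identity_to_qv(99.9878)          # compare with "MedianQV" in the log
closed_pct = 100.0 * 321 / 341               # closed BACs / total BACs
good_pct = 100.0 * 321 / 333                 # closed BACs / attempted BACs
bases_pct = 100.0 * 48527269 / 51532183      # closed bases / total bases

if __name__ == "__main__":
    print(f"{median_qv:.4f} {closed_pct:.4f} {good_pct:.4f} {bases_pct:.4f}")
```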
0.0.0.0 ads.doubleclick.net
0.0.0.0 s.ytimg.com
0.0.0.0 ad.youtube.com
0.0.0.0 ads.youtube.com
0.0.0.0 clients1.google.com
0.0.0.0 dts.innovid.com
0.0.0.0 googleads4.g.doubleclick.net
0.0.0.0 pagead2.googlesyndication.com
0.0.0.0 pixel.moatads.com
0.0.0.0 rtd.tubemogul.com
❯ ./wfa_adapt
s0 len: 16179, s1 len: 16326 Alignment contains 16168 matches 46 mismatches, 149 insertions, and 2 deletions
The alignment length is not consitent with sequence length:
16168 + 46 + 2 = 16216 != 16179
16168 + 46 + 149 = 16363 != 16326
16168 + 46 + 149 = 16363 != 16326 |
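The check reported above can be reproduced with a few lines. This sketch follows the sums printed in the log, where deletions are counted against s0 and insertions against s1. The function name `check_alignment_counts` is hypothetical and is not part of wfa_adapt.

```python
def check_alignment_counts(len0, len1, matches, mismatches, insertions, deletions):
    """Compare bases consumed by the alignment operations with the sequence lengths.

    Returns (consumed0, consumed1, consistent).
    """
    consumed0 = matches + mismatches + deletions   # counted against s0, as in the log
    consumed1 = matches + mismatches + insertions  # counted against s1, as in the log
    consistent = consumed0 == len0 and consumed1 == len1
    return consumed0, consumed1, consistent


if __name__ == "__main__":
    # Values taken from the output above.
    print(check_alignment_counts(16179, 16326, 16168, 46, 149, 2))
```

With the logged values both sums overshoot their sequence length by the same amount (37), which suggests the operation counts, rather than just one sequence length, are off.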