Last active
December 21, 2015 19:59
Some details on making a good visualization of the overlap data within Celera Assembler
(1) If we use Celera Assembler's bogart unitigger, it generates a file called "bests.edges" in the "4-unitigger" output directory.
(2) I wrote a simple script that converts the edge list to GML. The script can be downloaded from https://github.com/PacificBiosciences/HBAR-DTK/blob/master/src/CA_best_edge_to_GML.py
(3) Load the graph into Gephi (https://gephi.org/).
(4) I typically apply the following layout algorithms in Gephi, in this order, to get a good layout:
1) "YifanHu's Multilevel" to get a rough layout. The output usually captures the large-scale structure well and untangles the graph reasonably well, so I can start to see features of the assembly overlap graph (or the string graph).
2) "ForceAtlas 2" to smooth the paths in the graph. It is a physics-based layout algorithm. If you tune the "Gravity" and "Repulsion" parameters right, you can get the space-filling-curve-like layout that I showed you yesterday.
3) The "ForceAtlas 2" layout algorithm has a tendency to collapse the bubbles, so I then use "Yifan Hu Proportional" to "open them up".
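To give a feel for why the gravity and repulsion settings matter, here is a toy force-directed step in Python. It is only a sketch of the general idea (pairwise repulsion, spring-like attraction along edges, and gravity toward the origin); it is not Gephi's ForceAtlas 2 implementation, and the function name `layout_step` and all parameter names and default values are made up for illustration.

```python
import math


def layout_step(pos, edges, repulsion=1.0, gravity=0.1, attraction=0.1, step=0.05):
    """Advance node positions by one step of a toy force-directed layout.

    pos   : dict mapping node -> (x, y)
    edges : iterable of (node_a, node_b) pairs
    All parameter names and defaults are illustrative assumptions.
    """
    nodes = list(pos)
    force = {n: [0.0, 0.0] for n in nodes}

    # Pairwise repulsion: magnitude falls off as 1 / distance.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = repulsion / dist
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist

    # Attraction along edges: a linear spring pulling the endpoints together.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        force[a][0] += attraction * dx
        force[a][1] += attraction * dy
        force[b][0] -= attraction * dx
        force[b][1] -= attraction * dy

    # Gravity: pull every node toward the origin so the layout stays compact.
    for n in nodes:
        force[n][0] -= gravity * pos[n][0]
        force[n][1] -= gravity * pos[n][1]

    return {
        n: (pos[n][0] + step * force[n][0], pos[n][1] + step * force[n][1])
        for n in nodes
    }
```

Calling `layout_step` repeatedly on a long path graph shows the trade-off: higher gravity packs the path into a smaller area, while higher repulsion keeps neighboring stretches of the path from touching.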
A lot of this requires some trial and error. Some knowledge from my physics education and a rough understanding of the layout algorithms helps.
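For step (2), the conversion itself is simple. Below is a minimal sketch of turning an edge list into GML that Gephi can load. It is not the linked CA_best_edge_to_GML.py script, and it does not parse the real bogart output: it assumes a simplified, hypothetical input with one whitespace-separated "source target" pair per line and "#" comment lines. The function names `parse_edge_lines` and `edges_to_gml` are made up for this example.

```python
def parse_edge_lines(lines):
    """Parse 'source target' lines, skipping blank lines and '#' comments.

    The two-column input format is an assumption for illustration only.
    """
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        edges.append((fields[0], fields[1]))
    return edges


def edges_to_gml(edges):
    """Return a GML string describing a directed graph.

    edges : list of (source_label, target_label) pairs
    """
    # Assign consecutive integer ids to node labels in order of first appearance.
    ids = {}
    for src, dst in edges:
        for label in (src, dst):
            if label not in ids:
                ids[label] = len(ids)

    out = ["graph [", "  directed 1"]
    for label, node_id in ids.items():
        out += ["  node [", f"    id {node_id}", f'    label "{label}"', "  ]"]
    for src, dst in edges:
        out += ["  edge [", f"    source {ids[src]}", f"    target {ids[dst]}", "  ]"]
    out.append("]")
    return "\n".join(out) + "\n"
```

To use it, read the edge file line by line, pass the lines to `parse_edge_lines`, and write the string returned by `edges_to_gml` to a file ending in `.gml`, which can then be opened in Gephi.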