@davipatti
Last active February 4, 2019 10:41
Select phylogenetically distant taxa in a tree

Selecting dissimilar leaves in a phylogenetic tree

Algorithm

To select N leaves:

- Pick a leaf at random. Add it to a selected list.

While len(selected) < N:

- For each leaf not yet selected, compute the minimum distance (d) to any leaf in selected.
- Identify the leaf that has the maximum d, and add it to selected.
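The steps above can be sketched without any tree library, assuming the pairwise leaf-to-leaf distances are already available as a nested dict. The function name `select_most_distant`, the `seed` parameter, and the toy distance table are hypothetical and used only for illustration; this is a minimal sketch of the greedy selection, not the gist's own implementation.

```python
import random


def select_most_distant(dist, n, seed=None):
    """Greedily pick n mutually distant leaves from a distance table.

    dist maps each leaf label to a dict of distances to every leaf
    (assumed symmetric, with zeros on the diagonal).
    """
    rng = random.Random(seed)
    leaves = sorted(dist)
    n = min(n, len(leaves))  # cannot select more leaves than exist
    if n <= 0:
        return []
    # Step 1: start from a random leaf.
    selected = [rng.choice(leaves)]
    while len(selected) < n:
        remaining = [leaf for leaf in leaves if leaf not in selected]
        # d for a candidate = distance to its nearest already-selected leaf;
        # the candidate with the largest d is added next.
        selected.append(
            max(remaining, key=lambda leaf: min(dist[leaf][s] for s in selected))
        )
    return selected


# Toy patristic distances for the made-up tree ((A:1,B:1):2,(C:1,D:1):2).
toy = {
    "A": {"A": 0, "B": 2, "C": 6, "D": 6},
    "B": {"A": 2, "B": 0, "C": 6, "D": 6},
    "C": {"A": 6, "B": 6, "C": 0, "D": 2},
    "D": {"A": 6, "B": 6, "C": 2, "D": 0},
}
picked = select_most_distant(toy, 2, seed=0)
print(picked)
```

With two leaves requested, the pair always spans both clades of the toy tree, whichever leaf is drawn first.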
import random

import pandas as pd
from dendropy import Tree


def selectMostDistantLeaves(tree, n):
    """Select n phylogenetically distant leaves in a tree.

    Args:
        tree (dendropy.Tree)
        n (int)

    Returns:
        list containing labels of leaf nodes
    """
    tree = Tree(tree)  # distancesFromNode alters tree, so make a copy
    n = min(n, len(tree.leaf_nodes()))  # cannot select more leaves than exist
    node = random.choice(tree.leaf_nodes())
    # One column per selected leaf; rows are distances from every leaf to it
    selected = pd.DataFrame({node: distancesFromNode(tree, node)})
    while selected.shape[1] < n:
        # Row-wise minimum = distance to the nearest selected leaf.
        # Could store only minima, not entire df. Would save repeated computation
        node = selected.apply(min, axis=1).idxmax()
        selected[node] = distancesFromNode(tree, node)
    return [leaf.taxon.label for leaf in selected.columns]


def distancesFromNode(tree, node):
    """Compute distances from all leaf nodes in tree to node.

    Args:
        tree (dendropy.Tree)
        node (dendropy.Node) Must be a node in tree.

    Returns:
        pd.Series containing distance from node
    """
    tree.reroot_at_node(node)  # alters tree in place
    series = pd.Series(tree.calc_node_root_distances(), index=tree.leaf_nodes())
    series[node] = 0
    return series
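
The comment in selectMostDistantLeaves suggests storing only the minima rather than the whole DataFrame. A minimal sketch of that idea follows, written against plain dicts so it does not depend on dendropy or pandas. The names `select_with_running_min` and `distances_from` are hypothetical; `distances_from(leaf)` stands in for whatever returns the distances from one leaf to every leaf (distancesFromNode plays that role in the code above).

```python
def select_with_running_min(leaves, distances_from, n, first):
    """Greedy selection that keeps one running minimum per leaf.

    distances_from(leaf) is assumed to return a dict mapping every
    leaf to its distance from the given leaf.
    """
    selected = [first]
    # nearest[leaf] = distance from leaf to its closest selected leaf so far
    nearest = dict(distances_from(first))
    while len(selected) < min(n, len(leaves)):
        candidate = max(
            (leaf for leaf in leaves if leaf not in selected),
            key=nearest.__getitem__,
        )
        selected.append(candidate)
        # Only the newly added leaf can lower a minimum, so a single
        # pass over its distances brings the table up to date.
        for leaf, d in distances_from(candidate).items():
            if d < nearest[leaf]:
                nearest[leaf] = d
    return selected


# Toy distances for the made-up tree ((A:1,B:1):2,(C:1,D:1):2).
toy = {
    "A": {"A": 0, "B": 2, "C": 6, "D": 6},
    "B": {"A": 2, "B": 0, "C": 6, "D": 6},
    "C": {"A": 6, "B": 6, "C": 0, "D": 2},
    "D": {"A": 6, "B": 6, "C": 2, "D": 0},
}
result = select_with_running_min(list("ABCD"), lambda leaf: toy[leaf], 3, "A")
print(result)  # ['A', 'C', 'B']
```

Each iteration then costs one pass over the leaves instead of a row-wise minimum over every selected column. In the pandas version this would presumably amount to keeping a single Series and updating it with an element-wise minimum.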