@mdml
Last active March 30, 2023 16:30
Dendrograms: Convert from Scipy to D3

A dendrogram is a common way to represent hierarchical data. For Python users, Scipy's hierarchical clustering module (`scipy.cluster.hierarchy`) performs the clustering and can plot the result as a dendrogram via matplotlib. When it's time to make a prettier, more customized, or web version of the dendrogram, however, it can be tricky to turn Scipy's output into a suitable visualization. My preferred method of visualizing data, especially on the web, is D3. This example includes a script that converts a Scipy dendrogram into the nested JSON format used by D3's cluster layout.

In the example, I cluster six genes by their expression values from two experiments. You can easily replace that data with your own, larger data set to harness the power of both Scipy and D3 for analyzing hierarchical data. The D3 code I used to generate this example comes straight from Mike Bostock's [dendrogram example](http://bl.ocks.org/mbostock/4063570): just replace the JSON file flare.json with your own (and perhaps tweak the width and height).
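The heart of the conversion is a recursive walk over the tree of `ClusterNode` objects that `scipy.cluster.hierarchy.to_tree` returns. Below is a minimal sketch of that idea, assuming only NumPy and SciPy: `to_d3` is a hypothetical helper name, and unlike the gist's script it builds the nested `name`/`children` dictionaries in a single pass, naming each internal node by its sorted, "-"-joined leaves (the same convention the gist uses) with `ClusterNode.pre_order`.

```python
import json

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def to_d3(node, labels):
    """Convert a SciPy ClusterNode into nested {"name", "children"} dicts."""
    if node.is_leaf():
        # Leaf ids index into the original observations, so look up the label
        return {"name": labels[node.id], "children": []}
    # pre_order() applies the function to every leaf under this node
    leaf_names = sorted(node.pre_order(lambda leaf: labels[leaf.id]))
    return {
        "name": "-".join(leaf_names),
        "children": [to_d3(node.left, labels), to_d3(node.right, labels)],
    }


# Same six genes and two experiments as the gist (rows: a..f; columns: exp1, exp2)
labels = ["a", "b", "c", "d", "e", "f"]
data = np.array([[-2.2, 5.4], [5.6, -0.5], [0.9, 2.33],
                 [-0.23, 3.1], [-3.0, 4.1], [0.1, -3.2]])

# Given an observation matrix, linkage() computes Euclidean distances itself
tree = to_tree(linkage(data, method="single"))
d3_root = {"name": "Root1", "children": [to_d3(tree, labels)]}
print(json.dumps(d3_root, sort_keys=True, indent=4))
```

The extra "Root1" wrapper only mirrors the gist's JSON; D3's cluster layout accepts any single root object.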

{
    "children": [
        {
            "children": [
                {
                    "children": [],
                    "name": "f"
                },
                {
                    "children": [
                        {
                            "children": [],
                            "name": "b"
                        },
                        {
                            "children": [
                                {
                                    "children": [
                                        {
                                            "children": [],
                                            "name": "c"
                                        },
                                        {
                                            "children": [],
                                            "name": "d"
                                        }
                                    ],
                                    "name": "c-d"
                                },
                                {
                                    "children": [
                                        {
                                            "children": [],
                                            "name": "a"
                                        },
                                        {
                                            "children": [],
                                            "name": "e"
                                        }
                                    ],
                                    "name": "a-e"
                                }
                            ],
                            "name": "a-c-d-e"
                        }
                    ],
                    "name": "a-b-c-d-e"
                }
            ],
            "name": "a-b-c-d-e-f"
        }
    ],
    "name": "Root1"
}
#!/usr/bin/python
# Load required modules
import pandas as pd
import scipy.spatial
import scipy.cluster
import numpy as np
import json
import matplotlib.pyplot as plt
# Example data: gene expression
geneExp = {'genes' : ['a', 'b', 'c', 'd', 'e', 'f'],
           'exp1': [-2.2, 5.6, 0.9, -0.23, -3, 0.1],
           'exp2': [5.4, -0.5, 2.33, 3.1, 4.1, -3.2]
           }
df = pd.DataFrame( geneExp )
# Determine distances (default is Euclidean)
dataMatrix = np.array( df[['exp1', 'exp2']] )
distMat = scipy.spatial.distance.pdist( dataMatrix )
# Cluster hierarchically using scipy
clusters = scipy.cluster.hierarchy.linkage(distMat, method='single')
T = scipy.cluster.hierarchy.to_tree( clusters , rd=False )
# Create dictionary for labeling nodes by their IDs
labels = list(df.genes)
id2name = dict(zip(range(len(labels)), labels))
# Draw dendrogram using matplotlib to scipy-dendrogram.png
scipy.cluster.hierarchy.dendrogram(clusters, labels=labels, orientation='right')
plt.savefig("scipy-dendrogram.png")
# Create a nested dictionary from the ClusterNode objects returned by SciPy
def add_node(node, parent):
    # First create the new node and append it to its parent's children
    newNode = dict(node_id=node.id, children=[])
    parent["children"].append(newNode)
    # Recursively add the current node's children (both are None for leaves)
    if node.left is not None:
        add_node(node.left, newNode)
    if node.right is not None:
        add_node(node.right, newNode)
# Initialize nested dictionary for d3, then recursively iterate through tree
d3Dendro = dict(children=[], name="Root1")
add_node( T, d3Dendro )
# Label each node with the names of each leaf in its subtree
def label_tree(n):
    # If the node is a leaf, then we have its name
    if len(n["children"]) == 0:
        leafNames = [id2name[n["node_id"]]]
    # If not, flatten all the leaves in the node's subtree
    # (a plain loop instead of reduce(), which is not a builtin in Python 3)
    else:
        leafNames = []
        for c in n["children"]:
            leafNames += label_tree(c)
    # Delete the node id since we don't need it anymore and
    # it makes for cleaner JSON
    del n["node_id"]
    # Labeling convention: "-"-separated leaf names
    n["name"] = "-".join(sorted(map(str, leafNames)))
    return leafNames
label_tree( d3Dendro["children"][0] )
# Output to JSON
with open("d3-dendrogram.json", "w") as outFile:
    json.dump(d3Dendro, outFile, sort_keys=True, indent=4)
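The script above stores only names and children, so SciPy's merge heights are lost and the D3 cluster layout places nodes purely by depth. Below is a minimal sketch of keeping them, assuming the same data: every `ClusterNode` exposes its merge distance as `node.dist` (0 for leaves), so the recursive walk can copy it into an extra key. The helper `to_d3_with_height` and the key name `height` are hypothetical choices for illustration, not something D3 reads on its own.

```python
import json

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def to_d3_with_height(node, labels):
    """Nested dicts that also keep each node's merge distance.

    "height" is an illustrative key name; D3's cluster layout does not read it.
    """
    out = {"height": float(node.dist), "children": []}  # node.dist is 0 for leaves
    if node.is_leaf():
        out["name"] = labels[node.id]
    else:
        out["name"] = "-".join(sorted(node.pre_order(lambda leaf: labels[leaf.id])))
        out["children"] = [to_d3_with_height(node.left, labels),
                           to_d3_with_height(node.right, labels)]
    return out


# Same six genes and two experiments as the script above (rows: a..f)
labels = ["a", "b", "c", "d", "e", "f"]
data = np.array([[-2.2, 5.4], [5.6, -0.5], [0.9, 2.33],
                 [-0.23, 3.1], [-3.0, 4.1], [0.1, -3.2]])
root = to_d3_with_height(to_tree(linkage(data, method="single")), labels)
print(json.dumps(root, sort_keys=True, indent=4))
```

On the D3 side, one option would be to map `height` onto the horizontal axis with a linear scale instead of relying on node depth; that change is not shown here.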
<!DOCTYPE html>
<meta charset="utf-8">
<style>
.node circle {
  fill: #fff;
  stroke: steelblue;
  stroke-width: 1.5px;
}
.node {
  font: 10px sans-serif;
}
.link {
  fill: none;
  stroke: #ccc;
  stroke-width: 1.5px;
}
</style>
<body>
<script src="https://d3js.org/d3.v3.min.js"></script>
<script>
var width = 800,
    height = 550;
var cluster = d3.layout.cluster()
    .size([height, width - 160]);
var diagonal = d3.svg.diagonal()
    .projection(function(d) { return [d.y, d.x]; });
var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height)
  .append("g")
    .attr("transform", "translate(40,0)");
d3.json("d3-dendrogram.json", function(error, root) {
  if (error) throw error;
  var nodes = cluster.nodes(root),
      links = cluster.links(nodes);
  var link = svg.selectAll(".link")
      .data(links)
    .enter().append("path")
      .attr("class", "link")
      .attr("d", diagonal);
  var node = svg.selectAll(".node")
      .data(nodes)
    .enter().append("g")
      .attr("class", "node")
      .attr("transform", function(d) { return "translate(" + d.y + "," + d.x + ")"; });
  node.append("circle")
      .attr("r", 4.5);
  node.append("text")
      .attr("dx", function(d) { return d.children ? -8 : 8; })
      .attr("dy", 3)
      .style("text-anchor", function(d) { return d.children ? "end" : "start"; })
      .text(function(d) { return d.name; });
});
d3.select(self.frameElement).style("height", height + "px");
</script>
@FL33TW00D

This is fantastic. Thank you very much.

@williamjin1992

Brilliant example!
But if I only want to label the terminal nodes of the tree, what modification should I make?
Thanks!

@SED910

SED910 commented Dec 23, 2021

@williamjin1992 did you succeed?

@jjj1231978

Thanks so much!

@nsunkad

nsunkad commented Dec 2, 2022

@mdml This is awesome! One thing: I noticed this D3 code draws every edge with the same height.

I'm pretty new to D3. Is there a way to preserve the heights/colors of each edge so I can model a tree like the one in this Matplotlib graph?

[Screenshot: Matplotlib dendrogram]
