To run this you will need the following packages:
library(corrr)
library(dplyr)
library(purrr)
library(tidyr)
library(nycflights13)
library(igraph)
library(ggraph)
My objective here is to demonstrate how to visualize a correlation matrix as a network plot, grouped by a variable. In this case we are interested in weather data grouped by airport (origin), taken from the weather table in the nycflights13 package. Our first step is some basic data munging. This gives us a set of numeric variables plus the grouping variable (origin) to work with:
weather_sub <- weather %>%
  group_by(origin) %>%
  select(-(year:hour)) %>%
  select_if(is.numeric)
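One detail worth spelling out is why origin, a character column, survives a selection of numeric columns: dplyr keeps grouping variables when selecting from a grouped data frame. The sketch below shows that behaviour on the built-in iris data, which is only a stand-in here (Species plays the role of origin), and it assumes a dplyr version that provides where():

```r
library(dplyr)

# iris stands in for the weather data; Species plays the role of origin.
numeric_by_group <- iris %>%
  group_by(Species) %>%
  select(where(is.numeric))

# Species is not numeric, but it is kept because it is the grouping variable.
names(numeric_by_group)
group_vars(numeric_by_group)
```

dplyr prints a message when it adds the grouping variable back, which is expected.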
We use the corrr package to compute the correlations. The trick is that they need to be computed per group, in this case per origin. To do that, we nest the data by origin and use map from the purrr package to apply correlate and then stretch to each group's data. We then keep only the pairs whose correlation coefficient exceeds 0.3 in absolute value and convert the result into a graph object that ggraph can plot. Note that compose must be called as purrr::compose, because igraph also exports a function named compose and is loaded after purrr:
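weather_cor <- weather_sub %>%
  group_by(origin) %>% ## redundant but worth it for illustration
  nest() %>%
  ## compose() applies right to left: correlate() first, then stretch()
  mutate(data = map(data, purrr::compose(stretch, correlate))) %>%
  unnest(cols = data) %>% ## recent tidyr versions expect the column to be named
  select(x, y, r, origin) %>%
  filter(abs(r) > .3) %>% ## also drops pairs where r is NA
  graph_from_data_frame(directed = FALSE)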
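Because the order in which compose chains its arguments is easy to get backwards, here is a minimal sketch of it. The two helper functions, add_one and double, are hypothetical and exist only for this illustration:

```r
library(purrr)

# Hypothetical helpers, used only to make the order of application visible.
add_one <- function(x) x + 1
double  <- function(x) x * 2

# By default compose() applies its arguments right to left:
# add_one() runs first, then double().
double_after_add <- purrr::compose(double, add_one)

double_after_add(3)   # same as double(add_one(3))
double(add_one(3))
```

In the same way, purrr::compose(stretch, correlate) computes the correlation data frame first and then stretches it into long format.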
Finally, we plot the graph with ggraph, drawing one panel of edges per origin:
ggraph(weather_cor, layout = "kk") +
  geom_edge_link(aes(edge_alpha = abs(r), color = r), edge_width = 5) +
  guides(edge_alpha = "none") +
  scale_edge_colour_gradientn(limits = c(-1, 1), colors = heat.colors(5)) +
  geom_node_point(color = "black", size = 4) +
  geom_node_text(aes(label = name), repel = TRUE) +
  facet_edges(~origin) +
  theme_minimal()
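To sanity-check what the nest, map and filter steps compute, the same per-group logic can be sketched in base R. This is a rough cross-check rather than a replacement for the corrr pipeline. It uses the built-in iris data as a stand-in for the weather data, and to_long is a hypothetical helper that keeps each pair of variables once:

```r
# Split the numeric columns by group, then compute one correlation matrix per group.
numeric_cols <- iris[sapply(iris, is.numeric)]
cor_by_group <- lapply(split(numeric_cols, iris$Species), cor)

# Hypothetical helper: flatten a correlation matrix into long format (x, y, r),
# keeping only the upper triangle so each pair appears once.
to_long <- function(m) {
  idx <- which(upper.tri(m), arr.ind = TRUE)
  data.frame(x = rownames(m)[idx[, "row"]],
             y = colnames(m)[idx[, "col"]],
             r = m[idx])
}

# Stack the per-group results and record which group each row came from.
long <- do.call(rbind, Map(function(m, g) cbind(to_long(m), group = g),
                           cor_by_group, names(cor_by_group)))

# Keep only the stronger correlations, as in the pipeline above.
strong <- long[abs(long$r) > 0.3, ]
```

The resulting data frame has the same shape as the edge list passed to graph_from_data_frame: two node columns, the correlation, and the grouping column used for faceting.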