Skip to content

Instantly share code, notes, and snippets.

@ekalosak
Created February 14, 2020 00:44
Show Gist options
  • Save ekalosak/f2582f7420374d394ab10d2dda056aaa to your computer and use it in GitHub Desktop.
Save ekalosak/f2582f7420374d394ab10d2dda056aaa to your computer and use it in GitHub Desktop.
Visualize high-dimensional optimization runs
GOLD.chili_peppers GOLD.ham SEPIA.chili_peppers SEPIA.ham score step step.quart
otter 5.5555555555556 lion 1.11111111111112 693.805 1 1
gila monster 2.22222222222224 ocelot 3.33333333333336 343.59 2 1
gila monster 4.44444444444448 rhinoceros 1.11111111111112 409.635 3 1
gila monster 2.22222222222224 rhinoceros 4.44444444444448 390.759 4 2
otter 8.88888888888896 rhinoceros 5.5555555555556 830.934 5 2
gila monster 0 rhinoceros 5.5555555555556 361.278 6 2
gila monster 0 silver fox 1.11111111111112 717.802 7 3
gila monster 1.11111111111112 ocelot 7.77777777777784 933.906 8 3
gila monster 1.11111111111112 ocelot 1.11111111111112 408.426 9 3
gnu 6.66666666666672 rhinoceros 7.77777777777784 872.502 10 4
gnu 2.22222222222224 silver fox 6.66666666666672 941.119 11 4

Using R version 3.6.2 Start the R repl. Load your data. Your data should be in the format of the attached foobar.csv. The first few columns are the X values suggested by the optimizer. Score is the Y value. Step is the, well, step. Step.quart is the quartile of the step.

states$step.quart <- as.factor(ntile(states$step, 4))

Then use this plotting code:

lower_continuous_plot <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_hex(..., bins = 4) +
    scale_fill_viridis_c(option = "inferno")
}

lower_combo_plot <- function(data, mapping, ...) {
  x.str <- str_remove(as.character(mapping["x"]), "~")
  y.str <- str_remove(as.character(mapping["y"]), "~")
  col.str <- str_remove(as.character(mapping["color"]), "~")
  if (class(data[, x.str]) == "numeric") {
    pp <- ggplot(data, mapping) +
      geom_violin(aes_string(x = y.str, y = x.str, fill = col.str)) +
      scale_fill_viridis_d() +
      coord_flip()
  } else {
    pp <- ggplot(data, mapping) + geom_violin()
  }
  return(pp)
}

upper_combo_plot <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point(..., aes(alpha = 0.4)) +
    guides(alpha = F) +
    scale_color_viridis_c(option = "viridis")
}

upper_continuous_plot <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point(..., aes(alpha = 0.4)) +
    guides(alpha = F) +
    scale_color_viridis_c(option = "viridis")
}

diagnonal_continuous_plot <- function(data, mapping, ...) {
  x.str <- str_remove(as.character(mapping["x"]), "~")
  nbins <- length(unique(data[, x.str]))
  ggplot(data = data, mapping = mapping) +
    geom_histogram(..., aes(fill = step.quart), bins = nbins) +
    scale_fill_viridis_d()
}

p <- ggpairs(states,
  columns = which(colnames(states) %in% state.names),
  aes(color = step, size = score),
  lower = list(
    continuous = lower_continuous_plot,
    combo = lower_combo_plot
  ),
  upper = list(
    continuous = upper_continuous_plot,
    combo = upper_combo_plot
    # continuous = upper_continuous_plot,
    # discrete = lower_discrete_plot,
    # na = lower_discrete_plot
  ),
  diag = list(
    continuous = diagnonal_continuous_plot
  ),
  legend = c(1, 2)
) + theme_bw()
@ekalosak
Copy link
Author

This is the expected result.
Screen Shot 2020-02-13 at 4 37 56 PM
The diagonal is the marginal frequency of each X in the optimization run. The lower triangle is similar - a 2d marginal frequency. The upper triangle is the sequence of steps, as described by the legend on the rhs. Diagonal plots for continuous variables show, via color, the proportion of samples at that x-value suggested at a particular step.quantile.

@lawrennd
Copy link

Really nice, thanks @ekalosak!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment