Plotting FAIR metrics
---
title: "FAIR Metrics"
author: "Matt Jones"
date: "5/8/2019"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
```
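## Generate a simulated data set

This is a simulated data set with one row per object version, 30 rows in total. Each row has an update date, a scope, an object number, a version number, a `pid` built from those three, a score for each FAIR facet (`f`, `a`, `i`, `r`), and an overall `score` that is the mean of the four facets truncated to an integer.

```{r load_data}
# Fix the seed so the simulated scores are reproducible (the value is arbitrary)
set.seed(42)

# Three overlapping monthly date sequences, stacked into one sorted column of 30 dates
updates <- data.frame(v1=seq(as.Date("2000/1/1"), by = "month", length.out = 10),
                      v2=seq(as.Date("2000/5/1"), by = "month", length.out = 10),
                      v3=seq(as.Date("2000/8/1"), by = "month", length.out = 10)) %>%
    gather(key = version, value = update_date) %>%
    arrange(update_date) %>%
    select(update_date)

# Every combination of version, object, and scope, with random facet scores
fair <- expand.grid(version = seq(1,3,1), object = seq(1,5,1), scope = c("adc", "knb")) %>%
    mutate(pid = paste(scope, object, version, sep=".")) %>%
    mutate(f = sample(90:100, 30, replace=TRUE)) %>%
    mutate(a = sample(70:100, 30, replace=TRUE)) %>%
    mutate(i = sample(40:70, 30, replace=TRUE)) %>%
    mutate(r = sample(30:50, 30, replace=TRUE)) %>%
    mutate(score = as.integer((f + a + i + r)/4))

# Pair each simulated record with an update date, row by row
scores <- bind_cols(updates, fair)
head(scores)
```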
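
The hard-coded `30` in the `sample()` calls above has to match the number of rows that `expand.grid()` produces. As a quick check, the following base R sketch rebuilds the same grid under a hypothetical name (`grid`, not used elsewhere in this document) and inspects its size and the first few identifiers:

```r
# Rebuild the version x object x scope grid used above (base R only).
grid <- expand.grid(version = seq(1, 3, 1),
                    object  = seq(1, 5, 1),
                    scope   = c("adc", "knb"))

# Same identifier scheme as the pid column: scope.object.version
grid$pid <- paste(grid$scope, grid$object, grid$version, sep = ".")

# 3 versions x 5 objects x 2 scopes
print(nrow(grid))

# version varies fastest, then object, then scope
print(head(grid$pid, 4))
```

The three date sequences of length 10 also stack to 30 rows, which is why `bind_cols()` can pair the dates with the grid row by row.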
## Calculate stats

Compute the stats in three steps. First, group by month and dataset (`object`) and keep only the most recent version of each dataset in that month. Every `update_date` falls on the first of a month, so grouping by date is grouping by month. Next, average each FAIR facet across the remaining rows for each month. Finally, reshape the result from wide to long format, so that each row holds one facet's mean for one month.

```{r calculate_means}
# Keep the highest version of each object within each month
most_recent <- scores %>%
    arrange(update_date, object, version) %>%
    group_by(update_date, object) %>%
    top_n(1, version)

# Monthly mean of each facet, reshaped to one row per (month, metric)
score_means <- most_recent %>%
    group_by(update_date) %>%
    summarise(f=mean(f), a=mean(a), i=mean(i), r=mean(r)) %>%
    gather(metric, mean, -update_date)

# Order and label the facets for the plot legend
score_means$metric <- factor(score_means$metric,
                             levels=c("f", "a", "i", "r"),
                             labels=c("Findable", "Accessible", "Interoperable", "Reusable"))
head(score_means)
```
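
To see what the "most recent version" filter does to the monthly mean, here is a minimal sketch on a hand-made input. The `toy` data frame and its values are hypothetical and chosen only for illustration; the dplyr calls are the same ones used in the chunk above.

```r
library(dplyr)

# Hypothetical toy input: object 1 has two versions in January.
toy <- data.frame(
  update_date = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01", "2000-02-01")),
  object      = c(1, 1, 2, 1),
  version     = c(1, 2, 1, 3),
  f           = c(60, 100, 80, 95)
)

# Keep only the highest version of each object within each month.
toy_recent <- toy %>%
  group_by(update_date, object) %>%
  top_n(1, version) %>%
  ungroup()

# Average the surviving rows per month.
toy_means <- toy_recent %>%
  group_by(update_date) %>%
  summarise(f = mean(f))

print(toy_means)
```

In January, version 1 of object 1 is dropped, so the mean is (100 + 80) / 2 = 90. Without the filter it would be (60 + 100 + 80) / 3 = 80. February has a single row, so its mean is 95.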
## Plot it!

```{r plot}
d1_colors <- c("#ff582d", "#c70a61", "#1a6379", "#60c5e4")
ggplot(data=score_means, mapping=aes(x=update_date, y=mean, color=metric)) +
    geom_line() +
    geom_point(size=1) +
    theme_bw() +
    scale_colour_manual(values=d1_colors) +
    scale_x_date(date_breaks="3 months", date_minor_breaks="months", labels=date_format("%Y %b")) +
    scale_y_continuous(limits=c(0,100))
```