@seanjtaylor
Created October 5, 2016 19:04
library(rvest)
library(dplyr)
library(stringr)
library(ggplot2)
# Scrape every HTML table on the page. The column names below are tied to the page's layout and may need updating if it changes.
tbls <- read_html('https://en.wikipedia.org/wiki/List_of_serial_killers_by_number_of_victims') %>%
  html_table()
# Coerce victims to character in every table so bind_rows() sees one consistent column type.
t1 <- tbls[[1]] %>% select(name = Name, country = Country, years = `Years active`, victims = `Proven victims`) %>% mutate(victims = as.character(victims))
t2 <- tbls[[2]] %>% select(name = Name, country = Country, years = `Years active`, victims = `Proven victims`) %>% mutate(victims = as.character(victims))
t3 <- tbls[[3]] %>% select(name = Name, country = Country, years = `Years active`, victims = `Proven victims*`) %>% mutate(victims = as.character(victims))
t4 <- tbls[[4]] %>% select(name = Name, country = Country, years = `Years active`, victims = `Proven victims*`) %>% mutate(victims = as.character(victims))
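# Earliest four-digit year found in a "Years active" string, or NA if there is none.
first.year <- Vectorize(function(s) {
  years <- as.numeric(unlist(stringr::str_extract_all(s, '[0-9]{4}')))
  if (length(years) == 0) NA_real_ else min(years)
}, USE.NAMES = FALSE)
# Latest four-digit year found in a "Years active" string, or NA if there is none.
last.year <- Vectorize(function(s) {
  years <- as.numeric(unlist(stringr::str_extract_all(s, '[0-9]{4}')))
  if (length(years) == 0) NA_real_ else max(years)
}, USE.NAMES = FALSE)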
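# Combine the tables, parse years and victim counts, then plot the number of killers by final active year.
bind_rows(t1, t2, t3, t4) %>%
  mutate(first.year = first.year(years),
         last.year = last.year(years),
         # Keep only the first run of digits in the victims field.
         victims = as.numeric(stringr::str_extract(victims, '[0-9]+'))) %>%
  # Drop rows where no four-digit year could be parsed.
  filter(is.finite(last.year)) %>%
  group_by(last.year) %>%
  summarise(n = n(), victims = sum(victims, na.rm = TRUE)) %>%
  ggplot(aes(x = last.year, y = n)) +
  geom_line() +
  xlab('Last Year Active') +
  ylab('Number of Serial Killers') +
  theme_bw()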