@favstats
Created January 10, 2019 11:54
Twitter Followers per Day from Social Blade
## Helper function to preview ggplots
## thanks to @tjmahr for sharing!
## note: the `open` command used below is the macOS one;
## other systems may need a different opener
ggpreview <- function(..., device = "png") {
  fname <- tempfile(fileext = paste0(".", device))
  ggplot2::ggsave(filename = fname, device = device, ...)
  system2("open", fname)
  invisible(NULL)
}
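## install pacman if you don't have it
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(tidyverse, rvest, glue, ggrepel)
## emo lives on GitHub (hadley/emo), so load it with p_load_gh
pacman::p_load_gh("hadley/emo")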
## my social blade data
social_html <- read_html("https://socialblade.com/twitter/user/favstats/monthly")
## function to scrape social blade data
get_social_data <- function(x) {
  date <- social_html %>%
    html_nodes(xpath = glue("/html/body/div[10]/div[1]/div[{x}]/div[1]")) %>%
    html_text()
  daily_follower <- social_html %>%
    html_nodes(xpath = glue("/html/body/div[10]/div[1]/div[{x}]/div[3]/div[1]/span")) %>%
    html_text()
  tibble(date, daily_follower)
}
## scraping and cleaning
social_data <- 5:34 %>%
  map_dfr(get_social_data) %>%
  mutate(daily_follower = ifelse(daily_follower == "--", 0, parse_number(daily_follower))) %>%
  mutate(date = lubridate::as_date(date)) %>%
  mutate(pos_neg = ifelse(daily_follower >= 0, "pos", "neg"))
## graph
ggtwitter <- social_data %>%
  ggplot(aes(date, daily_follower)) +
  geom_point() +
  geom_line(size = 0.5) +
  geom_text_repel(data = social_data %>% filter(daily_follower >= 5),
                  aes(label = daily_follower),
                  nudge_y = 1, nudge_x = 0.5,
                  direction = "y") +
  theme_minimal() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey") +
  annotate(geom = "text", x = as.Date("2019-01-06"), y = 62,
           label = ji_glue(":open_mouth:Hadley Wickham retweets me:open_mouth:")) +
  annotate(geom = "text", x = as.Date("2019-01-04") + 0.35, y = 14,
           label = ji_glue("Tweet about Rstudio Cloud :cloud:")) +
  labs(x = "", y = "Twitter Followers per Day\n",
       title = "Welcome to all my new Twitter Followers!",
       subtitle = "Beware: You will encounter a lot of R. And Memes\n",
       caption = "Data from Social Blade") +
  scale_x_date(date_breaks = "4 day", date_labels = "%d %b %Y") +
  theme(plot.title = element_text(size = 14, face = "bold"),
        plot.subtitle = element_text(face = "italic"))
## preview plot with this awesome function
# ggpreview(plot = ggtwitter, width = 10, height = 6)
## pass the plot explicitly instead of relying on last_plot()
ggsave("ggtwitter.png", plot = ggtwitter, width = 10, height = 6)
## Tweet with Graph can be found here: https://twitter.com/favstats/status/1083330490915086336
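
The cleaning step in the script can be tried without contacting Social Blade at all. The sketch below runs the same idea on three made-up rows: the values in `sample_data` are hypothetical and only mimic the scraped strings, and it swaps `ifelse()` for `parse_number(na = ...)` plus `coalesce()` so that the "--" placeholder becomes 0 without a parse warning.

```r
library(dplyr)

# Hypothetical sample rows, invented for illustration (not real Social Blade data)
sample_data <- tibble::tibble(
  date = c("2019-01-04", "2019-01-05", "2019-01-06"),
  daily_follower = c("14", "--", "-2")
)

cleaned <- sample_data %>%
  mutate(
    # treat "--" as missing while parsing, then replace the missing value with 0
    daily_follower = coalesce(
      readr::parse_number(daily_follower, na = c("", "NA", "--")),
      0
    ),
    # convert the ISO date strings to Date objects
    date = lubridate::as_date(date),
    # flag days with a net gain (or no change) versus a net loss
    pos_neg = if_else(daily_follower >= 0, "pos", "neg")
  )

print(cleaned)
```

The resulting tibble has the same three columns the plotting code expects (`date`, `daily_follower`, `pos_neg`), so it can stand in for `social_data` when testing the graph.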
@erinlynmclean

Howdy! I'm getting

> social_html <- read_html("https://socialblade.com/twitter/user/favstats/monthly")
Error in open.connection(x, "rb") : HTTP error 403.

when I try to run this. Any ideas? The preliminary research I did suggests I may not have permission to scrape web data from this domain.

@favstats
Author

Yes, this indeed no longer works! It seems they have blocked retrieving data from their website this way. You can try RSelenium (https://github.com/ropensci/RSelenium) to scrape the website anyway.
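
A minimal sketch of the RSelenium route, under several assumptions: RSelenium and rvest are installed, a local Firefox is available for `rsDriver()` to drive, and the helper name `read_html_rendered` along with its `port` and `wait` defaults are made up for illustration. Whether a real browser session gets past the 403, and whether the XPaths in `get_social_data()` still match the page, is not something this sketch can promise.

```r
# Hypothetical helper: load a page in a real browser via RSelenium and
# hand the rendered HTML to rvest. Names and defaults are illustrative.
read_html_rendered <- function(url, browser = "firefox", port = 4567L, wait = 5) {
  # start a Selenium server and open a browser session
  driver <- RSelenium::rsDriver(browser = browser, port = port, verbose = FALSE)
  remote <- driver$client

  # always close the browser and stop the server, even if an error occurs
  on.exit({
    remote$close()
    driver$server$stop()
  }, add = TRUE)

  remote$navigate(url)
  Sys.sleep(wait)  # crude pause so the page can finish loading

  # getPageSource() returns a list; the first element is the HTML string
  page_source <- remote$getPageSource()[[1]]
  rvest::read_html(page_source)
}

# Usage (not run here, since it launches a browser):
# social_html <- read_html_rendered("https://socialblade.com/twitter/user/favstats/monthly")
```

Because `get_social_data()` reads the global `social_html`, assigning the result to that name would let the rest of the script run unchanged, provided the page layout still matches.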
