@fauxneticien
Created November 29, 2022 19:20
Scrape from Reddit API
library(httr)
library(purrr)
library(tibble)
# See also https://bookdown.org/paul/apis_for_social_scientists/reddit-api.html
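# Newest posts from r/mentalhealth, up to 100 per request.
# Note: `t` is documented for the top/controversial listings, so it probably has no effect on /new.
url <- 'https://www.reddit.com/r/mentalhealth/new.json?t=day&limit=100'
response <- GET(url, user_agent('Extracting data from Reddit'))
# Fail early on an HTTP error (e.g. rate limiting) instead of parsing an error body
stop_for_status(response)
data <- content(response, type = 'application/json')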
# Inspect one post to decide which fields to extract (e.g. title).
# Named sample_post rather than sample to avoid masking base::sample().
sample_post <- data$data$children[[1]]$data
# Helpers:
# sort(names(sample_post))
# View(sample_post)
# Build one row per post and bind the rows into a single tibble
df <- purrr::map_dfr(data$data$children, function(post) {
  d <- post$data
  tibble(
    title = d$title,
    text = d$selftext,
    num_comments = d$num_comments
  )
})
head(df)
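
# A minimal sketch of paging past the 100-post limit (not part of the original script).
# Assumption: the listing JSON carries a cursor in data$after (NULL on the last page)
# that can be sent back as the `after` query parameter.
# fetch_page(), posts_to_df(), fetch_pages() and max_pages are hypothetical names
# introduced here for illustration only.

# Request one page of the listing, optionally starting after a given cursor
fetch_page <- function(after = NULL) {
  query <- list(limit = 100)
  if (!is.null(after)) query$after <- after
  response <- httr::GET(
    'https://www.reddit.com/r/mentalhealth/new.json',
    query = query,
    httr::user_agent('Extracting data from Reddit')
  )
  httr::stop_for_status(response)
  httr::content(response, type = 'application/json')
}

# Same field extraction as above, wrapped so it can be reused for every page
posts_to_df <- function(listing) {
  purrr::map_dfr(listing$data$children, function(post) {
    d <- post$data
    tibble::tibble(
      title = d$title,
      text = d$selftext,
      num_comments = d$num_comments
    )
  })
}

# Follow the cursor for at most max_pages requests, pausing between them
fetch_pages <- function(max_pages = 3) {
  pages <- list()
  after <- NULL
  for (i in seq_len(max_pages)) {
    listing <- fetch_page(after)
    pages[[i]] <- posts_to_df(listing)
    after <- listing$data$after
    if (is.null(after)) break
    Sys.sleep(2)
  }
  # dplyr is already required by purrr::map_dfr()
  dplyr::bind_rows(pages)
}

# Usage (makes network requests):
# all_posts <- fetch_pages(max_pages = 3)
# nrow(all_posts)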