Digest link header in paginated GitHub API request
## Needed for the pipe and the str_*() helpers (plyr and dplyr are called
## with `::` below, so they only need to be installed):
library(magrittr)
library(stringr)

#' Digest the Link header in a paginated result.
#'
#' Converts the Link header from a monolithic string to a usable data.frame.
#'
#' The GitHub API automatically paginates when the number of requested items
#' exceeds the number of items per page. When this occurs, the result returned
#' by the server will include a Link header that provides the URLs for other
#' pages of results, such as the next page and the last page. These assorted
#' URLs are catenated in a single string and this function converts that
#' information into a data.frame that is useful for traversing the pages.
#'
#' @param x Output of a function that gets potentially paginated results, e.g.,
#'   \code{get.*.repositories()}
#'
#' @return A data.frame, one row per URL = page. Maximum number of rows is four:
#'   one each for the "next", "last", "first", and "prev" page, indicated by the
#'   \code{rel} variable. The \code{per_page} variable will be constant across
#'   all rows and gives the number of items per page. If the header contains no
#'   links at all, the return value is NULL and a message is given.
#'
#' @references
#' \url{https://developer.github.com/guides/traversing-with-pagination/}
#' \url{https://developer.github.com/v3/#pagination}
#'
#' @examples
#' repos <- get.organization.repositories(org = "STAT545-UBC", per_page = 1)
#' digest_header_links(repos)
digest_header_links <- function(x) {
  y <- x$headers$link
  if (is.null(y)) {
    message("No links found in header.")
    return(NULL)
  }
  y %>%
    str_split(", ") %>% unlist %>%   # split into e.g. next, last, first, prev
    str_split_fixed("; ", 2) %>%     # separate URL from the relation
    plyr::alply(2) %>%               # workaround: make into a list
    dplyr::as_data_frame %>%         # convert to data.frame, no factors!
    setNames(c("URL", "rel")) %>%    # sane names
    dplyr::mutate_(rel = ~ str_match(rel, "next|last|first|prev"),
                   per_page = ~ str_match(URL, "per_page=([0-9]+)") %>%
                     `[`( , 2) %>% as.integer,
                   page = ~ str_match(URL, "&page=([0-9]+)") %>%
                     `[`( , 2) %>% as.integer,
                   URL = ~ str_replace_all(URL, "<|>", ""))
}
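The roxygen notes above say the result is "useful for traversing the pages"; here is a minimal sketch of both halves of that. First, what the function produces for a hand-built Link header in the format the referenced GitHub docs describe, and then a loop that follows "next" links until they run out. It assumes httr for the raw requests; the mock response, the gh_all_pages() name, and the example URLs are illustrative, not part of the gist.

library(httr)

## A mock response whose Link header follows the documented format
## (URLs made up for illustration):
fake <- list(headers = list(link = paste0(
  '<https://api.github.com/orgs/STAT545-UBC/repos?per_page=1&page=2>; rel="next", ',
  '<https://api.github.com/orgs/STAT545-UBC/repos?per_page=1&page=30>; rel="last"'
)))
digest_header_links(fake)
## roughly: 2 rows, rel = "next"/"last", per_page = 1, page = 2/30

## Hypothetical traversal: keep GETting the "next" URL until there isn't one.
gh_all_pages <- function(url) {
  results <- list()
  while (!is.null(url)) {
    resp <- GET(url)
    results <- c(results, content(resp))
    links <- suppressMessages(digest_header_links(resp))
    nxt <- if (is.null(links)) character(0) else links$URL[links$rel == "next"]
    url <- if (length(nxt) == 1) nxt else NULL
  }
  results
}

## e.g. all repositories of an organization, one item per page:
## repos <- gh_all_pages("https://api.github.com/orgs/STAT545-UBC/repos?per_page=1")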
Thanks @jennybc! I was able to adapt this gist for my purposes: https://gist.github.com/aronlindberg/2a9e9802579b2d239655

I modified the original function to use as.data.frame() because dplyr::as_data_frame could not be found. I also made sure that it returns 0 instead of a message when there are no headers (necessary for my subsequent iteration function to work).
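For anyone adapting the gist similarly: one way to get that 0-instead-of-message behavior without editing the original function is a small wrapper. This is a sketch only; the function name is made up, and the real adaptation lives in the gist linked above.

digest_header_links_or_zero <- function(x) {
  links <- suppressMessages(digest_header_links(x))
  if (is.null(links)) 0 else links
}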
Excellent! I know you probably won't see this, @aronlindberg, because Gist comments don't trigger notifications, but I'll still reply in case you do: the dplyr::as_data_frame() thing is probably because I've installed dplyr from GitHub and you from CRAN (or from GitHub, but not as recent a version).
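If anyone else hits that could-not-be-found error, a version-agnostic shim is one possible workaround. This is a sketch under the assumption that the installed dplyr may or may not export as_data_frame; it falls back to base R otherwise, and the as_df name is illustrative.

as_df <- function(x) {
  if ("as_data_frame" %in% getNamespaceExports("dplyr")) {
    dplyr::as_data_frame(x)
  } else {
    as.data.frame(x, stringsAsFactors = FALSE)
  }
}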