@jsta
Last active January 7, 2017 16:46
Scrape lake metadata tables from Wikipedia
library(WikipediR)
library(rvest)

# Scrape the first table (usually the infobox) from a lake's English
# Wikipedia page and return it as a data frame.
get_lake_wiki <- function(lake_name){
  res <- WikipediR::page_content("en", "wikipedia", page_name = lake_name,
                                 as_wikitext = FALSE)
  res <- res$parse$text[[1]]
  res <- xml2::read_html(res)
  res <- rvest::html_nodes(res, "table")
  # newer rvest versions return a tibble; coerce so that res[,1] is a vector
  res <- as.data.frame(rvest::html_table(res[1])[[1]],
                       stringsAsFactors = FALSE)

  # format coordinates ####
  # keep only the decimal pair and store it as "lat,lon"
  coord_row <- which(res[,1] == "Coordinates")
  if(length(coord_row) > 0){
    coords <- res[coord_row, 2]
    coords <- strsplit(coords, "/", fixed = TRUE)[[1]]
    coords <- sapply(coords, function(x) strsplit(x, "Coordinates: "))
    coords <- sapply(coords, function(x) strsplit(x, " "))
    coords <- paste(unlist(coords), collapse = ",")
    coords <- strsplit(coords, ",")[[1]]
    # drop empty tokens and degree-minute tokens (those with N, S, E or W)
    coords <- coords[nchar(coords) > 0 & !grepl("[NSEW]", coords)][1:2]
    coords <- paste(gsub(";", "", coords), collapse = ",")
    res[coord_row, 2] <- coords
  }

  # rm junk rows
  if(any(grepl("well-defined", res[,1]))){
    res <- res[!grepl("well-defined", res[,1]),]
    message("Shore length is not a well-defined measure.")
  }
  if(any(grepl("Islands", res[,1]))){
    res <- res[!grepl("Islands", res[,1]),]
  }

  res
}

get_lake_wiki("Corey Lake")
get_lake_wiki("Lake Koshkonong")
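
The coordinate handling above can be tried offline on a single string. Below is a minimal sketch, assuming the "Coordinates" cell contains a decimal pair written as "lat; lon" somewhere in its text. `parse_decimal_coords` and the sample `cell` value are hypothetical and not part of the gist; the sketch swaps the split-and-filter steps for one regular expression, so it is an alternative rather than the original method.

```r
# Hypothetical helper, not part of the original gist: pull the decimal
# "lat; lon" pair out of the text of a Wikipedia "Coordinates" cell.
parse_decimal_coords <- function(x){
  # match "<number>; <number>", each with an optional minus sign
  m <- regmatches(x, regexpr("-?[0-9.]+; ?-?[0-9.]+", x))
  if(length(m) == 0) return(NA_character_)
  # rewrite "lat; lon" as "lat,lon"
  gsub("; ?", ",", m)
}

# Assumed example of the cell text; real pages may differ
cell <- "42°05′N 85°44′W / 42.083°N 85.733°W / 42.083; -85.733"
parse_decimal_coords(cell)
# "42.083,-85.733"
```

A cell without a decimal pair gives `NA` instead of an error, which may be easier to handle when looping over many lakes.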

jsta commented Jan 7, 2017

These functions have been greatly improved and moved to a separate package, wikilake, now on CRAN (https://github.com/jsta/wikilake).