Skip to content

Instantly share code, notes, and snippets.

@jmclawson
Last active July 1, 2023 04:56
Show Gist options
  • Save jmclawson/c982b4e9763f456ab1b09e5034beb6f9 to your computer and use it in GitHub Desktop.
Save jmclawson/c982b4e9763f456ab1b09e5034beb6f9 to your computer and use it in GitHub Desktop.
using sf to map from external shapefiles and from packages
---
title: "Mapping with R"
output:
html_document:
df_print: paged
toc: true
toc_float: true
date: "2023-06-30"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Load packages
The `sf` package is one common way to map with `ggplot2.`
```{r message=FALSE, warning=FALSE}
library(sf)
library(tidyverse)
```
## Get shape data
The hardest part about mapping is to find good mapping data. This is stored in a special data type that defines borders and locations on the globe. Shape data can be imported into R, or it can be provided by packages.
### From files
The UN collects shape data in "shapefiles" for some countries here: https://salb.un.org/en/data. Most countries are lacking geospatial assets in this list. For countries with data, shapefiles are presented in compressed zip format that should be uncompressed after downloading. Then read it with `read_sf()`.
The resulting object will have a special column called `geometry` (*the very last column!*) of type `MULTIPOLYGON`.
```{r cyprus-un-data}
# hosted zip file from this location: https://geoportal.un.org/arcgis/sharing/rest/content/items/d50cdce398af4712b6e31042c085e5ba/data
# unzip it and locate the file with the SHP suffix
cyprus_un <- read_sf("gis/BNDA_CYP_1960-08-01_lastupdate.shp")
cyprus_un
```
Additionally, it is a special kind of data frame, with `sf` as the first of its classes:
```{r}
class(cyprus_un)
```
Any object with `sf` as its class and this special `geometry` column can be mapped simply with `ggplot()` and `geom_sf()`---no need to specify X- or Y-axis:
```{r cyprus-map-1}
cyprus_un %>%
ggplot() +
geom_sf()
```
As with any `ggplot` figure, we can map columns to aesthetics:
```{r cyprus-map-2}
cyprus_un %>%
ggplot() +
geom_sf(aes(fill = adm1nm))
```
For comparison, here's a similar process with Mexico. Its legibility suffers because of too much detail.
```{r mexico-un}
# hosted zip file from this location: https://geoportal.un.org/arcgis/sharing/rest/content/items/14e7cef499aa42bdb773511d4be516b2/data
# unzip it and locate the file with the SHP suffix
mexico_un <- read_sf("gis/BNDA_MEX_2018-02-01_lastupdate.shp")
mexico_un %>%
mutate(region = fct_collapse(adm1nm,
Aguascalientes = "Aguascalientes",
Zacatecas = "Zacatecas",
Jalisco = "Jalisco",
"Ciudad de México" = "Ciudad de México",
other_level = "other")) %>%
ggplot() +
geom_sf(aes(fill = region)) +
labs(fill = NULL) +
theme_minimal() +
scale_fill_manual(
values = c(Aguascalientes = "orange",
Zacatecas = "lightblue",
Jalisco = "lightgreen",
"Ciudad de México" = "pink",
other = "gray"))
```
### From a package
Packages that include shapefiles make it possible to avoid the task of finding and importing shapefiles manually. One such package is `rnaturalearth`.
As always, there will be differences in quality and methods of encoding the data. Cyprus shows less detail than the UN data, and it is divided differently.
```{r cyprus-ne, message=FALSE, warning=FALSE}
library("rnaturalearth")
library("rnaturalearthdata")
cyprus_ne <- ne_states(country = "cyprus", returnclass = "sf")
cyprus_northern_ne <- ne_states(country = "northern cyprus", returnclass = "sf")
cyprus_combined_ne <- rbind(cyprus_ne, cyprus_northern_ne)
cyprus_combined_ne %>%
ggplot() +
geom_sf(aes(fill = name)) +
theme_minimal() +
labs(fill = NULL)
```
Mexico also shows some differences in the level of detail and the names.
```{r mexico-ne}
mexico_ne <- ne_states(country = "mexico", returnclass = "sf")
mexico_ne %>%
mutate(region = fct_collapse(name,
Aguascalientes = "Aguascalientes",
Zacatecas = "Zacatecas",
Jalisco = "Jalisco",
"Distrito Federal" = "Distrito Federal",
other_level = "other")) %>%
ggplot() +
geom_sf(aes(fill = region)) +
theme_minimal() +
scale_fill_manual(
values = c(Aguascalientes = "orange",
Zacatecas = "lightblue",
Jalisco = "lightgreen",
"Distrito Federal" = "pink",
other = "gray"))
```
The biggest difference with the data from the package is that it covers the world:
```{r world-ne}
world_ne <- ne_countries(scale = "small", returnclass = "sf")
world_ne %>%
ggplot() +
geom_sf(aes(fill = economy)) +
theme_minimal()
world_ne %>%
ggplot() +
geom_sf(aes(fill = pop_est)) +
theme_minimal() +
scale_fill_viridis_c(option = "F",
labels = scales::label_comma())
```
## Join shape data with other sources
Shape data can be combined with other data to map it. Be careful when using joins with data frames of class `sf`. The order used for joins will make a difference and could remove the special `sf` class if that data frame isn't given priority in the join.
```{r message=FALSE, warning=FALSE}
j2sr <- read_csv("data/j2sr.csv")
j2sr_latest <- j2sr %>%
filter(!is.na(child_health)) %>%
group_by(country) %>%
slice_max(order_by = year, n = 1) %>%
ungroup()
sf_success <-
left_join(world_ne, j2sr_latest,
by = join_by("geounit"=="country"))
class(sf_success)
sf_fail <-
left_join(j2sr_latest, world_ne,
by = join_by("country"=="geounit"))
class(sf_fail)
```
Mapping from the joined data set can provide new insights about familiar information.
```{r}
sf_success %>%
ggplot() +
geom_sf(aes(fill = child_health)) +
theme_minimal()
```
Limiting the data can be a good way to explore more detail within a local context.
```{r}
ne_countries(continent = "South America", returnclass = "sf", scale = "large") %>%
left_join(j2sr_latest,
by = join_by("geounit"=="country")) %>%
ggplot() +
geom_sf(aes(fill = child_health)) +
theme_void()
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment