---
title: "Using the geoconnex Reference Feature Server with R"
format:
  html:
    code-fold: show
    toc: true
    toc-location: left
    toc-depth: 4
    toc-expand: 3
    page-layout: full
---
## Introduction

Below we demonstrate how to use the Geoconnex Reference Feature server API with R's spatial packages, particularly `sf`. We'll explore various use cases for working with hydrological and related spatial data. These use cases can be grouped into three categories:

1. Finding identifiers and spatial geometries for real-world features
2. Finding features that are hydrologically related to a feature of interest (if that information is available)
3. Finding datasets related to a feature of interest

This API conforms to the [OGC API - Features specification](https://ogcapi.ogc.org/features/), and its full API documentation is available [here](https://reference.geoconnex.us/openapi) for those who wish to explore its capabilities further than what is demonstrated below.
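Because the server follows the OGC API - Features standard, it also exposes a standard `/conformance` path declaring which parts of the specification it implements. As a quick sanity check (a sketch that only assumes the `jsonlite` package is installed; it is not attached here since the library setup comes next):

```{r conformance}
# List the conformance classes the server declares
# (the /conformance path is defined by the OGC API - Features standard)
conformance <- jsonlite::fromJSON("https://reference.geoconnex.us/conformance?f=json")
conformance$conformsTo
```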
## Setup

First, let's load the necessary libraries and set up our API base URL.

```{r setup, message=FALSE}
library(sf)
library(dplyr)
library(httr)
library(mapview)
library(jsonlite)
library(knitr)
library(DT)

base_url <- "https://reference.geoconnex.us"
```
## Use Case 1: Identifying features and their spatial geometry

The Geoconnex Reference Feature server is a source of identifiers and geometries for real-world features that many organizations may collect and publish data about. The attributes of these features vary, but all include a "uri", which serves as a [persistent identifier](https://docs.geoconnex.us/contributing/minting). First, let's discover what kinds of features are available:

```{r collections}
url <- paste0(base_url, "/collections?f=json")
collections <- jsonlite::fromJSON(url)
datatable(collections$collections)
```
We see a number of options available, including watershed boundaries such as the Hydrologic Unit Codes; administrative boundaries such as counties, states, and public water systems; hydrologic features such as mainstems (rivers) and aquifers; and hydrometric features such as dams and gages. The reference feature server lets us find features using attribute and spatial queries. In general, this is accomplished by passing queries of the form `https://reference.geoconnex.us/collections/<collectionId>/items?filter=`. Let's see how to use these collections!
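Since every attribute query below follows the same `/collections/<collectionId>/items?filter=` pattern, it can be convenient to wrap it in a small local helper. The sketch below is not part of any package — `geoconnex_items` is a hypothetical convenience function, assuming the `sf` package is available:

```{r helper}
# Hypothetical local helper: fetch features from a Geoconnex collection,
# optionally applying a CQL attribute filter
geoconnex_items <- function(collection, filter = NULL) {
  url <- paste0("https://reference.geoconnex.us/collections/", collection, "/items?f=json")
  if (!is.null(filter)) {
    url <- paste0(url, "&filter=", URLencode(filter))
  }
  sf::st_read(url, quiet = TRUE)
}

# Example usage:
# geoconnex_items("mainstems", "name_at_outlet ILIKE '%animas%'")
```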
### Attribute Queries

#### Fuzzy text match

Let's say we're interested in a specific river, one we think is called the "Animas River". We can pass an attribute filter query to the `mainstems` collection, then use the R `sf` package to read the response into a dataframe and the `mapview` package to visualize it.

```{r mainstem}
# Construct a query for a river name that includes the string "animas"
query <- URLencode("name_at_outlet ILIKE '%animas%'")
url <- paste0(base_url, "/collections/mainstems/items?f=json&filter=", query)

# Read the data into an sf object
animas_rivers <- st_read(url, quiet = TRUE)

# Display the results
animas_rivers |>
  select(uri, name_at_outlet, outlet_drainagearea_sqkm) |>
  datatable()

# Map the results
mapview(animas_rivers |>
  select(uri, name_at_outlet), zcol = "name_at_outlet")
```
There are evidently three rivers whose names include the word "Animas". Let's say we are interested in the "Animas River", shown on the map in green. We find that its Geoconnex URI is `https://geoconnex.us/ref/mainstems/35394`.
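Once we have an identifier, we don't need a filter query to retrieve the feature again. A single item can be fetched directly from the collection's items endpoint (standard OGC API - Features behavior), and the persistent Geoconnex URI itself resolves to a GeoJSON representation that `st_read` can consume — a sketch, assuming the setup chunk above has run:

```{r single-feature}
# Fetch just the Animas River feature by its item id
animas <- st_read(paste0(base_url, "/collections/mainstems/items/35394?f=json"), quiet = TRUE)

# Alternatively, read the persistent URI directly; it resolves to GeoJSON
animas_via_uri <- st_read("https://geoconnex.us/ref/mainstems/35394", quiet = TRUE)
```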
#### Logical and quantitative

We can also filter based on logical and quantitative conditions on attributes. Let's say we wanted to find all rivers with drainage areas (in this reference dataset, the attribute is `outlet_drainagearea_sqkm`) greater than 500,000 square kilometers:

```{r mainstem2}
# Construct a query for rivers with outlet_drainagearea_sqkm > 500,000
query <- URLencode("outlet_drainagearea_sqkm > 500000")
url <- paste0(base_url, "/collections/mainstems/items?f=json&filter=", query)

# Read the data into an sf object
large_mainstems <- st_read(url, quiet = TRUE)

# Display the results
large_mainstems |>
  select(uri, name_at_outlet, outlet_drainagearea_sqkm) |>
  datatable()

# Map the results
mapview(large_mainstems, zcol = "name_at_outlet")
```
#### Combining Queries

Queries over multiple attributes can also be made, combined with 'AND' or 'OR'. For example, let's find all dams whose names include "Hoover", then filter to only those with a drainage area of more than 100,000 square kilometers:

```{r hoover-dams}
# Step 1: Find all dams named "Hoover"
query_hoover <- URLencode("name LIKE '%Hoover%'")
url_hoover <- paste0(base_url, "/collections/dams/items?f=json&filter=", query_hoover)
hoover_dams <- st_read(url_hoover, quiet = TRUE)
cat("Number of dams named 'Hoover':", nrow(hoover_dams), "\n")

# Create an interactive table of all Hoover dams
datatable(
  hoover_dams |>
    st_drop_geometry() |>
    select(uri, name, drainage_area_sqkm) |>
    arrange(desc(drainage_area_sqkm)),
  options = list(pageLength = 10, scrollX = TRUE),
  caption = "All Dams Named 'Hoover'",
  rownames = FALSE
)

# Step 2: Query for large Hoover dams using a combined filter
query_large_hoover <- URLencode("name LIKE '%Hoover%' AND drainage_area_sqkm > 100000")
url_large_hoover <- paste0(base_url, "/collections/dams/items?f=json&filter=", query_large_hoover)
large_hoover_dams <- st_read(url_large_hoover, quiet = TRUE)
cat("\nNumber of large Hoover dams (drainage area > 100,000 sq km):", nrow(large_hoover_dams), "\n")

# Create an interactive table of large Hoover dams
datatable(
  large_hoover_dams |>
    st_drop_geometry() |>
    select(uri, name, drainage_area_sqkm) |>
    arrange(desc(drainage_area_sqkm)),
  options = list(pageLength = 10, scrollX = TRUE),
  caption = "Large Dams Named 'Hoover' (Drainage Area > 100,000 sq km)",
  rownames = FALSE
)

# Create a map view of all Hoover dams, highlighting the large ones
m <- mapview(hoover_dams |>
  select(uri, name, drainage_area_sqkm), layer.name = "All Hoover Dams", label = "name")
m + mapview(large_hoover_dams |>
  select(uri, name, drainage_area_sqkm), color = "red", col.regions = "red", layer.name = "Large Hoover Dams", lwd = 2, cex = 15, label = "Hoover")
```
We found 39 dams in the US named "Hoover", but only one with a large drainage area: the famous one near Las Vegas, NV.
### Spatial Queries

#### Bounding box

We can also make spatial queries, either with [bounding box](https://wiki.openstreetmap.org/wiki/Bounding_box) queries (min lon, min lat, max lon, max lat) or by passing [WKT-encoded geometry](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry). Let's say we want to find all public water systems within a bounding box around the Four Corners region.

```{r pws-four-corners}
# Define the bounding box for the Four Corners area
# Format: (min longitude, min latitude, max longitude, max latitude)
bbox <- c(-109.5, 36.5, -107.5, 37.5)

# Construct the URL with the bbox parameter
url <- paste0(base_url, "/collections/pws/items?f=json&bbox=", paste(bbox, collapse = ","))

# Read the data into an sf object
four_corners_pws <- st_read(url, quiet = TRUE)

# Display a summary of the results
cat("Number of Public Water Systems found:", nrow(four_corners_pws), "\n")

# Create an interactive table of the results
four_corners_pws |>
  st_drop_geometry() |>
  select(uri, pws_name, population_served_count) |>
  arrange(desc(population_served_count)) |>
  datatable(
    options = list(pageLength = 5, scrollX = TRUE),
    caption = "Public Water Systems in the Four Corners Area",
    rownames = FALSE
  )

# Create a map view of the results
m <- mapview(four_corners_pws, zcol = "population_served_count", layer.name = "Population Served", label = "pws_name")

# Add the bounding box to the map
bbox_poly <- st_as_sf(st_as_sfc(st_bbox(c(xmin = bbox[1], ymin = bbox[2], xmax = bbox[3], ymax = bbox[4]), crs = 4326)))
m + mapview(bbox_poly, col.region = "red", alpha.regions = 0, color = "red", lwd = 2, layer.name = "Query Bounding Box")
```
#### Intersection

Spatial queries are not restricted to bounding boxes. We can pass any spatial predicate along with WKT geometries to a collection filter. Let's say we have several field sites near Farmington, NM, and we want to identify which HUC10 watersheds they fall within. We'll use the point-in-polygon capability of the INTERSECTS spatial predicate to find this information:

```{r huc10}
# Define our field site (example coordinate near Farmington, NM)
site_lon <- -108.2186
site_lat <- 36.7280

# Construct the query
query <- sprintf("INTERSECTS(geometry, POINT(%f %f))", site_lon, site_lat) |> URLencode()
url <- paste0(base_url, "/collections/hu10/items?f=json&filter=", query)

# Make the API call
huc10 <- st_read(url, quiet = TRUE) |>
  select(id, uri, name)

# Display the results table
datatable(huc10)

# Create a map
site_point <- st_point(c(site_lon, site_lat)) |>
  st_sfc(crs = 4326) |>
  st_sf()

mapview(huc10, zcol = "name", layer.name = "HUC10 Watershed") +
  mapview(site_point, col.regions = "red", layer.name = "Field Site")
```
Here we see that our field site is in HUC10 1408010505, which has the associated Geoconnex URI <https://geoconnex.us/ref/hu10/1408010505>. This identifier can be used if we publish data about our site, following [Geoconnex guidance and best practices](https://docs.geoconnex.us/contributing/step-1/dataprep).
#### Intersection by URL reference

Complex features can have many coordinates, so requests via `?filter` can become too long to format as a URL. To get around this, the API supports a special intersection process that accepts the URL of any GeoJSON feature along with a target collection. Let's say we want to know which counties intersect the Animas River (<https://geoconnex.us/ref/mainstems/35394>).

```{r process-intersector}
# Define the process endpoint
process_url <- "https://reference.geoconnex.us/processes/intersector/execution"

# Define the input parameters
input_params <- list(
  inputs = list(
    url = "https://geoconnex.us/ref/mainstems/35394",
    collection = "counties"
  )
)

# Execute the process
response <- POST(
  url = process_url,
  body = toJSON(input_params, auto_unbox = TRUE),
  add_headers("Content-Type" = "application/json"),
  encode = "json"
)

# Convert the GeoJSON response body to an sf object
intersecting_counties <- content(response, "text", encoding = "UTF-8") |>
  st_read(quiet = TRUE)

# Create an interactive table of the results
intersecting_counties |>
  st_drop_geometry() |>
  select(name, uri) |>
  datatable(
    options = list(pageLength = 5, scrollX = TRUE),
    caption = "Counties Intersecting the Animas River"
  )

# Fetch the Animas River geometry
animas_river <- st_read("https://geoconnex.us/ref/mainstems/35394", quiet = TRUE)

# Create a map view of the results
mapview(intersecting_counties, zcol = "name", layer.name = "Intersecting Counties") +
  mapview(animas_river, color = "blue", layer.name = "Animas River")
```
Note that of the three counties intersecting the Animas River, two are named "San Juan": <https://geoconnex.us/ref/counties/08111> in Colorado and <https://geoconnex.us/ref/counties/35045> in New Mexico. This highlights the importance of unique identifiers and the usefulness of HTTP identifiers that resolve to spatial/data representations of a given feature.
## Use Case 2: Finding hydrologically related features

One of Geoconnex's main purposes is to support the discovery of hydrologically related monitoring locations. That's why one of the key uses of the Geoconnex API is to serve relationships between key hydrometric locations, like dams and streamgages, and mainstem rivers. Both the `dams` and `gages` collections include the attribute `mainstem_uri`, which allows one to find all such features on a given mainstem. In addition, `dams` and `gages` have the attribute `subjectof`, which directs to the URL of a source data system's record about the dam or streamgage. Below we demonstrate finding all dams and gages on the Animas River:

```{r hydro}
# Animas River mainstem URI
animas_uri <- "https://geoconnex.us/ref/mainstems/35394"

# Function to query features by mainstem URI
query_by_mainstem <- function(collection, mainstem_uri) {
  query <- URLencode(sprintf("mainstem_uri = '%s'", mainstem_uri))
  url <- paste0(base_url, "/collections/", collection, "/items?f=json&filter=", query)
  st_read(url, quiet = TRUE)
}

# Query dams and gages
animas_dams <- query_by_mainstem("dams", animas_uri)
animas_gages <- query_by_mainstem("gages", animas_uri)

# Function to create a hyperlink in HTML
create_link <- function(url) {
  ifelse(!is.na(url) & url != "",
    sprintf('<a href="%s" target="_blank">Link</a>', url),
    "N/A")
}

# Create interactive tables
animas_dams <- animas_dams |>
  mutate(subjectof = sapply(subjectof, create_link))
datatable(animas_dams, options = list(pageLength = 5, scrollX = TRUE),
  caption = "Dams on the Animas River",
  escape = FALSE) # Allow HTML in the table

animas_gages <- animas_gages |>
  select(uri, name, subjectof) |>
  mutate(subjectof = sapply(subjectof, create_link))
datatable(animas_gages, options = list(pageLength = 5, scrollX = TRUE),
  caption = "Gages on the Animas River",
  escape = FALSE) # Allow HTML in the table

# Fetch the Animas River geometry
animas_river <- st_read(animas_uri, quiet = TRUE)

# Create a map view of the results
mapview(animas_river, color = "blue", layer.name = "Animas River") +
  mapview(animas_dams, col.regions = "red", layer.name = "Dams") +
  mapview(animas_gages, col.regions = "green", layer.name = "Gages")
```
## Use Case 3: Finding datasets related to hydrologic features

The most important use case of the Geoconnex system for data users is discovering datasets related to a given hydrologic feature. This functionality is the reason for implementing a [knowledge graph approach](https://docs.geoconnex.us/about/principles). There are currently two ways for users to discover datasets:

1. Convenience references to "datasets" within Reference Feature Collections (currently implemented only for `mainstems`)
2. Full SPARQL access to the graph database at <https://graph.geoconnex.us>

### Convenience "dataset" reference in API

Each mainstem's GeoJSON response includes a nested array within the "datasets" attribute, which can be extracted. In the example below, 58 datasets are available about the Animas River. These can be filtered downstream of the API call. For example, in the interactive data table, we can use the search bar to search for mentions of "temperature". Of the 58, 6 datasets have the `variableMeasured` "Temperature", and these appear to be 3 pairs of USGS datasets from 3 distinct USGS monitoring locations, with each pair offering 2 different download options.
```{r datasets}
# Animas River mainstem URI
animas_uri <- "https://geoconnex.us/ref/mainstems/35394"

# Fetch the Animas River data
response <- GET(animas_uri, query = list(f = "json"))
animas_data <- content(response, "text") |> fromJSON()

# Extract the datasets
datasets <- animas_data$properties$datasets
datatable(datasets,
  options = list(pageLength = 5, scrollX = TRUE,
    search = list(regex = TRUE, caseInsensitive = TRUE, search = 'temperature')
  )
)
```
### SPARQL

For those who wish to make SPARQL queries directly, the endpoint is <https://graph.geoconnex.us/repositories/iow>. For example, one could query the graph for all datasets that are `schema:about` monitoring locations on (`hyf:referencedPosition/hyf:HY_IndirectPosition/hyf:linearElement`) the Animas River and that have a `schema:variableMeasured` whose description includes the string "temperature".
```{r datasets-animas-sparql}
# SPARQL endpoint
endpoint <- "https://graph.geoconnex.us/repositories/iow"

# SPARQL query
query <- 'PREFIX schema: <https://schema.org/>
PREFIX gsp: <http://www.opengis.net/ont/geosparql#>
PREFIX hyf: <https://www.opengis.net/def/schema/hy_features/hyf/>

SELECT DISTINCT ?monitoringLocation ?siteName ?datasetDescription ?type ?url
  ?variableMeasured ?variableUnit ?measurementTechnique ?temporalCoverage
  ?distributionName ?distributionURL ?distributionFormat ?wkt
WHERE {
  VALUES ?mainstem { <https://geoconnex.us/ref/mainstems/35394> }
  ?monitoringLocation hyf:referencedPosition/hyf:HY_IndirectPosition/hyf:linearElement ?mainstem ;
    schema:subjectOf ?item ;
    hyf:HydroLocationType ?type ;
    gsp:hasGeometry/gsp:asWKT ?wkt .
  ?item schema:name ?siteName ;
    schema:temporalCoverage ?temporalCoverage ;
    schema:url ?url ;
    schema:variableMeasured ?variableMeasured .
  ?variableMeasured schema:description ?datasetDescription ;
    schema:name ?variableMeasuredName ;
    schema:unitText ?variableUnit ;
    schema:measurementTechnique ?measurementTechnique .
  OPTIONAL {
    ?item schema:distribution ?distribution .
    ?distribution schema:name ?distributionName ;
      schema:contentUrl ?distributionURL ;
      schema:encodingFormat ?distributionFormat .
  }
  # Filter datasets by the desired variable description
  FILTER(REGEX(?datasetDescription, "temperature", "i"))
}
ORDER BY ?siteName
'

# Execute the SPARQL query
response <- GET(
  url = endpoint,
  query = list(query = query),
  accept("application/sparql-results+json")
)

# Parse the JSON response
result <- content(response, "text", encoding = "UTF-8") |> fromJSON()

# Extract the results
datasets <- as.data.frame(result$results$bindings) |>
  mutate(across(everything(), ~ .$value))
datatable(datasets)
```
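The `wkt` column returned by the query holds each monitoring location's geometry as well-known text (via `gsp:asWKT`), so the result table can be promoted to an `sf` object for mapping. A minimal sketch, assuming the query above returned at least one row with a valid `wkt` value:

```{r sparql-to-sf}
# Promote the SPARQL result table to a spatial object using the WKT column
datasets_sf <- st_as_sf(datasets, wkt = "wkt", crs = 4326)

# Map the monitoring locations alongside their site names
mapview(datasets_sf, zcol = "siteName", layer.name = "Monitoring Locations")
```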