@PietrH
Created October 16, 2025 13:27
Species that are unique to a single GBIF dataset: which dataset has the most?
# Finding species that occur in only a single dataset and nowhere else: human
# observations edition
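# Use the cached download if it is already on disk; otherwise fetch it from GBIF
if (file.exists("query_results/0024071-251009101135966.zip")) {
  species_per_dataset <-
    readr::read_delim("query_results/0024071-251009101135966.zip")
} else {
  # Save into query_results/ so the cache check above succeeds on the next run
  dir.create("query_results", showWarnings = FALSE)
  species_per_dataset <-
    rgbif::occ_download_get("0024071-251009101135966", path = "query_results") |>
    rgbif::occ_download_import()
}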
# Get the specieskeys that occur in only one dataset. This assumes the download
# holds one row per datasetkey/specieskey combination.
species_only_in_one_dataset <-
  species_per_dataset |>
  dplyr::group_by(specieskey) |>
  dplyr::tally() |>
  dplyr::filter(n == 1) |> # number of datasets that the specieskey is in == 1
  dplyr::pull(specieskey)
# Datasets with species that only occur in one dataset
species_in_only_one_dataset <-
  species_per_dataset |>
  dplyr::filter(specieskey %in% species_only_in_one_dataset)
# How many of these are from iNaturalist? Very many! 175k
species_in_only_one_dataset |>
  dplyr::filter(datasetkey == "50c9509d-22c7-4a22-a47d-8c48425ef4a7")
# What about observation.org?
species_in_only_one_dataset |>
  dplyr::filter(datasetkey == "8a863029-f435-446a-821e-275f4f641165")
# Which dataset has the most of these?
uniques_per_dataset <-
  species_in_only_one_dataset |>
  dplyr::group_by(datasetkey) |>
  dplyr::tally(sort = TRUE) |>
  head(100) |>
  # Let's get the titles of the 100 datasets with the most species only in that
  # dataset
  dplyr::mutate(
    title =
      purrr::map_chr(datasetkey, \(datasetkey) {
        rgbif::dataset_get(datasetkey)$title
      }, .progress = TRUE)
  )
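
The count-and-filter logic above can be checked on a small example without downloading anything. The sketch below is only an illustration: the `toy` table and the `n_unique_species` column name are made up, and the `dplyr::distinct()` step is an added guard for the case where the input has more than one row per datasetkey/specieskey pair (the script above assumes it does not).

```r
# Hypothetical toy data, not GBIF output. Dataset "A" lists species 1 twice
# to show why de-duplicating before counting matters.
toy <- tibble::tibble(
  datasetkey = c("A", "A", "A", "A", "B", "B", "B", "C"),
  specieskey = c(1L, 1L, 2L, 3L, 3L, 4L, 5L, 4L)
)

uniques_per_dataset_toy <-
  toy |>
  # Guard: keep one row per dataset/species combination
  dplyr::distinct(datasetkey, specieskey) |>
  # Keep species that appear in exactly one dataset
  dplyr::group_by(specieskey) |>
  dplyr::filter(dplyr::n() == 1) |>
  dplyr::ungroup() |>
  # Count those species per dataset, largest first
  dplyr::count(datasetkey, sort = TRUE, name = "n_unique_species")

uniques_per_dataset_toy
```

Here species 1 and 2 occur only in dataset "A" and species 5 only in "B", while species 3 and 4 are shared, so "C" drops out of the result entirely. Without the `distinct()` step the duplicated row would make species 1 look like it occurs in two datasets.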