Count class instances from LOD Laundromat
#!/usr/bin/env bash
#
# A demo script that retrieves counts of class instances from LOD Laundromat datasets.
# Usage:
# Reads a list of newline-separated class IRIs either from the file given as its first argument
# or from standard input, and writes CSV rows (class,dataset,count) to standard output.
# $ count_class_instances_from_lod_laundromat.sh < classes.txt
# $ count_class_instances_from_lod_laundromat.sh classes.txt
#
# Caveat:
# Since this is an old-school shell script, class IRIs that contain special characters,
# such as %-encoding or commas, will break the CSV output.
# In general, using <https://github.com/LOD-Laundromat/Frank> might be a saner option.
set -e

die () {
  echo >&2 "$@"
  exit 1
}
# Test if JQ is installed.
command -v jq >/dev/null 2>&1 || die "Missing JQ!"
RDF_TYPE="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_TRIPLES="http://rdfs.org/ns/void#triples"
while IFS=$' \t\n\r' read -r CLASS; do
  # Get datasets in which $CLASS occurs
  curl \
    --silent \
    --data-urlencode "uri=${CLASS}" \
    --get http://index.lodlaundromat.org/r2d |
  jq --raw-output '.results | .[]' | # Output dataset IDs separated by newlines
  while read -r DATASET; do
    # Ask the dataset's Linked Data Fragments endpoint for `?s rdf:type $CLASS`
    # and pick the void:triples count out of the JSON-LD response.
    # Passing the property IRI with --arg avoids splicing it into the jq program.
    COUNT=$(curl \
      --silent \
      --header "Accept:application/ld+json" \
      --data-urlencode "predicate=${RDF_TYPE}" \
      --data-urlencode "object=${CLASS}" \
      --get "http://ldf.lodlaundromat.org/${DATASET}" |
      jq --arg triples "${VOID_TRIPLES}" '..? | .[$triples]? | select(. != null)')
    # Only output non-zero results (an empty count is treated as zero)
    [ "${COUNT:-0}" -eq 0 ] || echo "${CLASS},${DATASET},${COUNT}"
  done
done < "${1:-/dev/stdin}" | # Read classes from the first argument or standard input
(echo "class,dataset,count" && cat) # Prefix output with a CSV header
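The header caveat notes that class IRIs containing commas break the CSV output, because each row is printed with a bare `echo`. One possible mitigation, which is not part of the original script, is to quote every field as RFC 4180 describes: wrap it in double quotes and double any embedded double quote. A minimal bash sketch, where `csv_field` and `csv_row` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): print a value as an
# RFC 4180 CSV field, i.e. wrapped in double quotes with embedded double
# quotes doubled.
csv_field () {
  local value
  value=${1//\"/\"\"}
  printf '"%s"' "$value"
}

# Hypothetical helper: print all arguments as one quoted CSV row.
csv_row () {
  local sep="" field
  for field in "$@"; do
    printf '%s' "$sep"
    csv_field "$field"
    sep=","
  done
  printf '\n'
}

# Example with a made-up class IRI that contains commas.
csv_row "http://example.org/class,with,commas" "abc123" "42"
# prints: "http://example.org/class,with,commas","abc123","42"
```

Under this assumption, the row-printing `echo` inside the inner loop could become `csv_row "${CLASS}" "${DATASET}" "${COUNT}"`. This only deals with commas and double quotes in the output; it does not change how %-encoded IRIs are sent to the LOD Laundromat services.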
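The script takes its class IRIs through `done < "${1:-/dev/stdin}"`: when a first argument is given it is used as the input file, otherwise the loop falls back to `/dev/stdin`. A minimal stand-alone sketch of the same pattern, where the function name `read_lines` is hypothetical and `/dev/stdin` is assumed to exist (as it does on Linux and macOS):

```shell
#!/usr/bin/env bash
# Hypothetical function illustrating the script's input handling: read
# newline-separated lines from the file named by $1, or from standard
# input when no argument is given.
read_lines () {
  local line
  while IFS= read -r line; do
    printf 'got: %s\n' "$line"
  done < "${1:-/dev/stdin}"
}

# Both invocations print the same two lines.
tmp=$(mktemp)
printf '%s\n' "http://example.org/A" "http://example.org/B" > "$tmp"
read_lines "$tmp"
printf '%s\n' "http://example.org/A" "http://example.org/B" | read_lines
rm -f "$tmp"
```

Because the fallback is just a default parameter expansion, the script needs no separate branch for the file and pipe cases.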