
@lovasoa
Last active October 9, 2018 12:12
Extract information from Wikidata in one line of bash (with `jq`)
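# Wikidata can be queried in SPARQL using https://query.wikidata.org/
# However, the query service has limits (queries that run too long time out), so large result sets are hard to obtain.
# This little script instead processes the official Wikidata JSON dump to extract information.
# It allows simple queries over the whole dataset, with results streamed as they are found.
# jq --stream turns the dump (one big JSON array of entities) into [path, value] events;
# path[1] is the entity's top-level key ("labels", "claims", ...).
# This example prints each English label and each official website (property P856), one per line.
# length == 2 skips the closing events, which carry only a path and no value.
curl --silent "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2" | bunzip2 | jq --stream -c -M -r '
select(
  length == 2 and (
    ( .[0][1] == "labels" and .[0][2] == "en" and .[0][3] == "value" ) or
    ( .[0][1] == "claims" and .[0][2] == "P856" and .[0][6] == "value" )
  )
) | .[1] '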
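The filter depends on the shape of `jq --stream` events: each leaf value becomes `[path, value]`, and each closing object or array becomes `[path]` with no value. The sketch below runs a variant of the filter above (with the length check applied to both branches) on a tiny hand-written sample, so the path indices can be checked locally without downloading the dump. It assumes `jq` is installed; the entity `Q1`, its label and its URL are made up, and the sample only mimics the dump layout (one top-level array, one entity per line).

```shell
#!/bin/sh
# Minimal sketch: run the jq stream filter on a tiny made-up sample instead of the full dump.
# Q1, "Example" and https://example.org are hypothetical values for illustration only.
sample='[
{"type":"item","id":"Q1","labels":{"en":{"language":"en","value":"Example"}},"claims":{"P856":[{"mainsnak":{"snaktype":"value","property":"P856","datavalue":{"value":"https://example.org","type":"string"},"datatype":"url"},"type":"statement","rank":"normal"}]}}
]'

# 1. Inspect the events jq --stream emits under "labels".
#    Leaf events are [path, value]; closing events are [path] only.
events=$(printf '%s\n' "$sample" | jq --stream -c 'select(.[0][1] == "labels")')
printf '%s\n' "$events"

# 2. Keep only 2-element events whose path ends at the English label value
#    or at a P856 (official website) value, and print the raw leaf value.
#    Path of a P856 value: [entity_index,"claims","P856",claim_index,"mainsnak","datavalue","value"].
result=$(printf '%s\n' "$sample" | jq --stream -c -M -r '
select(
  length == 2 and (
    ( .[0][1] == "labels" and .[0][2] == "en" and .[0][3] == "value" ) or
    ( .[0][1] == "claims" and .[0][2] == "P856" and .[0][6] == "value" )
  )
) | .[1]')
printf '%s\n' "$result"
```

The first step prints four events, the last one being the closing event `[[0,"labels","en"]]`, which the length check discards; the second step prints `Example` followed by `https://example.org`. Note that the output is not keyed by entity: labels and websites simply appear in dump order.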
@diebridge

It's a masterpiece! It helps me a lot!! Thank you!!
