Last active October 9, 2018 12:12
lovasoa/0889f1dabeef4ae8ce75efdbe79c3fe9
Extract information from Wikidata in one line of bash (with `jq`)
# Wikidata can be queried in SPARQL using https://query.wikidata.org/
# However, the size of query results is limited, so this little script processes the official Wikidata dumps to extract information instead.
# This makes it possible to run simple queries and stream the results.
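# jq --stream turns the dump into [path, leaf] events, plus leaf-less [path] events whenever an object or array closes.
# Keep only complete events for the English label and for P856 ("official website") claim values, and print the raw leaf value.
curl --silent "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2" | bunzip2 | jq --stream -c -M -r '
  select(
    length == 2 and (
      ( .[0][1] == "labels" and .[0][2] == "en" and .[0][3] == "value" ) or
      ( .[0][1] == "claims" and .[0][2] == "P856" and .[0][6] == "value" )
    )
  ) | .[1]'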
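To see what `jq --stream` actually emits, and why the filter checks the event length, it can help to run the same logic on a tiny hand-written entity instead of the multi-gigabyte dump. The sketch below assumes a local `jq` 1.5 or later (the release that added `--stream`). It wraps the filter in a hypothetical `wd_extract` function that takes the label language and property ID as arguments (passed to jq with `--arg`, defaulting to `en` and `P856`) and applies the length check to both branches. The `sample` entity is made up and heavily trimmed, only mimicking the shape of one dump entry.

```shell
# Hypothetical helper: the same stream filter, with the label language ($1)
# and property ID ($2) passed in via --arg instead of being hard-coded.
wd_extract() {
  jq --stream -c -M -r --arg lang "${1:-en}" --arg prop "${2:-P856}" '
    select(
      length == 2 and (
        ( .[0][1] == "labels" and .[0][2] == $lang and .[0][3] == "value" ) or
        ( .[0][1] == "claims" and .[0][2] == $prop and .[0][6] == "value" )
      )
    ) | .[1]'
}

# Made-up entity shaped like one dump entry (the dump is one big JSON array).
sample='[{"id":"Q1","labels":{"en":{"language":"en","value":"Example"}},"claims":{"P856":[{"mainsnak":{"snaktype":"value","property":"P856","datavalue":{"value":"https://example.org/","type":"string"},"datatype":"url"}}]}}]'

# Raw label events: the closing event [[0,"labels","en","value"]] has the
# same path as the label but no leaf value, which is what the length check drops.
printf '%s\n' "$sample" | jq --stream -c 'select(.[0][1] == "labels")'

# Prints "Example", then "https://example.org/"
printf '%s\n' "$sample" | wd_extract en P856
```

Because the paths are matched positionally (`.[0][1]`, `.[0][2]`, ...), the same pattern can pull out any other label language or simple-valued property by changing the two arguments, while memory use stays flat since jq never builds a whole entity.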
It's a masterpiece! It helps me a lot!! Thank you!!