#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
COMMUNITY_ID="community_id"
OUTPUT_CSV="${COMMUNITY_ID}_community_stats.csv"

# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"

# Download all records (including multiple versions) from the community (max 10k records)
curl -s -G "https://zenodo.org/api/records/" -d "size=10000" -d "all_versions=true" -d "communities=${COMMUNITY_ID}" \
  `# Process with jq to extract the required fields` \
  | jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
  >> "${OUTPUT_CSV}"
#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
QUERY="grants.code:12345"
OUTPUT_CSV="[${QUERY}]_query_stats.csv"

# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"

# Download all records (including multiple versions) for the query (max 10k records)
curl -s -G "https://zenodo.org/api/records/" -d "size=10000" -d "all_versions=true" -d "q=${QUERY}" \
  `# Process with jq to extract the required fields` \
  | jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
  >> "${OUTPUT_CSV}"
#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
USER_ID="12345"
OUTPUT_CSV="${USER_ID}_user_stats.csv"

# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"

# Download all records (including multiple versions) for the user (max 10k records)
curl -s -G "https://zenodo.org/api/records/" -d "size=10000" -d "all_versions=true" -d "q=owners:${USER_ID}" \
  `# Process with jq to extract the required fields` \
  | jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
  >> "${OUTPUT_CSV}"
Hi, removing the final slash (/) and reducing the size does not help in my case: the original error is gone, but the wrong documents are fetched. The only workaround so far is to pack all options into the URL:
curl -G "https://zenodo.org/api/records?communities=operaseu&all_versions=true&size=10000" \
I managed to parse some communities that way (most of the smaller ones, anyway), but not all, e.g. like this:
curl -s -G "https://zenodo.org/api/records?communities=${COMMUNITY_ID}" -d "size=10000" -d "all_versions=true" | jq -r '.hits.hits[] | [.metadata.doi, .metadata.title, .stats.unique_views, .stats.unique_downloads, .metadata.communities[][]] | @csv' >> "${OUTPUT_CSV}"
This works with the community 'lory_unilu_tf' but not with 'lory_unilu'. The first one is a smaller community, the second a bigger one. So far the biggest community I managed was about 500 records; it fails at 600 records. I've tried leaving out 'metadata.title' in the hope that some wonky character was generating the error, but that has not helped.
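A possible workaround for the size limit is to harvest in small pages until an empty page comes back. This is a sketch, not from the thread: it assumes the API accepts a "page" parameter alongside "size", and it keeps the same jq extraction as the gist's scripts:

#!/bin/bash
# Sketch: paginated harvesting with small pages to avoid 504 time-outs.
# Assumes the Zenodo API accepts a "page" parameter alongside "size".
COMMUNITY_ID="community_id"
OUTPUT_CSV="${COMMUNITY_ID}_community_stats.csv"
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"
PAGE=1
while true; do
  # No trailing slash on the URL, per the parse-error reports in this thread.
  CHUNK=$(curl -s -G "https://zenodo.org/api/records" \
            -d "size=100" -d "page=${PAGE}" -d "all_versions=true" \
            -d "communities=${COMMUNITY_ID}" \
          | jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv')
  [ -z "${CHUNK}" ] && break   # stop on the first empty page
  printf '%s\n' "${CHUNK}" >> "${OUTPUT_CSV}"
  PAGE=$((PAGE + 1))
done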
Hi, I am getting the same parse error. There seem to be two different issues:

1. Remove the final '/' in the Zenodo URL in the curl command. For instance, it works with:
curl -s -G "https://zenodo.org/api/records" -d "size=100" -d "all_versions=true" -d "communities=xxx"
(but not with "https://zenodo.org/api/records/")

2. The Zenodo API no longer seems to allow harvesting thousands of records in one go (504 Gateway Time-out). I tried with 4000 records => not working; with 100 records => working. The same happens in the browser, or with OpenRefine (which is how I used to do my end-of-year statistics last year).
Thanks for any hint on how to get all the records (stats) for one specific community. It seems that the Zenodo API changed after the update in October, because this worked until then.

Best regards, Kathrin