@pgwillia
Last active February 25, 2025 20:42
ERA Open Access report
year,count,community,collection,collection url
2025,22,The department of Cat,The annals of 'Cat International',http://era.lvh.me:3000/communities/16ebf0dd-19f0-4ee7-a8f5-dfd533652a81/collections/786ee7fe-c345-4a92-8a3e-833724737f62
2025,2,The department of Cat,Theses about cats,http://era.lvh.me:3000/communities/16ebf0dd-19f0-4ee7-a8f5-dfd533652a81/collections/618912fe-0d3f-470b-bb99-1ee7bacd69a3
2025,22,The department of Unicorn,The annals of 'Unicorn International',http://era.lvh.me:3000/communities/d49ff70a-24fd-42a5-97d0-ac1c7747f56e/collections/99df8858-f7bc-4c25-9dc2-cc7181ced3fc
2025,22,Special reports about dogs,The annals of 'Dog International',http://era.lvh.me:3000/communities/0917355b-39a7-4484-bdc7-43e9a6a0ee39/collections/2b8f986f-5123-4148-82fb-56794da56cc9
2025,2,Special reports about dogs,Theses about dogs,http://era.lvh.me:3000/communities/0917355b-39a7-4484-bdc7-43e9a6a0ee39/collections/61618a4e-16a7-4814-bd23-80afbc38f396
2025,2,The department of Unicorn,Theses about unicorns,http://era.lvh.me:3000/communities/d49ff70a-24fd-42a5-97d0-ac1c7747f56e/collections/cc41166d-9f59-4944-aa7c-5bc4cd57bed1
2025,22,Special reports about hamburgers,The annals of 'Hamburger International',http://era.lvh.me:3000/communities/c7e8e2c1-5448-4277-ae63-2d0158bef0a4/collections/a45f7a22-b6b3-4cc0-bbab-7de707b2ca01
2025,2,Special reports about hamburgers,Theses about hamburgers,http://era.lvh.me:3000/communities/c7e8e2c1-5448-4277-ae63-2d0158bef0a4/collections/242b3e37-3ebd-4a4e-89bb-780b1f1d8697
2025,22,The department of Librarian,The annals of 'Librarian International',http://era.lvh.me:3000/communities/a9b73242-610a-4939-99c5-8aaed1e3b2f6/collections/3a1db360-5368-4985-8d04-90051f7ccd5a
2025,2,The department of Librarian,Theses about librarians,http://era.lvh.me:3000/communities/a9b73242-610a-4939-99c5-8aaed1e3b2f6/collections/7f2ac0dd-64d4-48ae-97dd-6e0e02d81048
2000,3,Special reports about dogs,The annals of 'Dog International',http://era.lvh.me:3000/communities/0917355b-39a7-4484-bdc7-43e9a6a0ee39/collections/2b8f986f-5123-4148-82fb-56794da56cc9
CSV.open('open_access.csv', 'wb') do |csv|
  csv << ['year', 'count', 'community', 'collection', 'collection url']
  open_access_items = Item.select(:member_of_paths, :record_created_at)
                          .where(visibility: JupiterCore::VISIBILITY_PUBLIC)
                          .group_by { |item| item.record_created_at.year } # loads all public items, grouped by creation year
  open_access_items.each do |year, items|
    items.map(&:member_of_paths).flatten.tally.each do |member_of_path, count| # items per "community_id/collection_id" path
      community_id, collection_id = member_of_path.split('/')
      csv << [year, count, Community.find(community_id).title, Collection.find(collection_id).title,
              Rails.application.routes.url_helpers.community_collection_url(community_id, collection_id)]
    end
  end
end
ConnorSheremeta commented Feb 25, 2025

Looks good! I only had a quick look at this as my day is ending, so a deeper dive could still be done.

I think a final tally of the total item count by year (across all collections) may be beneficial, although that could easily be done after the fact in Sheets/Excel if needed.
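
As a rough illustration of that after-the-fact tally, here is a minimal sketch that sums the `count` column per year from the generated CSV. `totals_by_year` is a hypothetical helper and the URLs in the sample are placeholders, not the real report rows. Note that this sums per-collection counts, so an item that belongs to more than one collection would presumably be counted once per collection.

```ruby
require 'csv'

# Hypothetical helper: adds up the per-collection counts for each year.
# This is a sum of collection tallies, not a count of distinct items.
def totals_by_year(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object(Hash.new(0)) do |row, totals|
    totals[row['year'].to_i] += row['count'].to_i
  end
end

# Sample rows in the report's format; the URLs are placeholders.
report = <<~REPORT
  year,count,community,collection,collection url
  2025,22,The department of Cat,The annals of 'Cat International',http://example.test/a
  2025,2,The department of Cat,Theses about cats,http://example.test/b
  2000,3,Special reports about dogs,The annals of 'Dog International',http://example.test/c
REPORT

totals = totals_by_year(report)
totals.each { |year, count| puts "#{year}: #{count}" }
```

If a distinct item count per year is what's wanted, counting `items.size` inside the existing `open_access_items.each` loop would likely be the more direct route.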

Some optimisation could be done, although it may not be worthwhile for a one-time script. One thing that comes to mind is utilising ActiveRecord's in_batches/find_in_batches for lines 3 and 6; however, that would require testing (and setting the batch size, as the default is 1000, which wouldn't test well in the test environment).
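
To make the batching idea concrete, here is a minimal sketch of tallying one batch at a time instead of grouping the whole result set in memory. It runs outside Rails: `FakeItem` and `tally_in_batches` are hypothetical stand-ins, and `each_slice` plays the role that `find_in_batches` would play against the real `Item` model. Untested against the actual app.

```ruby
# Stand-in for the Item model so the sketch runs without Rails (an assumption).
FakeItem = Struct.new(:member_of_paths, :record_created_at)

# Accumulates counts keyed by [year, member_of_path], one batch at a time,
# so only a single batch of records needs to be in memory at once.
def tally_in_batches(batches)
  counts = Hash.new(0)
  batches.each do |batch|
    batch.each do |item|
      year = item.record_created_at.year
      item.member_of_paths.each { |path| counts[[year, path]] += 1 }
    end
  end
  counts
end

items = [
  FakeItem.new(['c1/a'], Time.new(2025, 1, 1)),
  FakeItem.new(['c1/a', 'c1/b'], Time.new(2025, 2, 1)),
  FakeItem.new(['c2/x'], Time.new(2000, 3, 1))
]

# In the Rails console the batches would come from something like:
#   Item.select(:id, :member_of_paths, :record_created_at)
#       .where(visibility: JupiterCore::VISIBILITY_PUBLIC)
#       .find_in_batches(batch_size: 500)
# (:id is included because batching pages through records by primary key.)
counts = tally_in_batches(items.each_slice(2))
counts.each { |(year, path), count| puts "#{year} #{path} #{count}" }
```

Passing `batch_size:` explicitly would also let a test use a small value, so more than one batch is exercised with only a handful of fixtures.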

Given that this is a one-time script and given the time constraint, I don't see any issues with this as is.