@waynegraham
Created April 28, 2021 20:05
Check Resource Counts
# frozen_string_literal: true

source 'https://rubygems.org'

git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }

# colorize is required by the Rakefile, so it is listed here as well
gem 'colorize'
gem 'mechanize'
gem 'progress_bar'
gem 'terminal-table'
require 'colorize'
require 'mechanize'
require 'progress_bar'
require 'terminal-table'

@base_url = 'https://dlmenetwork.org/library/browse'
@agent = Mechanize.new

namespace :test do
  desc 'Test landing page item counts'
  task :landing_page do
    rows = []

    @page = @agent.get(@base_url)
    categories = @page.search("//div[contains(@class, 'category')]")
    # Build the progress bar only after the categories have been fetched
    bar = ProgressBar.new(categories.size)

    categories.each do |category|
      # Count shown on the browse page, with the "item(s)" suffix stripped
      count = category.search('small').text.gsub(/items?/, '').strip.to_i
      label = category.search('span[@class="title"]').text
      link = category.search('a').first

      # Follow the category link and read the count on the resulting page
      view = @agent.click(link)
      page_count = view.search('small').text.gsub(/items?/, '').strip.to_i
      difference = page_count - count

      bar.puts "Checking #{label}".green
      bar.increment!

      rows << [label, count, page_count, difference]
    end

    table = Terminal::Table.new headings: ['Category', 'Index Count', 'Page Count', 'Difference'], rows: rows
    puts table
  end
end
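The task strips the "item(s)" suffix and calls `to_i` on what is left. A minimal sketch of a stricter parser follows, assuming the count text could contain a thousands separator (for example "1,818 items"); whether the site ever renders counts that way is not confirmed here. `parse_item_count` is a hypothetical helper, not part of the original Rakefile.

```ruby
# Hypothetical helper (not in the original Rakefile): extract the integer
# that precedes "item"/"items", tolerating comma thousands separators.
def parse_item_count(text)
  match = text.to_s.match(/(\d[\d,]*)\s*items?/i)
  return 0 unless match # no recognisable count in the text

  match[1].delete(',').to_i
end

puts parse_item_count('87 items')       # 87
puts parse_item_count('1 item')         # 1
puts parse_item_count('1,818 items')    # 1818
puts parse_item_count('no count here')  # 0
```

Anchoring the number to the "item(s)" suffix also avoids picking up unrelated digits if a page has more than one `small` element, though that is an assumption about the markup rather than something the output confirms.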
+--------------------------------------------------------------+-------------+------------+------------+
| Category                                                     | Index Count | Page Count | Difference |
+--------------------------------------------------------------+-------------+------------+------------+
| Manuscripts from the Free Library of Philadelphia            | 87          | 87         | 0          |
| Abdul-Hamid II Books and Serials, Library of Congress        | 321         | 3414       | 3093       |
| Manuscripts from the University of Pennsylvania Libraries    | 692         | 692        | 0          |
| Persian Language Rare Materials, Library of Congress         | 5232        | 5232       | 0          |
| Manuscripts from Haveford College                            | 32          | 32         | 0          |
| Manuscripts from Bryn Mawr College                           | 22          | 22         | 0          |
| Rare Books & Manuscripts, Columbia University Library        | 220         | 220        | 0          |
| Manuscripts from the Library Company of Philadelphia         | 4           | 4          | 0          |
| Sakip Sabanci Museum's Emirgân Archive                       | 305         | 3515       | 3210       |
| Abdul Hamid II Photograph Collection, Library of Congress    | 1817        | 1818       | 1          |
| Manuscripts from the American Philosophical Society          | 4           | 4          | 0          |
| Libraries of the Greek & Armenian Patriarchates in Jerusalem | 1002        | 1002       | 0          |
| Manuscripts from the Philadelphia Museum of Art              | 1           | 1          | 0          |
| Muhammad Ali Eltaher Collection, Library of Congress         | 105         | 5232       | 5127       |
| Manuscripts from St. Catherine's Monastery, Mt. Sinai        | 1687        | 1687       | 0          |
| Medical Manuscripts                                          | 189         | 31386      | 31197      |
| Qur'an Manuscripts                                           | 186         | 141285     | 141099     |
| Persian Manuscripts                                          | 677         | 677        | 0          |
| Manuscripts in Naskh Script                                  | 1472        | 69247      | 67775      |
| Mathematical Manuscripts                                     | 183         | 31386      | 31203      |
| Manuscripts in Muhaqqaq Script                               | 18          | 69247      | 69229      |
| Cairo Genizah Manuscripts                                    | 22743       | 31386      | 8643       |
| Richard B. Parker Nile Watercraft Photographs                | 75          | 5705       | 5630       |
| Arabic Manuscripts                                           | 7211        | 7211       | 0          |
| Manuscripts in Riqa Script                                   | 10          | 69247      | 69237      |
| Émile Béchard's Oriental Studies Photographs                 | 90          | 5705       | 5615       |
| Astronomy Manuscripts                                        | 130         | 31386      | 31256      |
| Ottoman Turkish Manuscripts                                  | 182         | 182        | 0          |
| Manuscripts in Thuluth Script                                | 84          | 69247      | 69163      |
| Manuscripts in Nastaliq Script                               | 515         | 69247      | 68732      |
| Turkish Painting: Ottoman Reformation to the Republic        | 492         | 3515       | 3023       |
| Abidin Dino Archive                                          | 2118        | 141285     | 139167     |
| Hassan Fathy Architectural Archives                          | 56          | 3931       | 3875       |
| Ramses Wissa Wassef Architectural Drawings                   | 48          | 5705       | 5657       |
| K.A.C. Creswell Photographs of Islamic Architecture          | 1006        | 5705       | 4699       |
+--------------------------------------------------------------+-------------+------------+------------+
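Most rows in the output agree, so it can help to list only the categories whose counts differ. The sketch below assumes rows shaped like the task's `rows` array (`[label, index_count, page_count, difference]`); `mismatched_rows` is a hypothetical helper, and the three sample rows are copied from the table above.

```ruby
# Sample rows copied from the output above, in the same shape the task builds:
# [label, index_count, page_count, difference]
rows = [
  ['Persian Manuscripts', 677, 677, 0],
  ['Medical Manuscripts', 189, 31386, 31197],
  ['Abdul Hamid II Photograph Collection, Library of Congress', 1817, 1818, 1]
]

# Hypothetical helper: drop rows whose difference is zero and put the
# largest absolute difference first.
def mismatched_rows(rows)
  rows.reject { |row| row[3].zero? }
      .sort_by { |row| -row[3].abs }
end

mismatched_rows(rows).each do |label, index_count, page_count, difference|
  puts "#{label}: index #{index_count}, page #{page_count}, off by #{difference}"
end
```

The filtered array could be passed to `Terminal::Table.new` in place of `rows` to print a shorter report.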