@hectorcorrea
Created October 23, 2024 14:46
Compare the list of files in Globus with the list of files in PDC Describe
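require "json"

# Usage: pass the DOI as the first argument. The script reads DOI.txt
# (file list from Globus) and DOI.json (file list from PDC Describe).
doi = ARGV[0]
abort "Must provide a DOI" if doi.nil?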

# List of files from Globus, one file name per line.
# Fetch it via https://github.com/pulibrary/rdss-handbook/blob/main/globus_cli.md#a-complete-example
txt_file = "#{doi}.txt"
txt_files = File.readlines(txt_file).map(&:chomp)

# List of files from PDC Describe, saved from the work's file-list endpoint, e.g.
# https://datacommons.princeton.edu/describe/works/467/file-list?_=1729628275940
json_file = "#{doi}.json"
json_files = JSON.parse(File.read(json_file))

puts "Globus count: #{txt_files.count}, PDC count: #{json_files.count}"

# Compare the names in both directions. Matching counts do not guarantee
# matching lists, so the comparison always runs.
pdc_files = json_files.map { |file| file["filename_display"] }

puts "Files in PDC Describe but not in Globus:"
(pdc_files - txt_files).each { |filename| puts filename }

puts "Files in Globus but not in PDC Describe:"
(txt_files - pdc_files).each { |filename| puts filename }
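
The comparison can also be pulled out into a small method that is easy to test without the Globus and PDC Describe exports. This is a minimal sketch, assuming Ruby 2.7+ (for `Enumerable#tally`); `compare_file_lists` is a hypothetical name, not part of the original script. Besides the differences in both directions, it reports names listed more than once, because a duplicate entry changes a count without showing up in an array difference.

```ruby
# Hypothetical helper (not part of the original script): compares two lists
# of file names and reports differences in both directions plus repeated names.
# Requires Ruby 2.7+ for Enumerable#tally.
def compare_file_lists(globus_files, pdc_files)
  {
    # Names present in PDC Describe but missing from Globus
    only_in_pdc: (pdc_files - globus_files).uniq,
    # Names present in Globus but missing from PDC Describe
    only_in_globus: (globus_files - pdc_files).uniq,
    # Names listed more than once; these inflate a count without adding a file
    duplicates_in_pdc: pdc_files.tally.select { |_, n| n > 1 }.keys,
    duplicates_in_globus: globus_files.tally.select { |_, n| n > 1 }.keys
  }
end

# Usage with made-up file names (illustrative only)
globus = ["a.csv", "b.csv", "readme.txt"]
pdc = ["a.csv", "b.csv", "b.csv", "c.csv"]
result = compare_file_lists(globus, pdc)
result.each { |key, names| puts "#{key}: #{names.inspect}" }
```

Here both lists have the same kind of overlap, but PDC Describe lists `b.csv` twice, so the counts alone would be misleading.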