@maakuth
Created October 2, 2020 13:49
Here's a small script I made that parses file_id.diz files inside zip archives and writes selected fields (author, name, version) to a CSV file in Internet Archive metadata format. I used it to populate https://archive.org/details/suomipelit?tab=collection
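#!/usr/bin/env ruby
# Requires the rubyzip gem, which provides 'zip'.
require 'csv'
require 'zip'
require 'pp'

# Interpret entry names inside the archives as ISO-8859-1.
Zip.force_entry_names_encoding = 'ISO-8859-1'

entries = []
Dir.glob("*.zip").each do |file|
  Zip::File.open(file) do |zipfile|
    zipfile.entries.each do |entry|
      # Only a file_id.diz at the top level of the archive matches.
      if entry.name.downcase == 'file_id.diz'
        entryhash = {filename: file}
        fd = entry.get_input_stream
        fd.readlines.each do |line|
          line.encode!('UTF-8', 'Windows-1252')
          # Split on the first colon only, so values that contain ':' stay intact.
          linevalue = line.split(':', 2)[1]
          next if linevalue.nil?
          linevalue.strip!
          # Finnish labels: Tekijä (author), Nimi (name), Versio (version).
          if line =~ /Tekij/
            entryhash[:author] = linevalue
          elsif line =~ /Nimi/
            entryhash[:name] = linevalue
          elsif line =~ /Versio/
            entryhash[:version] = linevalue
          end
        end
        entries << entryhash
      end
    end
  end
end

# One row per archive: identifier, file name, title (name and version), author.
CSV.open("pelit.csv", "wb") do |csv|
  entries.each do |entry|
    csv << ["suomipelit-#{entry[:filename]}", entry[:filename], "#{entry[:name]} #{entry[:version]}", entry[:author]]
  end
end

# Print the parsed entries for a quick visual check.
pp entries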
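
The field extraction can be tried without any zip files. The following is a minimal sketch, not part of the original script: `parse_diz`, `metadata_csv` and `FIELD_PATTERNS` are hypothetical names, and the header row (`identifier`, `file`, `title`, `creator`) assumes the CSV is meant for a spreadsheet-style upload such as `ia upload --spreadsheet`, which the gist does not state. Unlike the script above, it matches the labels against the text before the first colon only, and it replaces bytes that have no Windows-1252 mapping instead of raising.

```ruby
require 'csv'

# Hypothetical mapping from the Finnish labels to metadata keys.
FIELD_PATTERNS = {
  author: /Tekij/,
  name: /Nimi/,
  version: /Versio/
}.freeze

# Parse the raw bytes of a file_id.diz into a hash of known fields.
def parse_diz(raw)
  # Treat the bytes as Windows-1252 and convert to UTF-8,
  # replacing anything that cannot be mapped.
  text = raw.dup.force_encoding('Windows-1252')
            .encode('UTF-8', invalid: :replace, undef: :replace)
  fields = {}
  text.each_line do |line|
    # Split on the first colon only, so values may contain ':'.
    label, value = line.split(':', 2)
    next if value.nil?
    key, = FIELD_PATTERNS.find { |_, pattern| label =~ pattern }
    fields[key] = value.strip if key
  end
  fields
end

# Build CSV text with a header row (assumed column names, see above).
def metadata_csv(entries)
  CSV.generate do |csv|
    csv << %w[identifier file title creator]
    entries.each do |e|
      # Skip missing parts so the title has no stray spaces.
      title = [e[:name], e[:version]].compact.join(' ')
      csv << ["suomipelit-#{e[:filename]}", e[:filename], title, e[:author]]
    end
  end
end

# Usage with made-up sample data (0xE4 is 'ä' in Windows-1252).
sample = "Nimi: Testipeli\r\nTekij\xE4: Matti M.\r\nVersio: 1.0: final\r\n".b
fields = parse_diz(sample)
puts metadata_csv([fields.merge(filename: 'peli.zip')])
```

Keeping the parsing in its own method makes it possible to check the label matching against a few sample strings before running the script over a whole directory of archives.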