# Gist by @billdueber, last active October 21, 2016 02:19
# A simpler example: create an object which takes
# in a file name and allows you to iterate over
# only the comments (lines that start with '#')
#
# Obviously, it'd be pretty easy to do this inline, but it's just an
# example. One could also imagine it returning every set of
# contiguous comment lines as a single comment, or even being smart enough to
# do C-style /* ... */ comments and extract those.
#
# The point is that you've got that all hidden in a class, and the user
# of it just needs to call #each or #find or whatever
#
# Note that it's pretty clear how you'd turn this into the 'grep' command
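class CommentReader
  include Enumerable

  # Should take either a filename or an IO object, but whatever
  def initialize(filename)
    @file = File.open(filename, 'r:utf-8')
  end

  def each(&blk)
    # With no block, hand back an Enumerator like the built-in collections do
    return enum_for(:each) unless block_given?

    # Start from the top so a second pass (e.g., #find after #each) still sees the lines
    @file.rewind
    @file.each_line do |line|
      if line =~ /\A\s*#/ # does it start with optional space and a hash?
        yield line.chomp
      else
        # don't do a damn thing. Just go to the top and get the next line.
      end
    end
  end
end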
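# A rough sketch of two of the variations mentioned above: taking either a
# filename or an IO object, and returning every set of contiguous comment
# lines as a single comment. The name BlockCommentReader and the sample text
# are made up for illustration; this is one possible approach, not the only one.
require 'stringio'

class BlockCommentReader
  include Enumerable

  COMMENT = /\A\s*#/

  def initialize(filename_or_io)
    # Assumption: anything that responds to #read is an IO-like object;
    # everything else is treated as a filename.
    @io = if filename_or_io.respond_to?(:read)
            filename_or_io
          else
            File.open(filename_or_io, 'r:utf-8')
          end
  end

  def each
    return enum_for(:each) unless block_given?

    # chunk_while keeps two neighboring lines together only when both are
    # comments, so every non-comment line ends up in a chunk of its own.
    runs = @io.each_line.chunk_while { |a, b| a =~ COMMENT && b =~ COMMENT }
    runs.each do |lines|
      yield lines.map(&:chomp).join("\n") if lines.first =~ COMMENT
    end
  end
end

comment_sample = StringIO.new(<<~TEXT)
  # first comment
  # continues here
  x = 1
    # indented comment
  y = 2
TEXT

# Two comments come back: the two-line block and the indented one-liner
comment_blocks = BlockCommentReader.new(comment_sample).to_a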
require 'marc'

class AlephSquentialFile
  include Enumerable

  # Raised when a bunch of lines can't be turned into a record
  class Malformed < StandardError; end

  def initialize(file_or_io_object)
    @file = file_or_io_object
  end

  def turn_a_bunch_of_lines_into_a_MARC_record(bunch_of_lines)
    # implementation left up to the reader
    # Raise an AlephSquentialFile::Malformed if something is wrong
    # return a MARC::Record object on success
  end

  # By defining #each, I get all the other Enumerable
  # methods for free. Of course, with a 2GB file of these
  # things, you probably don't want to call, e.g., #map
  # unless you're sure you're doing it lazily. Then again,
  # memory *is* cheap these days...
  # In real life I'd capture the line number and rescue any errors
  # so one bad line wouldn't grind the whole thing to a halt and
  # could just be logged
  def each(&blk)
    return enum_for(:each) unless block_given?

    # Consecutive lines belong to the same record while their first nine characters match
    @file.each_line.slice_when { |a, b| a[0..8] != b[0..8] }.each do |bunch_of_lines|
      record = turn_a_bunch_of_lines_into_a_MARC_record(bunch_of_lines)
      yield record # exactly the same as blk.call(record)
    end
  end
end
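# To see what the slice_when call in #each is doing, here's a small
# self-contained sketch. The sample lines are invented; all it assumes is
# that the first nine characters of a line identify its record. It also shows
# one way to capture the line number and rescue a bad record so it can be
# logged. parse_group is a hypothetical stand-in for the real parser.
require 'stringio'

sample_seq = StringIO.new(<<~SEQ)
  000000001 245 L $$aFirst title
  000000001 260 L $$aAnn Arbor
  000000002 245 L $$aSecond title
  000000003 BAD
  000000003 245 L $$aThird title
SEQ

# Hypothetical parser: returns a tiny summary hash, or raises on a bad line
def parse_group(lines)
  raise ArgumentError, 'malformed line' if lines.any? { |l| l.include?('BAD') }
  { id: lines.first[0..8], line_count: lines.size }
end

parsed  = []
skipped = []

# Pair every line with its 1-based line number, then start a new group
# whenever the nine-character prefix changes.
numbered = sample_seq.each_line.with_index(1)
groups   = numbered.slice_when { |(a, _i), (b, _j)| a[0..8] != b[0..8] }

groups.each do |group|
  lines        = group.map(&:first)
  first_lineno = group.first.last
  begin
    parsed << parse_group(lines)
  rescue ArgumentError => e
    # One bad record gets noted and skipped instead of stopping the run
    skipped << "record starting at line #{first_lineno}: #{e.message}"
  end
end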
# Now I can just use it as an iterator that spits out MARC::Record objects
# and not care what's going on underneath.
# Get the titles
AlephSquentialFile.new(File.open('myfile.seq')).each do |record|
title = record['245'] # record is a MARC::Record object
puts title
end
# Find the record with the latest publication date, stored in chars 7-10 of the 008 field
# Look! I get to use max_by!
latest_record = AlephSquentialFile.new(File.open('myfile.seq')).max_by{|r| r['008'].value[7..10].to_i}
# Find everything written by me. Again, be careful about getting giant sets back in real life.
bills_stuff = AlephSquentialFile.new(File.open('myfile.seq')).find_all{|r| r.author =~ /dueberb/i}
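# On being careful about giant sets: Enumerable#lazy lets you chain
# select/map and stop as soon as you have what you need, instead of building
# the whole result in memory. A sketch, with a made-up CountingSource class
# standing in for a huge file of records:
class CountingSource
  include Enumerable
  attr_reader :yielded

  def initialize(limit)
    @limit   = limit
    @yielded = 0
  end

  def each
    return enum_for(:each) unless block_given?

    1.upto(@limit) do |n|
      @yielded += 1 # keep track of how many items were actually produced
      yield n
    end
  end
end

big_source = CountingSource.new(1_000_000)

# Only as many items are pulled from #each as it takes to find three matches
first_three = big_source.lazy.select { |n| (n % 7).zero? }.map { |n| n * 2 }.first(3)
# first_three is [14, 28, 42], and big_source.yielded is nowhere near a million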