Last active
October 21, 2016 02:19
# A simpler example: create an object which takes
# in a file name and allows you to iterate over
# only the comments (lines that start with '#')
#
# Obviously, it'd be pretty easy to do this inline, but it's just an
# example. One could also imagine it returning every set of
# contiguous comment lines as a single comment, or even being smart enough to
# do C-style /* ... */ comments and extract those.
#
# The point is that you've got that all hidden in a class, and the user
# of it just needs to call #each or #find or whatever
#
# Note that it's pretty clear how you'd turn this into the 'grep' command
class CommentReader
  include Enumerable

  # Should take either a filename or an IO object, but whatever
  def initialize(filename)
    @filename = filename
  end

  # Open the file on every call so the object can be iterated more than
  # once, and so the file handle is closed when iteration finishes.
  def each
    return enum_for(:each) unless block_given?

    File.open(@filename, 'r:utf-8') do |file|
      file.each_line do |line|
        # Anything that isn't optional space followed by a hash is skipped:
        # just go to the top and get the next line.
        next unless line =~ /\A\s*#/

        yield line.chomp
      end
    end
  end
end
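
# A minimal sketch of the 'grep' idea mentioned above, assuming the pattern is
# passed in instead of hard-coded. LineGrepper is a hypothetical name, not part
# of the original gist, and it takes an IO object (here a StringIO) rather
# than a filename so it can be tried without a file on disk.

```ruby
require 'stringio'

# Hypothetical generalization of CommentReader: yields every line
# of the given IO that matches the given pattern.
class LineGrepper
  include Enumerable

  def initialize(io, pattern)
    @io = io
    @pattern = pattern
  end

  def each
    return enum_for(:each) unless block_given?

    @io.each_line do |line|
      yield line.chomp if line =~ @pattern
    end
  end
end

source = StringIO.new("# first\nputs 1\n  # second\nx = 2 # trailing\n")

# to_a comes from Enumerable; only lines that *start* with a comment match
comments = LineGrepper.new(source, /\A\s*#/).to_a
```

# Because the IO is read as it is iterated, a given LineGrepper can only be
# walked once unless the IO is rewound first.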
require 'marc'

class AlephSquentialFile
  include Enumerable

  # Raised when a group of lines can't be turned into a record
  class Malformed < StandardError; end

  def initialize(file_or_io_object)
    @file = file_or_io_object
  end

  def turn_a_bunch_of_lines_into_a_MARC_record(bunch_of_lines)
    # implementation left up to the reader
    # Raise an AlephSquentialFile::Malformed if something is wrong
    # return a MARC::Record object on success
  end

  # By defining #each, I get all the other Enumerable
  # methods for free. Of course, with a 2GB file of these
  # things, you probably don't want to call, e.g., #map
  # unless you're sure you're doing it lazily. Then again,
  # memory *is* cheap these days...
  #
  # In real life I'd capture the line number and rescue any errors
  # so one bad line wouldn't grind the whole thing to a halt and
  # could just be logged
  def each(&blk)
    @file.each_line.slice_when { |a, b| a[0..8] != b[0..8] }.each do |bunch_of_lines|
      record = turn_a_bunch_of_lines_into_a_MARC_record(bunch_of_lines)
      yield record # exactly the same as blk.call(record)
    end
  end
end
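
# A rough sketch of the "capture the line number and rescue any errors" idea
# from the comment above #each. Everything here is an assumption made for
# illustration: TolerantGroupedFile, its toy #parse method, and the #errors
# list are hypothetical stand-ins, and records are plain arrays of lines
# rather than MARC::Record objects so the example runs without the marc gem.

```ruby
require 'stringio'

# Hypothetical stand-in for AlephSquentialFile
class TolerantGroupedFile
  include Enumerable

  class Malformed < StandardError; end

  attr_reader :errors

  def initialize(io)
    @io = io
    @errors = []
  end

  # Toy parser (an assumption): a group is malformed if any line is too
  # short to hold a 9-char ID, a space, and a 3-char tag.
  def parse(lines)
    raise Malformed, 'line too short' if lines.any? { |l| l.chomp.length < 13 }

    lines.map(&:chomp)
  end

  def each
    return enum_for(:each) unless block_given?

    # Pair every line with its 1-based line number, then group by ID prefix
    @io.each_line.with_index(1)
       .slice_when { |(a, _), (b, _)| a[0..8] != b[0..8] }
       .each do |group|
      first_lineno = group.first.last
      begin
        record = parse(group.map(&:first))
      rescue Malformed => e
        # Log the problem and move on to the next group
        @errors << "record starting at line #{first_lineno}: #{e.message}"
        next
      end
      yield record
    end
  end
end

data = StringIO.new(
  "000000001 245 title\n" \
  "000000001 100 author\n" \
  "000000002 x\n" \
  "000000003 245 other\n"
)
reader = TolerantGroupedFile.new(data)
records = reader.to_a
```

# The yield sits outside the begin/rescue on purpose, so an exception raised
# by the caller's block is not mistaken for a malformed record.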

# Now I can just use it as an iterator that spits out MARC::Record objects
# and not care what's going on underneath.

# Get the titles
AlephSquentialFile.new(File.open('myfile.seq')).each do |record|
  title = record['245'] # record is a MARC::Record object
  puts title
end

# Find the record with the latest publication date, stored in chars 7-10 of the 008 field
# Look! I get to use max_by!
latest_record = AlephSquentialFile.new(File.open('myfile.seq')).max_by { |r| r['008'].value[7..10].to_i }

# Find everything written by me. Again, be careful about getting giant sets back in real life.
# The author's name lives in the 100 (main entry, personal name) field.
bills_stuff = AlephSquentialFile.new(File.open('myfile.seq')).find_all { |r| r['100'].to_s =~ /dueberb/i }
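
# One way to avoid pulling a giant set into memory is Enumerable#lazy, which
# any class that defines #each gets for free. A small sketch, assuming a
# hypothetical CountingSource that stands in for a huge record file and counts
# how many items it actually produced:

```ruby
# Hypothetical stand-in for a huge file of records: yields the integers
# 1..limit and keeps track of how many it has handed out.
class CountingSource
  include Enumerable

  attr_reader :produced

  def initialize(limit)
    @limit = limit
    @produced = 0
  end

  def each
    return enum_for(:each) unless block_given?

    1.upto(@limit) do |i|
      @produced += 1
      yield i
    end
  end
end

source = CountingSource.new(1_000_000)

# select and map are chained lazily; first(3) stops the iteration as soon
# as three results are available.
first_three = source.lazy
                    .select { |i| (i % 10).zero? }
                    .map { |i| i * 2 }
                    .first(3)
```

# Here only the first 30 items are ever produced, where a plain
# select/map chain would walk all million and build intermediate arrays.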