@rentalcustard
Created July 31, 2012 14:27
Stateful CSV parsing sketch - pseudocode-ish.
class CSVParser
  def initialize(line_handler = nil)
    @line_handler = line_handler
    @current_line = []
    @lines = []
  end

  def receive(chunk)
    # handle the chunk depending on state
    # if we detect end of line:
    @line_handler.handle(@current_line) if @line_handler
    @lines << @current_line
    @current_line = []
  end

  def data
    @lines
  end
end

# parsing CSV line-by-line
class LineHandler
  def handle(line)
    puts line.inspect # or whatever
  end
end

class LinewiseCSVParserClient
  def parse_csv
    parser = CSVParser.new(LineHandler.new)
    input = SomeInput.new
    input.each_chunk do |data|
      parser.receive(data)
    end
  end
end

# parsing CSV as a whole
class WholeFileCSVParserClient
  def parse_csv
    parser = CSVParser.new
    parser.receive(File.read("some_path"))
    parser.data
  end
end

# if above is too much code:
class CSVParser
  def self.parse(io)
    parser = new
    parser.receive(io.read)
    parser.data
  end
end
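
The receive method above leaves the state handling as a comment. A minimal runnable sketch of what it might look like follows, assuming comma separators, double-quote quoting with "" as the escape, and "\n" or "\r\n" line endings. The class name StatefulCSVParser, the finish method, and the instance variables are hypothetical and not part of the original sketch; finish is there only to flush a last line that has no trailing newline.

```ruby
# Hypothetical, simplified take on the sketch above; not a complete CSV parser.
class StatefulCSVParser
  def initialize(line_handler = nil)
    @line_handler = line_handler
    @lines = []
    @current_line = []
    @field = +""
    @in_quotes = false
    @pending_quote = false # saw a quote inside a quoted field; next char decides its meaning
    @dirty = false         # true once the current line has consumed any character
  end

  # Feed a chunk of any size; state survives across chunk boundaries.
  def receive(chunk)
    chunk.each_char { |char| consume(char) }
    self
  end

  # Flush a final line that was not terminated by a newline.
  def finish
    emit_line if @dirty
    self
  end

  def data
    @lines
  end

  private

  def consume(char)
    @dirty = true
    if @pending_quote
      @pending_quote = false
      if char == '"'
        @field << '"' # doubled quote is an escaped quote
        return
      end
      @in_quotes = false # the earlier quote closed the field
    end

    if @in_quotes
      if char == '"'
        @pending_quote = true
      else
        @field << char
      end
    else
      case char
      when '"'  then @in_quotes = true
      when ","  then end_field
      when "\n" then emit_line
      when "\r" then nil # ignored so that "\r\n" behaves like "\n"
      else @field << char
      end
    end
  end

  def end_field
    @current_line << @field
    @field = +""
  end

  def emit_line
    end_field
    @line_handler&.handle(@current_line)
    @lines << @current_line
    @current_line = []
    @in_quotes = false
    @pending_quote = false
    @dirty = false
  end
end
```

Because the quote state lives in instance variables, a quoted field or an escaped quote can be split across two chunks and still parse the same way as it would in a single read.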

arp commented Jul 31, 2012

Yes, there's a case of functionality duplication anyway: either buffering or newline handling. Sounds like a CSV quantum uncertainty principle :)

In libcsv, detecting the final chunk is left to the developer: you just need to call csv_fini() as the last operation, which makes libcsv assume that the most recently fed chunk was actually the last one. In Ruby, this can be achieved by using blocks and finalizing stream consumption just after yield.
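
A rough sketch of the block idea, under stated assumptions: ChunkParser, its parse class method, and finish are hypothetical names, and the parser only splits on "," and "\n" with no quoting, to keep the focus on where finalization happens.

```ruby
# Hypothetical minimal chunk parser: splits on "," and "\n" only, no quoting.
class ChunkParser
  attr_reader :rows

  # The caller feeds chunks inside the block; finalization runs right after yield,
  # so the caller never has to remember to signal the last chunk.
  def self.parse
    parser = new
    yield parser
    parser.finish # plays the role that csv_fini() plays in libcsv
    parser.rows
  end

  def initialize
    @rows = []
    @buffer = +""
  end

  def receive(chunk)
    @buffer << chunk
    # Emit every complete line currently in the buffer.
    while (newline = @buffer.index("\n"))
      line = @buffer.slice!(0..newline).chomp
      @rows << line.split(",", -1)
    end
    self
  end

  # Treat whatever is still buffered as the final, unterminated line.
  def finish
    @rows << @buffer.split(",", -1) unless @buffer.empty?
    @buffer = +""
    self
  end
end

rows = ChunkParser.parse do |parser|
  ["a,b\nc", ",d\ne,f"].each { |chunk| parser.receive(chunk) }
end
```

Here the last row, "e,f", has no trailing newline and is only emitted because parse calls finish once the block returns.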


JEG2 commented Jul 31, 2012

Yeah, something like finish() would be required, to detect the case I mentioned.

So yeah, we all agree that the evented model works. I'm not convinced it's superior to the current approach, where it's easy for me to do things like support Ruby's normal suite of iterators. I agree that it works though.
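
For comparison, one way chunked input and Ruby's iterators could be combined is to wrap the chunk loop in an each method and include Enumerable. This is only a sketch: RowStream is a hypothetical name, the chunk source is assumed to be anything that responds to each, and quoting is ignored.

```ruby
# Hypothetical wrapper: exposes rows parsed from chunks through Enumerable.
class RowStream
  include Enumerable

  # chunks: any object responding to #each that yields strings.
  def initialize(chunks)
    @chunks = chunks
  end

  def each
    return enum_for(:each) unless block_given?

    buffer = +""
    @chunks.each do |chunk|
      buffer << chunk
      # Yield every complete line; keep the unterminated tail buffered.
      while (newline = buffer.index("\n"))
        yield buffer.slice!(0..newline).chomp.split(",", -1)
      end
    end
    # Once the chunk source is exhausted, the tail is the final row.
    yield buffer.split(",", -1) unless buffer.empty?
    self
  end
end

stream = RowStream.new(["name,age\nal", "ice,30\nbob,25"])
names = stream.map(&:first)
first_two = stream.first(2)
```

With each in place, map, select, first, and the rest of Enumerable come for free, and the end of input is detected by the chunk source running out rather than by an explicit finish call.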
