Created July 31, 2012 14:27
Gist: rentalcustard/3217386
Stateful CSV parsing sketch - pseudocode-ish.
```ruby
class CSVParser
  def initialize(line_handler = nil)
    @line_handler = line_handler
    @current_line = []
    @lines = []
  end

  def receive(chunk)
    # handle the chunk depending on state
    # if we detect end of line:
    @line_handler.handle(@current_line) if @line_handler
    @lines << @current_line
    @current_line = []
  end

  def data
    @lines
  end
end

# parsing CSV line-by-line
class LineHandler
  def handle(line)
    puts line.inspect # or whatever
  end
end

class LinewiseCSVParserClient
  def parse_csv
    parser = CSVParser.new(LineHandler.new)
    input = SomeInput.new # placeholder for any source that yields chunks
    input.each_chunk do |data|
      parser.receive(data)
    end
  end
end

# parsing CSV as a whole
class WholeFileCSVParserClient
  def parse_csv
    parser = CSVParser.new
    parser.receive(File.read("some_path"))
    parser.data
  end
end

# if above is too much code:
class CSVParser
  def self.parse(io)
    parser = new
    parser.receive(io.read)
    parser.data
  end
end
```
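The `receive` method in the sketch leaves the chunk handling as comments. Here is one minimal way to fill it in, offered as a sketch rather than the gist's own design: `ChunkedCSVParser`, its state names and its `finish` method are hypothetical, and it assumes `,` separators, `"` quoting with `""` as an escaped quote, and `\n` row endings. The parser keeps its position (field start, unquoted field, inside quotes, just after a closing quote) between calls, so a chunk may end anywhere, even mid-quote. `finish` is there because a last row with no trailing newline cannot be told apart from a row that is still arriving.

```ruby
# Hypothetical push-style CSV parser: state survives between receive calls,
# so a chunk may end anywhere, even inside a quoted field.
# Assumes: "," separator, '"' quoting with "" as an escaped quote, "\n" row
# endings ("\r" outside quotes is dropped, so CRLF input also works).
class ChunkedCSVParser
  attr_reader :data

  def initialize(line_handler = nil)
    @line_handler = line_handler
    @data = []
    @row = []
    @field = +""
    @state = :field_start # or :unquoted, :quoted, :after_quote
  end

  def receive(chunk)
    chunk.each_char { |c| step(c) }
    self
  end

  # Flush a last row that had no trailing newline. Only the caller knows
  # when the input has ended, so this must be called explicitly.
  def finish
    raise ArgumentError, "unterminated quoted field" if @state == :quoted
    pending = !@row.empty? || !@field.empty? || @state != :field_start
    end_row if pending
    self
  end

  private

  def step(c)
    case @state
    when :field_start, :unquoted
      case c
      when '"'
        # A quote opens a quoted field only at the start of a field.
        if @state == :field_start
          @state = :quoted
        else
          @field << c
        end
      when "," then end_field
      when "\n" then end_row
      when "\r" then nil
      else
        @field << c
        @state = :unquoted
      end
    when :quoted
      if c == '"'
        @state = :after_quote
      else
        @field << c # commas and newlines are literal inside quotes
      end
    when :after_quote
      case c
      when '"'
        @field << c # "" inside quotes is an escaped quote
        @state = :quoted
      when "," then end_field
      when "\n" then end_row
      when "\r" then nil
      else
        raise ArgumentError, "unexpected #{c.inspect} after closing quote"
      end
    end
  end

  def end_field
    @row << @field
    @field = +""
    @state = :field_start
  end

  def end_row
    end_field
    @line_handler.handle(@row) if @line_handler
    @data << @row
    @row = []
  end
end
```

Feeding the text in small chunks or all at once should give the same rows; in both cases an unterminated last row only shows up after `finish`.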
Yeah, something like finish() would be required, to detect the case I mentioned.
So yeah, we all agree that the evented model works. I'm not convinced it's superior to the correct approach where it's easy for me to do things like support Ruby's normal suite of iterators. I agree that it works though.
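On the iterator point: a push parser does not by itself rule out Ruby's iterators. One possible bridge, sketched here under simplifying assumptions (no quote handling, rows end at `\n`; `NaivePushParser` and `each_row` are hypothetical names), is to have the row callback `yield`, and to return an `Enumerator` via `enum_for` when no block is given.

```ruby
# Simplified push parser for illustration only: splits rows on "\n" and
# fields on ",", with no quote handling.
class NaivePushParser
  def initialize(&on_row)
    @on_row = on_row
    @buffer = +""
  end

  def receive(chunk)
    @buffer << chunk
    # Emit every complete line; keep a trailing partial line buffered.
    while (i = @buffer.index("\n"))
      emit(@buffer.slice!(0..i).chomp)
    end
  end

  def finish
    emit(@buffer) unless @buffer.empty?
    @buffer = +""
  end

  # Pull-style facade over the push parser: with a block, rows are yielded
  # as chunks arrive; without one, an Enumerator is returned, so first,
  # map, select, lazy, next and friends all work.
  def self.each_row(chunks)
    return enum_for(:each_row, chunks) unless block_given?

    parser = new { |row| yield row }
    chunks.each { |chunk| parser.receive(chunk) }
    parser.finish
  end

  private

  def emit(line)
    @on_row.call(line.split(",", -1)) # -1 keeps trailing empty fields
  end
end

chunks = ["a,b\nc", ",d\n", "e,f"]
p NaivePushParser.each_row(chunks).to_a # prints [["a", "b"], ["c", "d"], ["e", "f"]]
p NaivePushParser.each_row(chunks).first(2)
```

Because the Enumerator drives the push loop, `first(2)` stops consuming chunks as soon as two rows have been produced.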
Yes, there's some duplicated functionality either way: either buffering or newline handling. Sounds like a CSV quantum uncertainty principle :)
In libcsv, detecting the final chunk is left to the developer: you just need to call csv_fini() as the last operation, which makes libcsv treat the most recently fed chunk as the last one. In Ruby, this can be achieved by using blocks and finalizing stream consumption right after yield.
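A block-based counterpart to `csv_fini()` could look roughly like this: the entry point yields the consumer, and as soon as the block returns it finalizes, so the caller cannot forget the last step. To keep it short, the consumer (a hypothetical `RowCounter`) only counts rows by newline and ignores newlines inside quoted fields; the point is the `open`/`yield`/`finish` shape, not the parsing.

```ruby
# Hypothetical consumer that counts rows in streamed text. Newlines inside
# quoted fields are NOT handled; this only illustrates finalization.
class RowCounter
  # Block-based entry point in the spirit of libcsv's csv_fini(): the caller
  # feeds chunks inside the block, and finish runs right after yield.
  # It is deliberately not in an ensure clause, so a stream that raised
  # midway is not treated as complete.
  def self.open
    counter = new
    yield counter
    counter.finish
  end

  def initialize
    @rows = 0
    @mid_row = false # true when the last non-empty chunk did not end in "\n"
  end

  def receive(chunk)
    return self if chunk.empty?

    @rows += chunk.count("\n")
    @mid_row = !chunk.end_with?("\n")
    self
  end

  # Count a final row that had no trailing newline; returns the total.
  def finish
    @rows += 1 if @mid_row
    @mid_row = false
    @rows
  end
end

total = RowCounter.open do |counter|
  ["a,b\nc,", "d\ne", ",f"].each { |chunk| counter.receive(chunk) }
end
puts total # prints 3: two "\n" plus the unterminated "e,f" row
```

Calling `finish` after `yield` rather than in `ensure` is a design choice: if the block raises, the partial input is not silently counted as a finished stream.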