Last active December 16, 2015 16:09
Tab-separated parsing using Ruby 2.0 CSV library
# The main parse method is mostly borrowed from a tweet by @JEG2
class StrictTsv
  attr_reader :filepath

  def initialize(filepath)
    @filepath = filepath
  end

  # Yields each data row as a Hash keyed by the header fields.
  def parse
    File.open(filepath) do |f|
      headers = f.gets.chomp.split("\t")
      f.each do |line|
        # chomp so the last field does not keep its trailing newline
        fields = Hash[headers.zip(line.chomp.split("\t"))]
        yield fields
      end
    end
  end
end
# Prefix the class with your own namespace (e.g. Vendor::) if you wrap it in a module
tsv = StrictTsv.new("your_file.tsv")
tsv.parse do |row|
  puts row['named field']
end
require 'csv'

# Double-quoted string so that \t is a real tab character
line = "boogie\ttime\tis \"now\""

begin
  fields = CSV.parse_line(line, col_sep: "\t")
  puts "parsed correctly"
rescue CSV::MalformedCSVError
  puts "failed to parse line"
end

begin
  # A quote character that never occurs in the data effectively disables quoting
  fields = CSV.parse_line(line, col_sep: "\t", quote_char: "Ƃ")
  puts "parsed correctly with random quote char"
rescue CSV::MalformedCSVError
  puts "failed to parse line with random quote char"
end

# Output:
# failed to parse line
# parsed correctly with random quote char
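For comparison, a plain split on tabs (the approach StrictTsv takes) handles the same kind of line without giving double quotes any special meaning. This is only a minimal sketch; the helper name `split_tsv_line` is hypothetical and not part of the gist.

```ruby
# Hypothetical helper (not part of the gist): split one TSV line on tabs only.
def split_tsv_line(line)
  # chomp drops the line ending; the -1 limit keeps trailing empty fields
  line.chomp.split("\t", -1)
end

fields = split_tsv_line("boogie\ttime\tis \"now\"\n")
p fields
# => ["boogie", "time", "is \"now\""]
```

The trade-off is that fields can never contain a literal tab or newline, which is exactly the restriction a strict TSV format assumes.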
We needed to parse a huge file and, inspired by this snippet, wrote a somewhat more complex solution that can switch headers on or off and lets you access rows in both array-like and hash-like ways. I'd appreciate it if you could take a look at it and possibly provide feedback.
Thanks,
Slotos
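
To illustrate the idea described in the comment above (optional headers, rows accessible by position or by header name), here is one possible shape for such a parser. It is a rough sketch under assumptions, not Slotos's actual code: the names `TsvRow` and `each_tsv_row` and the `headers:` keyword are all hypothetical.

```ruby
require 'stringio'

# Hypothetical row wrapper (not from the gist or from Slotos's solution).
class TsvRow
  def initialize(fields, headers = nil)
    @fields = fields
    @headers = headers
  end

  # Integer key -> positional access; any other key -> lookup by header name.
  def [](key)
    return @fields[key] if key.is_a?(Integer)
    raise ArgumentError, "no headers available" unless @headers
    index = @headers.index(key)
    index && @fields[index]
  end

  def to_a
    @fields.dup
  end
end

# Hypothetical reader: yields one TsvRow per line of the given IO.
# With headers: true the first line is consumed as the header row.
def each_tsv_row(io, headers: true)
  header_fields = headers ? io.gets.to_s.chomp.split("\t", -1) : nil
  io.each_line do |line|
    yield TsvRow.new(line.chomp.split("\t", -1), header_fields)
  end
end

io = StringIO.new("name\tquote\nbob\tsays \"hi\"\n")
each_tsv_row(io) do |row|
  puts row["quote"] # lookup by header name
  puts row[0]       # positional lookup
end
```

Reading line by line from an IO object keeps memory use flat, which matters for the huge files mentioned in the comment.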