@mmmries
Last active December 16, 2015 16:09
Gist mmmries/5460684
Tab-separated parsing using Ruby 2.0 CSV library
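strict_tsv.rb

```ruby
# The main parse method is mostly borrowed from a tweet by @JEG2
class StrictTsv
  attr_reader :filepath

  def initialize(filepath)
    @filepath = filepath
  end

  # Yields one Hash per data row, keyed by the column names on the first line.
  def parse
    File.open(filepath) do |f|
      header_line = f.gets
      return if header_line.nil? # empty file: nothing to yield
      headers = header_line.strip.split("\t")
      f.each do |line|
        # chomp removes the line terminator so it does not end up in the last
        # column; the -1 limit keeps trailing empty fields.
        fields = Hash[headers.zip(line.chomp.split("\t", -1))]
        yield fields
      end
    end
  end
end
```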
```ruby
tsv = StrictTsv.new("your_file.tsv")
tsv.parse do |row|
  puts row['named field']
end
```
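The core of `parse` is `Hash[headers.zip(fields)]`. Because `Array#zip` is driven by its receiver (the header list), a row with too few fields gets `nil` for the missing columns, and a row with too many silently loses the extras; no error is raised in either case. A small illustration with made-up header names and values:

```ruby
# Hypothetical header list and rows, run through the same Hash[headers.zip(...)]
# expression that StrictTsv#parse uses.
headers = ["name", "city", "zip"]

short_row = Hash[headers.zip(["ann", "oslo"])]
puts short_row["zip"].inspect # prints nil (missing field padded with nil)

long_row = Hash[headers.zip(["ann", "oslo", "0150", "extra"])]
puts long_row.size # prints 3 ("extra" has no header and is dropped)
```

If ragged rows should be treated as errors, comparing the field count against `headers.size` before building the Hash would be one way to enforce that.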
```ruby
require 'csv'

# Double quotes so that \t is a real tab character, not a literal backslash-t
line = "boogie\ttime\tis \"now\""

begin
  CSV.parse_line(line, col_sep: "\t")
  puts "parsed correctly"
rescue CSV::MalformedCSVError
  puts "failed to parse line"
end

begin
  # A quote character that never appears in the data turns off quote handling
  CSV.parse_line(line, col_sep: "\t", quote_char: "Ƃ")
  puts "parsed correctly with random quote char"
rescue CSV::MalformedCSVError
  puts "failed to parse line with random quote char"
end

# Output:
# failed to parse line
# parsed correctly with random quote char
```
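The same quote-char trick can be combined with the CSV library's `headers: true` option to get rows keyed by column name, roughly what `StrictTsv` does by hand. This is a minimal sketch that assumes the whole input fits in memory as a string; the sample data is made up.

```ruby
require 'csv'

# Hypothetical sample input: a header row plus one data row containing quotes.
TSV_TEXT = "name\tquote\nboogie\tis \"now\"\n"

# quote_char is set to a character assumed never to appear in the data,
# so double quotes are kept as ordinary field content.
rows = CSV.parse(TSV_TEXT, col_sep: "\t", quote_char: "Ƃ", headers: true)

rows.each do |row|
  puts row['quote'] # prints: is "now"
end
```

For large files, `CSV.foreach` accepts the same options and reads one row at a time instead of building the whole table.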
@ToddFincannon

Line 12 of strict_tsv.rb needs to use line.strip.split to remove the line terminator from the last column in each row.
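To illustrate the comment: each line yielded by `File#each` still ends with its terminator, so the last field picks it up. `strip` fixes that, but it also removes a trailing tab, which silently drops an empty last column. `chomp` with a `-1` split limit is one alternative (a suggestion, not what the comment proposed); the rows below are hypothetical.

```ruby
# A line as File#each yields it: the terminator is still attached.
row = "boogie\ttime\tnow\n"
p row.split("\t")         # prints ["boogie", "time", "now\n"]
p row.strip.split("\t")   # prints ["boogie", "time", "now"]

# A row whose last column is empty.
sparse = "boogie\ttime\t\n"
p sparse.strip.split("\t")       # prints ["boogie", "time"] (strip also removed the trailing tab)
p sparse.chomp.split("\t", -1)   # prints ["boogie", "time", ""]
```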

@Slotos

Slotos commented Jul 10, 2014

Having needed to parse a huge file, and inspired by this snippet, we've written a somewhat more complex solution that can switch headers on or off and lets you access rows in both an array-like and a hash-like way. I'd appreciate it if you could take a look at it and perhaps provide feedback.
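The gem itself is not shown on this page, so its API is unknown. The following is a hypothetical sketch of the two features described, a headers on/off switch and rows readable by index or by column name; `TsvRow` and `parse_tsv` are invented names, and the input is read into memory for brevity (a streaming version would iterate with `File.foreach`).

```ruby
# Hypothetical row wrapper: readable by column index or, when headers exist, by name.
class TsvRow
  def initialize(fields, headers = nil)
    @fields = fields
    @headers = headers
  end

  # row[0] returns the first field; row['city'] the field under that header (nil if unknown).
  def [](key)
    if key.is_a?(Integer)
      @fields[key]
    else
      index = @headers && @headers.index(key)
      index && @fields[index]
    end
  end

  def to_a
    @fields.dup
  end

  def to_h
    raise ArgumentError, "row has no headers" unless @headers
    Hash[@headers.zip(@fields)]
  end
end

# Hypothetical helper: the first line is used as the header row only when headers: true.
def parse_tsv(text, headers: true)
  lines = text.each_line.map { |l| l.chomp.split("\t", -1) }
  header_row = headers ? lines.shift : nil
  lines.map { |fields| TsvRow.new(fields, header_row) }
end

rows = parse_tsv("name\tcity\nann\toslo\n")
p rows.first[0]       # prints "ann"
p rows.first['city']  # prints "oslo"

raw = parse_tsv("ann\toslo\n", headers: false)
p raw.first.to_a      # prints ["ann", "oslo"]
```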

Thanks,
Slotos

@mmmries (gist author)

mmmries commented Feb 2, 2015

@Slotos your gem looks really helpful. I'm glad to have contributed in some way to it.
