@Bajena
Created January 29, 2020 06:46
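
```ruby
require "open-uri" # defines URI::FTP#open
require "zlib"

class Loader
  # Returns an Enumerator that yields one preprocessed row at a time,
  # so the whole file never has to be held in memory.
  def load
    Enumerator.new { |main_enum| stream(main_enum) }
  end

  private

  # Downloads the gzipped CSV, decompresses it on the fly and pushes
  # every line except the header into the enumerator.
  def stream(main_enum)
    reader = nil
    file_uri.open do |file|
      reader = Zlib::GzipReader.new(file)
      # drop(1) skips the header row
      reader.each_line.lazy.drop(1).each do |line|
        main_enum << preprocess_row(line)
      end
    end
  ensure
    reader&.close
  end

  def file_uri
    URI.parse("ftp://user:password@host.com/file.csv.gz")
  end

  # Line-based parsing: strips the trailing newline, removes every
  # double quote and splits on commas.
  def preprocess_row(row)
    row.chomp.gsub('"', "").split(",")
  end
end
```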
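Because `load` wraps `stream` in `Enumerator.new`, nothing is downloaded or decompressed until the caller starts iterating, and rows are handed out one at a time. The sketch below is a hypothetical, self-contained variant for trying this without an FTP server: it assumes the gzipped bytes come from an in-memory `StringIO` (built with `Zlib.gzip`), and the `LocalLoader` class and its `gz_io` argument are illustrative names, not part of the original gist.

```ruby
require "stringio"
require "zlib"

# Hypothetical variant of Loader that reads a gzipped CSV from any IO
# (here an in-memory StringIO) instead of downloading it over FTP.
class LocalLoader
  def initialize(gz_io)
    @gz_io = gz_io
  end

  # Same shape as Loader#load: nothing is read until rows are requested.
  def load
    Enumerator.new { |main_enum| stream(main_enum) }
  end

  private

  def stream(main_enum)
    reader = Zlib::GzipReader.new(@gz_io)
    # drop(1) skips the header; each remaining line is pushed to the caller.
    reader.each_line.lazy.drop(1).each do |line|
      main_enum << line.chomp.gsub('"', "").split(",")
    end
  ensure
    # Runs even when the caller stops early (e.g. with first(n)).
    reader&.close
  end
end

data = Zlib.gzip("id,name\n1,alice\n2,bob\n3,carol\n")
loader = LocalLoader.new(StringIO.new(data))
p loader.load.first(2) # stops iterating after two data rows
```

Calling `first(2)` ends the enumeration after two rows and still runs the `ensure` clause, so the reader is closed. A given `LocalLoader` can only be iterated once, because the first pass consumes and closes its `StringIO`.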
@SampsonCrowley

To build a proper streaming CSV parser, you would open an IO object, pass it to `CSV.foreach`, and then feed each line into that IO.
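The approach described in this comment can be sketched with `IO.pipe`: one thread decompresses the file and writes its lines into the pipe, while Ruby's `csv` library reads and parses rows from the other end. This is a minimal sketch under stated assumptions: the `parse_streamed` helper is hypothetical, it reads from an in-memory `StringIO` rather than the FTP download, and it builds the parser with `CSV.new` on the pipe's read end.

```ruby
require "csv"
require "stringio"
require "zlib"

# Hypothetical helper: a producer thread decompresses gz_io line by line
# and writes each line into a pipe; the csv library parses rows from the
# read end, so quoting rules are applied across line boundaries.
def parse_streamed(gz_io)
  read_end, write_end = IO.pipe

  producer = Thread.new do
    reader = Zlib::GzipReader.new(gz_io)
    reader.each_line { |line| write_end.write(line) }
  ensure
    reader&.close
    write_end.close # EOF on the pipe ends CSV parsing
  end

  rows = []
  CSV.new(read_end, headers: true).each { |row| rows << row.to_h }
  producer.join # re-raises any error from the producer thread
  rows
ensure
  read_end&.close
end

csv_gz = Zlib.gzip(%(id,note\n1,"hello, world"\n2,"line one\nline two"\n))
p parse_streamed(StringIO.new(csv_gz))
```

The sample returns two rows, and the second keeps its embedded newline even though `each_line` delivered it as two separate lines. Collecting everything into `rows` is only for the demonstration; a streaming caller would yield each row instead.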

@SampsonCrowley

What about CSVs containing quoted newlines, nested quotes, etc.?
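Neither case survives the line-based `preprocess_row`: a quoted field that contains a newline arrives as two separate `each_line` iterations, and `gsub('"', "")` followed by `split(",")` drops escaped quotes (`""`) and splits on commas inside quoted fields. A minimal sketch of delegating this to Ruby's `csv` library, assuming the `Zlib::GzipReader` is handed straight to `CSV.new` (a simpler variant of the IO-based idea above, with no pipe or thread); `each_gzipped_csv_row` and `naive_rows` are hypothetical names:

```ruby
require "csv"
require "stringio"
require "zlib"

# Hypothetical helper: yields one parsed row (an array of fields) at a
# time from a gzipped CSV IO. CSV reads from the GzipReader as needed,
# so quoted newlines and doubled quotes are handled by the csv library.
def each_gzipped_csv_row(gz_io)
  return enum_for(__method__, gz_io) unless block_given?

  # wrap closes the reader (and gz_io) when the block finishes
  Zlib::GzipReader.wrap(gz_io) do |gz|
    CSV.new(gz).each { |row| yield row }
  end
end

# For comparison: the line-based parsing from Loader#preprocess_row.
def naive_rows(text)
  text.each_line.map { |line| line.chomp.gsub('"', "").split(",") }
end

sample = %(1,"say ""hi""","a, b"\n2,"multi\nline",x\n)
p each_gzipped_csv_row(StringIO.new(Zlib.gzip(sample))).to_a
p naive_rows(sample)
```

With this sample the CSV-based helper returns two rows, `["1", "say \"hi\"", "a, b"]` and `["2", "multi\nline", "x"]`, while `naive_rows` produces three rows, loses the quotes around `hi`, and splits `a, b` into two fields.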
