Created May 31, 2011 13:04
Ruby SQL to CSV exporter
require "rubygems" | |
begin | |
require 'sequel' | |
rescue LoadError | |
puts "Please run gem install sequel" | |
exit! | |
end | |
DB = Sequel.connect(:adapter => 'mysql2', | |
:host => '127.0.0.1', | |
:database => '???', | |
:user => '???', | |
:password => '???') | |
Dir['export/*.csv'].each { |f| File.delete(f); puts "deleted: #{f}" } | |
class CVSSQLExporter | |
def initialize(filename, db, sql) | |
@filename = File.join('export', filename) | |
@db = db | |
@sql = sql | |
end | |
def export() | |
require 'csv' | |
CSV.open(@filename, 'wb') do |csv| | |
first = true | |
@db[@sql].all.each do |row| | |
csv << row.inject([]) { |r, n| r << n[0] } and first = false if first | |
csv << row.inject([]) { |r, n| r << n[1] } | |
end | |
end | |
puts "exported: #{@filename}" | |
end #/def export | |
end | |
foo_sql = <<-SQL | |
select * | |
from bar as foo | |
SQL | |
CVSSQLExporter.new('foo.csv', DB, foo_sql).export |
Thank you for leaving this behind, it was a useful template for me today. `and first = false` is a neat trick. One question I have: you specify `@db[@sql].all.each`, which loads the entire dataset into memory first (`.all`) and then iterates. Is there a specific reason you chose that over streaming the records individually, à la `@db[@sql].each`? It worked fine for me today against SQL Server, but maybe it doesn't work for all DB systems, or maybe there's something specific to my scenario that allowed it. For me, though, it kept my memory footprint low (500 MB vs. 3 GB). If someone stumbles across this in the future, you may wish to take note if you run into memory issues.
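For reference, a minimal sketch of the streaming variant: the body is the same as the gist's `export`, except `.all.each` becomes `.each`, and the header/value rows use `Hash#keys`/`Hash#values`, which is equivalent to the `inject` form above. Whether rows are actually streamed or buffered client-side depends on the database adapter, so it is worth measuring:

    def export
      require 'csv'
      CSV.open(@filename, 'wb') do |csv|
        first = true
        # No .all here: the dataset is iterated without materializing the
        # whole result set as an Array first.
        @db[@sql].each do |row|
          csv << row.keys and first = false if first # header row, once
          csv << row.values
        end
      end
      puts "exported: #{@filename}"
    end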
Additionally, I found it helpful to accept a dataset argument instead of specifying raw SQL. This allowed me to explore my data in "sequel style", and also to send the traditional `DB[sql]` if I wished. If anyone finds it helpful, here was my solution using this gist.
Problem: Take a 5-million-row table and export it into smaller CSVs that Excel could open (its limit is roughly 1 million rows). I chose to export the table grouped by a specific set of keys.
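A minimal sketch of that approach, assuming a hypothetical table `big_table` partitioned on columns `key1` and `key2` (all of these names are illustrative): the exporter takes a Sequel dataset rather than raw SQL, a `select distinct` over the keys yields one hash per key combination, and each hash becomes a `.where` filter and one CSV file.

    # Sketch only: table and column names (big_table, key1, key2) are
    # placeholders, not the original schema.
    class DatasetCSVExporter
      def initialize(filename, dataset)
        @filename = File.join('export', filename)
        @dataset  = dataset
      end

      def export
        require 'csv'
        CSV.open(@filename, 'wb') do |csv|
          first = true
          @dataset.each do |row| # stream rather than .all, per the note above
            csv << row.keys and first = false if first
            csv << row.values
          end
        end
        puts "exported: #{@filename}"
      end
    end

    # Each distinct key combination comes back as a Hash, and .where accepts
    # a Hash, so every partition is one filtered dataset and one CSV file.
    DB[:big_table].distinct.select(:key1, :key2).all.each do |keys|
      name = "big_table_#{keys[:key1]}_#{keys[:key2]}.csv"
      DatasetCSVExporter.new(name, DB[:big_table].where(keys)).export
    end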
Unrelated, but I thought it was cool that a select distinct returns a hash, and the `.where` method accepts a hash. It made it really easy to dynamically partition my results in different iterations.

Thanks again pigoz!