pigoz/1000465
require "rubygems" | |
begin | |
require 'sequel' | |
rescue LoadError | |
puts "Please run gem install sequel" | |
exit! | |
end | |
DB = Sequel.connect(:adapter => 'mysql2', | |
:host => '127.0.0.1', | |
:database => '???', | |
:user => '???', | |
:password => '???') | |
Dir['export/*.csv'].each { |f| File.delete(f); puts "deleted: #{f}" } | |
class CVSSQLExporter | |
def initialize(filename, db, sql) | |
@filename = File.join('export', filename) | |
@db = db | |
@sql = sql | |
end | |
def export() | |
require 'csv' | |
CSV.open(@filename, 'wb') do |csv| | |
first = true | |
@db[@sql].all.each do |row| | |
csv << row.inject([]) { |r, n| r << n[0] } and first = false if first | |
csv << row.inject([]) { |r, n| r << n[1] } | |
end | |
end | |
puts "exported: #{@filename}" | |
end #/def export | |
end | |
foo_sql = <<-SQL | |
select * | |
from bar as foo | |
SQL | |
CVSSQLExporter.new('foo.csv', DB, foo_sql).export |
Thank you for leaving this behind; it was a useful template for me today, and `and first = false` is a neat trick. One question I have: you specify `@db[@sql].all.each`, which loads the entire result set into memory first (`.all`) and then iterates. Is there a specific reason you chose that over streaming the records individually, a la `@db[@sql].each`? The streaming version worked fine for me today against SQL Server, though maybe it doesn't work for all DB systems, or maybe there's something specific to my scenario that allowed it. For me, it kept the memory footprint low (500 MB vs. 3 GB). If someone stumbles across this in the future and runs into memory issues, you may wish to take note.
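For illustration, here is a minimal sketch of the variant I mean: the gist's export method with `.all` dropped and everything else unchanged. Whether rows are truly streamed from the server depends on the adapter and driver; some buffer the full result set client-side regardless.

def export
  require 'csv'
  CSV.open(@filename, 'wb') do |csv|
    first = true
    # .each yields one row Hash at a time instead of first
    # materializing the whole result set in a Ruby Array, as .all does.
    @db[@sql].each do |row|
      csv << row.keys and first = false if first
      csv << row.values
    end
  end
  puts "exported: #{@filename}"
end

(`row.keys`/`row.values` is equivalent to the `inject` version, since Sequel yields each row as a Hash.)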
Additionally, I found it helpful to accept a dataset argument instead of raw SQL. This let me explore my data in "Sequel style", while I could still pass the traditional `DB[sql]` if I wished.
If anyone finds it helpful, here is my solution using this gist.

Problem: take a table of 5 million rows and export it into smaller CSVs that Excel can open (its row limit is roughly 1 million). I chose to split the export by grouping on a specific set of keys.
class SqlCsvExporter
  def initialize(filename, dataset)
    # Collapse any trailing ".csv" repeats into a single extension.
    @filename = File.join('export', "#{filename}.csv").gsub(/(\.csv)+/, '.csv')
    @dataset = dataset
  end

  def export
    require 'csv'
    CSV.open(@filename, 'wb') do |csv|
      first = true
      @dataset.each do |row|
        csv << row.inject([]) { |r, n| r << n[0] } and first = false if first
        csv << row.inject([]) { |r, n| r << n[1] }
      end
    end
    puts "exported: #{@filename}"
  end
end
# ... set up DB as above
table = DB[:relatively_large_table]
partitions = table.distinct.select(:website, :zip_code).all

partitions.each do |filter|
  puts filename = filter.values.join.gsub(/[^a-z0-9]+/i, '_')
  SqlCsvExporter.new(
    filename,
    table.where(filter).order(:year_week, :brand, :size, :unique_product_number)
  ).export
end
Unrelated, but I thought it was cool that a select distinct returns each row as a hash, and that the `.where` method accepts a hash as an equality filter. It made it really easy to dynamically partition my results across iterations.
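A minimal illustration of that round trip (the `sales` table and its columns here are made up):

# Each row Sequel returns is a Hash of column => value.
filter = DB[:sales].distinct.select(:website, :zip_code).first
# e.g. {:website => "example.com", :zip_code => "12345"}

# Passing that Hash back to .where produces an equality filter:
DB[:sales].where(filter).sql
# "SELECT * FROM sales WHERE ((website = 'example.com') AND (zip_code = '12345'))"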
Thanks again pigoz!
Thanks a ton. This is really useful.
Very useful, thank you.