@chrisle
Created November 2, 2012 16:46
JSON > CSV > data_miner > database
# Converts JSON data into CSV and writes it to a temporary CSV file
require 'ruport'
# see ruport_19.rb
require 'monkey_patches/ruport_19'

class CsvWriter
  # Initialize an instance of CsvWriter, yielding it to an optional block
  def initialize
    yield(self) if block_given?
  end

  # Use Ruport to convert parsed JSON (a Hash or an Array of Hashes) into CSV
  #
  # === Example
  #
  #   json_data = { "hello" => "world" }
  #   file_name = "/tmp/helloworld.csv"
  #
  #   CsvWriter.new do |c|
  #     c.json json_data
  #     c.output file_name
  #   end
  def json(data)
    unless data.empty?
      # Wrap a single hash so it is treated as one row
      rows = data.is_a?(Array) ? data : [data]
      headers = rows.first.keys
      table = Ruport.Table :data => rows, :column_names => headers
      @csv_data = table.as(:csv)
    end
  end

  # Pass through CSV data to the temporary file
  #
  # === Example
  #
  #   csv_str = "1,2,3,4"
  #   file_name = "/tmp/helloworld.csv"
  #
  #   CsvWriter.new do |c|
  #     c.csv csv_str
  #     c.output file_name
  #   end
  def csv(data)
    @csv_data = data
  end

  # Writes out to a CSV file. Handles non-ASCII characters by transcoding
  # the data from ISO-8859-1 to UTF-8
  def output(filename)
    File.open(filename, 'w') do |f|
      f.write @csv_data.encode('utf-8', 'iso-8859-1')
    end
  end
end
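
For comparison, the JSON-to-CSV step in CsvWriter#json can be sketched with only Ruby's standard library, without Ruport. This is a minimal sketch, not the gist's implementation: `json_to_csv` is a hypothetical helper name, and it assumes the input is already parsed into a Hash or an Array of Hashes whose rows share the first row's keys.

```ruby
require 'json'
require 'csv'

# Hypothetical helper, not part of the gist: converts parsed JSON
# (a Hash or an Array of Hashes) into a CSV string using only the stdlib.
def json_to_csv(data)
  # Treat a single hash as a one-row table
  rows = data.is_a?(Array) ? data : [data]
  return "" if rows.empty? || rows.first.empty?

  # Column order comes from the first row's keys
  headers = rows.first.keys
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << row.values_at(*headers) }
  end
end

parsed = JSON.parse('[{"id": 1, "something": "hello"}, {"id": 2, "something": "world"}]')
puts json_to_csv(parsed)
```

Keys missing from a later row come out as empty cells, because `values_at` returns nil for them.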
irb(main):001:0> MyModel.perform
# Example JSON file...
#
# {
#   "id": 1,
#   "something": "hello",
#   "something_else": "world",
#   "dont_import_me": "big document to ignore"
# }
require 'json'

class MyModel < ActiveRecord::Base
  # Migrationless schema... see data_miner gem's docs
  self.primary_key = "id"
  col :id, :type => :primary_key
  col :something
  col :something_else

  data_miner do
    process :auto_upgrade!

    # I take data as CSV or JSON. CSV doesn't need conversion before importing
    # but JSON does, so I have something like this ....
    process 'json to csv' do
      CsvWriter.new do |c|
        # CsvWriter#json expects parsed data (Hash or Array), not a raw string
        c.json JSON.parse(File.read("my_data.json"))
        c.output "/tmp/data.csv"
      end
    end

    # import stuff. Uses SQL IMPORT so it goes fast.
    # To use ActiveRecord instead, add :validate => true
    #
    # This will also automagically figure out the right data types.
    #
    import "my big fat import", :url => "file:///tmp/data.csv" do
      key :id
      store :something
      store :something_else
      # note, it skips dont_import_me
    end
  end

  # I use Resque so this works nicely
  def self.perform
    self.run_data_miner!
  end
end
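
The import block above stores only `id`, `something`, and `something_else`, and skips `dont_import_me`. As a rough illustration of that column selection, here is a stdlib-only sketch. It is not how data_miner works internally; `select_columns` is a hypothetical helper, and the sample CSV text is made up to match the example JSON.

```ruby
require 'csv'

# Hypothetical helper, not part of data_miner: keeps only the wanted
# columns from CSV text and returns one Hash per data row.
def select_columns(csv_text, wanted)
  CSV.parse(csv_text, headers: true).map do |row|
    wanted.each_with_object({}) { |name, kept| kept[name] = row[name] }
  end
end

csv_text = "id,something,something_else,dont_import_me\n" \
           "1,hello,world,big document to ignore\n"
p select_columns(csv_text, %w[id something something_else])
```

Values come back as strings here; any type inference would be a separate step.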
# Patches Ruport to use CSV instead of FCSV to work with Ruby 1.9.x
module Ruport
  class Formatter
    class CSV < Formatter
      def csv_writer
        @csv_writer ||= options.formatter ||
          CSV(output, options.format_options || {})
      end
    end
  end
end