Created
June 3, 2011 19:18
-
-
Save murphyslaw/1006979 to your computer and use it in GitHub Desktop.
A ruby script that transforms an array from a csv file into an array with the same structure but with transformed column names and values. You can optionally define transformations on each column.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/ruby -w | |
require 'active_support/core_ext' | |
# The transformer class uses a source array and a mapping table to | |
# create a target array, with the same structure as the source array | |
# but transformed values. The structure of the source array is oriented | |
# on what is returned after parsing a csv file. | |
# | |
# Optionally, it is also possible to define transformation methods for each column, which | |
# are described below for the different transformation types. | |
# | |
# For example: | |
# source: [["column1", "column2"], ["first", "second"], ["a", "b"]] | |
# mapping: { "column1" => "custom_column1", "column2" => "custom_column2" } | |
# target: [["custom_column1", "custom_column2"], ["FIRST", "second"], ["A", "b"]] | |
# | |
# Any errors encountered during the run are stored in the +errors+ structure, that | |
# holds errors for each column. | |
# | |
# For example: | |
# transformer = Transformer.new(source, mapping, options) | |
# transformer.run | |
# transformer.errors['column2'] # returns all errors for 'column2' | |
# | |
# There are also a couple of options available, that are explained in more detail | |
# in the +set_default_options+ method. | |
class Transformer | |
attr_reader :source | |
attr_reader :mapping | |
attr_reader :source_columns | |
attr_writer :target_columns | |
attr_reader :options | |
attr_reader :errors | |
# To initialize the transformer you need to pass in the source array | |
# and the mapping table. | |
def initialize(source, mapping, options = {}) | |
@source = source | |
@mapping = mapping | |
@source_columns = @source.delete_at(0) | |
set_default_options(options) | |
initialize_errors | |
end | |
# Sets the default options for the transformer. | |
def set_default_options(options) | |
@options = options.reverse_merge!({ | |
:display_transformations => true, # If enabled, each source and transformed target row will be displayed. | |
}) | |
end | |
# Creates the data structure to hold errors for each of the columns. | |
def initialize_errors | |
@errors = {} | |
@source_columns.each do |column| | |
@errors[column] = [] | |
end | |
end | |
# Returns the index in the source array for the given column name. | |
def source_column_index(name) | |
source_columns.index(name) | |
end | |
# Returns the name of the target column as defined in the mapping table | |
# for a given source column name. | |
def target_column_name(source_column_name) | |
mapping[source_column_name] | |
end | |
# Returns an array containing the target column names. | |
def target_columns | |
@target_columns ||= | |
mapping.keys.inject([]) do |target_columns, source_column_name| | |
target_columns << target_column_name(source_column_name) | |
target_columns | |
end | |
end | |
# Transforms a single source row, depending on the mapping | |
# and transform methods. | |
def run_transformations(row) | |
transformed_row = [] | |
mapping.each do |source_column_name, target_column_name| | |
source_column_index = source_column_index(source_column_name) | |
source_value = row[source_column_index] | |
target_value = transform(source_column_name, source_value) | |
transformed_row << target_value | |
end | |
if options[:display_transformations] | |
puts "#{row.inspect}" | |
puts "#{transformed_row.inspect}\n\n" | |
end | |
transformed_row | |
end | |
# Checks wether there is a transformation method for the | |
# given column and executes it, otherwise it returns the | |
# original value unchanged. | |
def transform(source_column_name, value) | |
method_name = "transform_#{source_column_name}" | |
respond_to?(method_name) ? send(method_name, value) : value | |
end | |
# Main method to run the transformer. It turns the source array | |
# into the target array. | |
def run | |
target = [] | |
target << target_columns | |
source.map { |row| target << run_transformations(row) } | |
target | |
end | |
# Transformations | |
# | |
# To enable a transformation you need to define a method, that | |
# starts with "transform_" followed by the source column name | |
# you want to transform. The method is given the value and | |
# should return the target value. | |
# This transformation is an example for changing the values format. | |
# It also handles the case, when the value is nil. | |
def transform_column1(value) | |
value ||= '' | |
value.upcase | |
end | |
# This transformation is an example for only allowing certain | |
# values in the target and discarding other ones. | |
def transform_column2(value) | |
allowed_values = [ | |
'yes', | |
'no', | |
] | |
index = allowed_values.index(value) | |
errors['column2'] << "Warning: '#{value}' discarded, because it is not allowed. #{allowed_values.inspect}" unless index | |
index ? value : '' | |
end | |
# This transformation is an example for mapping specific source values | |
# to target values and discarding the rest. | |
def transform_column3(value) | |
mapping = { | |
'y' => 'yes', | |
'n' => 'no', | |
} | |
index = mapping.keys.index(value) | |
errors['column3'] << "Warning: '#{value}' discarded, because it is not mapped. #{mapping.keys.inspect}" unless index | |
index ? mapping[value] : '' | |
end | |
end | |
# | |
# Prepare source, mapping and options. | |
# | |
source = [ | |
['column1', 'column2', 'column3'], | |
['first', 'second', 'third'], | |
['no', 'yes', 'y'], | |
] | |
mapping = { | |
'column1' => 'custom_column1', | |
'column2' => 'custom_column2', | |
'column3' => 'custom_column3', | |
} | |
options = {} | |
transformer = Transformer.new(source, mapping, options) | |
# | |
# Output results. | |
# | |
puts "source (#{source.size}): #{source.inspect}\n\n" | |
target = transformer.run | |
puts "target (#{target.size}): #{target.inspect}\n\n" | |
transformer.errors.each do |column, errors| | |
puts errors | |
end |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment