For the last 18 months, I've worked on many Ruby microservices (HTTP and "hipster batch"*) whose main purpose was to take data in one representation (CSV, JSON, database records) and turn it into data in another representation. Sometimes the complexity of the conversion required parsing the source data into intermediate domain models, processing them, and then using decorators to build the output representation... and sometimes the logic was simple enough for a pretty much straight mapping between formats.
For people like me who grew up in the OO era, objects are our tool of choice. And because we're comfortable with our object hammer, it's very natural to try and solve every problem by hitting it with an object. In these data processing services, it's instinctive to write an object that looks something like this:
class RevenueCSVFile
  def initialize csv_path
    @csv_path = csv_path
  end

  def convert_to_json json_path
    ...
  end
end
But this feels a bit icky... we're putting knowledge about JSON into our CSV class. What's to say we shouldn't be doing RevenueJSONFile.from_csv instead?
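For contrast, that inverted design might look something like this (just a sketch; RevenueJSONFile is hypothetical):

class RevenueJSONFile
  def self.from_csv csv_path
    # Now the JSON class has to know how to read CSVs instead...
  end
end

Neither direction feels right: one format class always ends up knowing about the other.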
What we often forget about in our OO world is that processes can be objects too. Really, what we want is a converter object that knows how to take a CSV file and turn it into a JSON file.
class RevenueCSVToJSONConverter
  def initialize csv_path, json_path
    @csv_path = csv_path
    @json_path = json_path
  end

  def convert
    ...
  end
end
RevenueCSVToJSONConverter.new("input.csv", "output.json").convert
This is a bit cleaner, but really, there's no reason for this object to have state. Once it's initialised, we're not going to call any other method on it apart from convert. It has one job (yay, single responsibility principle!) and that's all it should ever do.
class RevenueCSVToJSONConverter
  def convert csv_path, json_path
    ...
  end
end
That's a bit better; now we could even reuse the instance if we wanted to, say to loop through a directory. But I'm not entirely happy about writing:
RevenueCSVToJSONConverter.new.convert("input.csv", "output.json")
It feels a bit redundant; I don't want to type converter.convert. Really, this feels more like it should be a function, but we don't have pure standalone functions in Ruby (and we know better than to unnecessarily pollute the global scope!).
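As an aside on the reuse point: a single instance looping over a directory might look something like this rough sketch (the revenue/ directory is hypothetical, and each output JSON is assumed to sit next to its input CSV):

converter = RevenueCSVToJSONConverter.new
Dir.glob("revenue/*.csv").each do |csv_path|
  converter.convert(csv_path, csv_path.sub(/\.csv\z/, ".json"))
end

That covers reuse, but the converter.convert redundancy is still there.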
What we can do is write a class that we use like a function.
class ConvertRevenueCSVToJSON
  def self.call csv_path, json_path
    ...
  end
end
Now, we can write:
ConvertRevenueCSVToJSON.call("input.csv", "output.json")
To me, that line conveys intent much more clearly. When I read it like a sentence and convert it to normal human speak (something we're used to being able to do in Ruby), it tells me exactly what it's doing, without any redundant words. Also, because this is a single-purpose class with one entry point, where the name of the class tells me what it does, it's much less likely for the scope of this class to unintentionally creep (it would become tempting to add other convert_to_xxx methods when using RevenueCSVFile).
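To illustrate that creep, the format-centric class tends to drift toward something like this (hypothetical, of course):

class RevenueCSVFile
  def convert_to_json(json_path); end
  def convert_to_xml(xml_path); end    # scope creep...
  def convert_to_yaml(yaml_path); end  # ...and more
end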
So why did I use the method name call? In Ruby, call is the method used to invoke a proc or a lambda. So if you have a piece of code that expects a lambda or proc, most of the time you can pass in a class or object with a call method instead**. This allows you to write a nice unit-testable piece of code that can then be used to compose other objects in a modular way. (For a great example of this, see apotonick's use of Callable with Reform: http://nicksda.apotomo.de/2014/07/representable-2-0-with-better-inheritance-filters-and-automatic-collections/)
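To make that interchangeability concrete, here's a small illustrative sketch (with_timing is hypothetical; it only cares that its argument responds to call):

def with_timing(callable, *args)
  started = Time.now
  result = callable.call(*args)
  puts "Took #{Time.now - started} seconds"
  result
end

with_timing(->(path) { File.size(path) }, "input.csv")            # a lambda...
with_timing(ConvertRevenueCSVToJSON, "input.csv", "output.json")  # ...or our class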
But there's something that I'm still not completely happy with in this design. Let's flesh out the example and I'll show you what I mean.
require 'csv'
require 'json'

class ConvertCSVToJSON
  def self.call csv_path, json_path, row_processor
    csv_rows = csv_rows(csv_path)
    row_hashes = rows_to_hashes(csv_rows)
    processed_hashes = process_row_hashes(row_hashes, row_processor)
    json = rows_to_json(processed_hashes)
    write_json_file(json_path, json)
  end

  def self.csv_rows(csv_path)
    CSV.parse(File.read(csv_path), headers: true)
  end

  def self.rows_to_hashes rows
    rows.collect(&:to_hash)
  end

  def self.process_row_hashes row_hashes, row_processor
    row_hashes.collect { |row| row_processor.call(row) }
  end

  def self.rows_to_json rows
    {
      "collection" => rows
    }.to_json
  end

  def self.write_json_file json_path, json
    File.open(json_path, "w") { |file| file << json }
  end
end

# By putting the logic for processing a revenue row hash in a separate class,
# it can now be tested without the overhead of having to make and parse a CSV,
# and can be easily mocked when testing ConvertCSVToJSON.
# We can also reuse the underlying CSV to JSON conversion code
# for CSV files containing different types of data.
class ProcessRevenueRow
  def self.call row
    row # Do some business logic here!
  end
end

ConvertCSVToJSON.call("input.csv", "output.json", ProcessRevenueRow)
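As the comment above suggests, the row logic can now be specced in isolation; a minimal sketch (RSpec is assumed, as in the test examples further down):

RSpec.describe ProcessRevenueRow do
  it "processes a row hash without needing a CSV file" do
    row = { "month" => "Jan", "revenue" => "100" }
    expect(ProcessRevenueRow.call(row)).to eq(row)
  end
end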
Now, I'm happy with pulling out the process-row logic, but there are still a few things I don't like here. These are my personal preferences: I don't like having to prefix every method definition with self., I don't like having to pass args around all the time, and I don't like that there isn't a clear separation between the public interface and the private methods of this class (you can make private class methods in Ruby, but it's a bit cumbersome and isn't commonly done).
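For reference, the cumbersome version looks something like this, using private_class_method:

class ConvertCSVToJSON
  def self.call csv_path, json_path, row_processor
    # ... public entry point ...
  end

  def self.rows_to_hashes rows
    rows.collect(&:to_hash)
  end
  private_class_method :rows_to_hashes # now only callable from inside the class
end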
Luckily, there is a way to address all of these concerns. If we refactor the class-level call method to delegate to an instance-level call method under the hood, we get the cleanliness of a class interface with the convenience of instance methods (no more arg passing). It lets us hide our private methods, and as a bonus, it removes the temptation to start using class variables, which pollute our global state.
require 'csv'
require 'json'

class ConvertCSVToJSON
  def self.call csv_path, json_path, row_processor
    new(csv_path, json_path, row_processor).call
  end

  def initialize csv_path, json_path, row_processor
    @csv_path = csv_path
    @json_path = json_path
    @row_processor = row_processor
  end

  def call
    write_json_file(csv_as_json)
  end

  private

  attr_accessor :csv_path, :json_path, :row_processor

  def csv_as_json
    {
      "collection" => processed_rows
    }.to_json
  end

  def processed_rows
    row_hashes.collect { |row| row_processor.call(row) }
  end

  def row_hashes
    csv_rows.collect(&:to_hash)
  end

  def csv_rows
    CSV.parse(File.read(csv_path), headers: true)
  end

  def write_json_file json
    File.open(json_path, "w") { |file| file << json }
  end
end
ConvertCSVToJSON.call("input.csv", "output.json", ProcessRevenueRow)
This is a fairly contrived example, where the output of one method simply becomes the input of the next, but imagine cases where there are more objects to pass around that need to be accessed at different stages of the process (like a logger); then this becomes a much cleaner approach. There is a little extra boilerplate in the assignment of instance variables and the delegation from the class-level call to the underlying instance, but in all but the simplest scenarios, I'm generally happy to make this tradeoff.
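To make the logger point concrete, here's a hypothetical variant (the logger parameter and the log messages are illustrative, not part of the original design). Once the logger is assigned in initialize, every method below can use it without it appearing in any signature:

require 'csv'
require 'json'
require 'logger'

class ConvertCSVToJSON
  def self.call csv_path, json_path, row_processor, logger = Logger.new($stdout)
    new(csv_path, json_path, row_processor, logger).call
  end

  def initialize csv_path, json_path, row_processor, logger
    @csv_path = csv_path
    @json_path = json_path
    @row_processor = row_processor
    @logger = logger # assigned once, available everywhere below
  end

  def call
    logger.info "Converting #{csv_path} to #{json_path}"
    write_json_file(csv_as_json)
    logger.info "Wrote #{json_path}"
  end

  private

  attr_reader :csv_path, :json_path, :row_processor, :logger

  def csv_as_json
    { "collection" => processed_rows }.to_json
  end

  def processed_rows
    row_hashes.collect { |row| row_processor.call(row) }
  end

  def row_hashes
    hashes = csv_rows.collect(&:to_hash)
    logger.debug "Parsed #{hashes.size} rows" # no logger argument needed here
    hashes
  end

  def csv_rows
    CSV.parse(File.read(csv_path), headers: true)
  end

  def write_json_file json
    File.open(json_path, "w") { |file| file << json }
  end
end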
Testing also becomes much easier when there is just a class-method interface exposed to the calling code. Compare testing a call to ConvertCSVToJSON with testing our original design, the RevenueCSVFile.
Old way:
let(:revenue_csv_file) { instance_double('RevenueCSVFile') }
let(:csv_path) { 'input.csv' }
let(:json_path) { 'output.json' }

before do
  allow(RevenueCSVFile).to receive(:new).and_return(revenue_csv_file)
  allow(revenue_csv_file).to receive(:convert_to_json)
end

it "creates a RevenueCSVFile with the given path" do
  expect(RevenueCSVFile).to receive(:new).with(csv_path)
  do_something_that_converts_csv_to_json
end

it "converts the CSV to JSON" do
  expect(revenue_csv_file).to receive(:convert_to_json).with(json_path)
  do_something_that_converts_csv_to_json
end
New way:
let(:csv_path) { 'input.csv' }
let(:json_path) { 'output.json' }
it "converts the CSV to JSON" do
expect(ConvertCSVToJSON).to receive(:call).with(csv_path, json_path, ProcessRevenueRow)
do_something_that_converts_csv_to_json
end
Now, this is an overly simplified example, and there are a few things about the ConvertCSVToJSON class that could still be improved (passing around Files or IO streams instead of String paths, for one), but I hope it demonstrates the point that functions/operations/processes can be objects too. Don't forget you have this tool in your OO toolbox!
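As a rough sketch of that stream-based improvement (the interface shown here is illustrative, not part of the original code):

require 'csv'
require 'json'

class ConvertCSVToJSON
  def self.call csv_io, json_io, row_processor
    rows = CSV.parse(csv_io.read, headers: true).collect(&:to_hash)
    processed = rows.collect { |row| row_processor.call(row) }
    json_io << { "collection" => processed }.to_json
  end
end

File.open("input.csv") do |csv|
  File.open("output.json", "w") do |json|
    ConvertCSVToJSON.call(csv, json, ProcessRevenueRow)
  end
end

Accepting anything that responds to read and << also makes the class trivially testable with StringIO.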
A really good example of using objects as functions can be seen in Trailblazer's Operation classes: https://github.com/apotonick/trailblazer. Function classes are the perfect place to encapsulate reusable business logic.
* Hipster batch: microservices running on AWS instances that start up, read some data, process it, and put it in S3 for another microservice to process at its leisure.
** A little-known Ruby fact: you can invoke a call method (or a lambda or proc) using the syntax callable.(args), e.g. ConvertCSVToJSON.("input.csv", "output.json", ProcessRevenueRow). You'll get weird looks from your pair if you try it, though.
Reminds me of the strategy pattern, and also of case classes in Scala. apply is the closest thing to call in the Scala world, although you don't need to write it explicitly.