For the last 18 months, I've worked on many Ruby microservices (HTTP and "hipster batch"*) whose main purpose was to take data in one representation (CSV, JSON, database records) and turn it into data in another representation. Sometimes the complexity of the conversion required parsing the source data into intermediate domain models, processing them, and then using decorators to build the output representation... and sometimes the logic was simple enough for a pretty much straight mapping between formats.
For people like me who grew up in the OO era, objects are our tool of choice. And because we're comfortable with our object hammer, it's very natural to try and solve every problem by hitting it with an object. In these data processing services, it's instinctive to write an object that looks something like this:
class RevenueCSVFile
  def initialize csv_path
    @csv_path = csv_path
  end

  def convert_to_json json_path
    ...
  end
end
But this feels a bit icky... we're putting knowledge about JSON into our CSV class. What's to say we shouldn't be doing RevenueJSONFile.from_csv instead?
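For contrast, that inverted design might look something like this (just a sketch; RevenueJSONFile is hypothetical):

class RevenueJSONFile
  def self.from_csv csv_path
    # Now the JSON class has to know how to read CSVs instead...
  end
end

Neither direction feels right: one format class always ends up knowing about the other.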
What we often forget about in our OO world is that processes can be objects too. Really, what we want is a converter object that knows how to take a CSV file and turn it into a JSON file.
class RevenueCSVToJSONConverter
  def initialize csv_path, json_path
    @csv_path = csv_path
    @json_path = json_path
  end

  def convert
    ...
  end
end
RevenueCSVToJSONConverter.new("input.csv", "output.json").convert
This is a bit cleaner, but really, there's no reason for this object to have state. Once it's initialised, we're not going to call any other method on it apart from convert. It has one job (yay, single responsibility principle!) and that's all it should ever do.
class RevenueCSVToJSONConverter
  def convert csv_path, json_path
    ...
  end
end
That's a bit better; now we could even reuse the instance if we wanted to, say to loop through a directory. But I'm not entirely happy about writing:
RevenueCSVToJSONConverter.new.convert("input.csv", "output.json")
It feels a bit redundant; I don't want to type converter.convert. Really, this feels more like it should be a function, but we don't have pure standalone functions in Ruby (and we know better than to unnecessarily pollute the global scope!).
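As an aside on the reuse point: a single instance looping over a directory might look something like this rough sketch (the revenue/ directory is hypothetical, and each output JSON is assumed to sit next to its input CSV):

converter = RevenueCSVToJSONConverter.new
Dir.glob("revenue/*.csv").each do |csv_path|
  converter.convert(csv_path, csv_path.sub(/\.csv\z/, ".json"))
end

That covers reuse, but the converter.convert redundancy is still there.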
What we can do is write a class that we use like a function.
class ConvertRevenueCSVToJSON
  def self.call csv_path, json_path
    ...
  end
end
Now, we can write:
ConvertRevenueCSVToJSON.call("input.csv", "output.json")
To me, that line conveys intent much more clearly. When I read it like a sentence and convert it to normal human speak (something we're used to being able to do in Ruby), it tells me exactly what it's doing, without any redundant words. Also, because this is a single-purpose class with one entry point, where the name of the class tells me what it does, it's much less likely for the scope of this class to unintentionally creep (it would become tempting to add other convert_to_xxx methods when using RevenueCSVFile).
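To illustrate that creep, the format-centric class tends to drift toward something like this (hypothetical, of course):

class RevenueCSVFile
  def convert_to_json(json_path); end
  def convert_to_xml(xml_path); end    # scope creep...
  def convert_to_yaml(yaml_path); end  # ...and more
end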
So why did I use the method name call? In Ruby, call is the method used to invoke a proc or a lambda. So if you have a piece of code that expects a lambda or proc, most of the time you can pass in a class or object with a call method instead**. This allows you to write a nice unit-testable piece of code that can then be used to compose other objects in a modular way. (For a great example of this, see apotonick's use of Callable with Reform: http://nicksda.apotomo.de/2014/07/representable-2-0-with-better-inheritance-filters-and-automatic-collections/)
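To make that interchangeability concrete, here's a small illustrative sketch (with_timing is hypothetical; it only cares that its argument responds to call):

def with_timing(callable, *args)
  started = Time.now
  result = callable.call(*args)
  puts "Took #{Time.now - started} seconds"
  result
end

with_timing(->(path) { File.size(path) }, "input.csv")            # a lambda...
with_timing(ConvertRevenueCSVToJSON, "input.csv", "output.json")  # ...or our class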
But there's something that I'm still not completely happy with in this design. Let's flesh out the example and I'll show you what I mean.
require 'csv'
require 'json'

class ConvertCSVToJSON
  def self.call csv_path, json_path, row_processor
    csv_rows = csv_rows(csv_path)
    row_hashes = rows_to_hashes(csv_rows)
    processed_hashes = process_row_hashes(row_hashes, row_processor)
    json = rows_to_json(processed_hashes)
    write_json_file(json_path, json)
  end

  def self.csv_rows(csv_path)
    CSV.parse(File.read(csv_path), headers: true)
  end

  def self.rows_to_hashes rows
    rows.collect(&:to_hash)
  end

  def self.process_row_hashes row_hashes, row_processor
    row_hashes.collect { |row| row_processor.call(row) }
  end

  def self.rows_to_json rows
    {
      "collection" => rows
    }.to_json
  end

  def self.write_json_file json_path, json
    File.open(json_path, "w") { |file| file << json }
  end
end

# By putting the logic for processing a revenue row hash in a separate class,
# it can now be tested without the overhead of having to make and parse a CSV,
# and can be easily mocked when testing ConvertCSVToJSON.
# We can also reuse the underlying CSV to JSON conversion code
# for CSV files containing different types of data.
class ProcessRevenueRow
  def self.call row
    row # Do some business logic here!
  end
end

ConvertCSVToJSON.call("input.csv", "output.json", ProcessRevenueRow)
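As the comment above suggests, the row logic can now be specced in isolation; a minimal sketch (RSpec is assumed, as in the test examples further down):

RSpec.describe ProcessRevenueRow do
  it "processes a row hash without needing a CSV file" do
    row = { "month" => "Jan", "revenue" => "100" }
    expect(ProcessRevenueRow.call(row)).to eq(row)
  end
end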
Now, I'm happy with pulling out the process-row logic, but there are still a few things I don't like here. These are my personal preferences: I don't like having to prefix every method definition with self., I don't like having to pass args around all the time, and I don't like that there isn't a clear separation between the public interface and the private methods of this class (you can make private class methods in Ruby, but it's a bit cumbersome and isn't commonly done).
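For reference, the cumbersome version looks something like this, using private_class_method:

class ConvertCSVToJSON
  def self.call csv_path, json_path, row_processor
    # ... public entry point ...
  end

  def self.rows_to_hashes rows
    rows.collect(&:to_hash)
  end
  private_class_method :rows_to_hashes # now only callable from inside the class
end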
Luckily, there is a way to address all of these concerns. If we refactor the class-level call method to delegate to an instance-level call method under the hood, we get the cleanliness of a class interface with the convenience of instance methods (no more arg passing). It lets us hide our private methods, and as a bonus, it removes the temptation to start using class variables, which pollute our global state.
require 'csv'
require 'json'

class ConvertCSVToJSON
  def self.call csv_path, json_path, row_processor
    new(csv_path, json_path, row_processor).call
  end

  def initialize csv_path, json_path, row_processor
    @csv_path = csv_path
    @json_path = json_path
    @row_processor = row_processor
  end

  def call
    write_json_file(csv_as_json)
  end

  private

  attr_accessor :csv_path, :json_path, :row_processor

  def csv_as_json
    {
      "collection" => processed_rows
    }.to_json
  end

  def processed_rows
    row_hashes.collect { |row| row_processor.call(row) }
  end

  def row_hashes
    csv_rows.collect(&:to_hash)
  end

  def csv_rows
    CSV.parse(File.read(csv_path), headers: true)
  end

  def write_json_file json
    File.open(json_path, "w") { |file| file << json }
  end
end
ConvertCSVToJSON.call("input.csv", "output.json", ProcessRevenueRow)
This is a fairly contrived example, where the output of one method simply becomes the input of the next, but imagine cases where there are more objects to pass around that need to be accessed at different stages of the process (like a logger); then this becomes a much cleaner approach. There is a little extra boilerplate in the assignment of instance variables and the delegation from the class-level call to the underlying instance, but in all but the simplest scenarios, I'm generally happy to make this tradeoff.
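To make the logger point concrete, here's a hypothetical variant (the logger parameter and the log messages are illustrative, not part of the original design). Once the logger is assigned in initialize, every method below can use it without it appearing in any signature:

require 'csv'
require 'json'
require 'logger'

class ConvertCSVToJSON
  def self.call csv_path, json_path, row_processor, logger = Logger.new($stdout)
    new(csv_path, json_path, row_processor, logger).call
  end

  def initialize csv_path, json_path, row_processor, logger
    @csv_path = csv_path
    @json_path = json_path
    @row_processor = row_processor
    @logger = logger # assigned once, available everywhere below
  end

  def call
    logger.info "Converting #{csv_path} to #{json_path}"
    write_json_file(csv_as_json)
    logger.info "Wrote #{json_path}"
  end

  private

  attr_reader :csv_path, :json_path, :row_processor, :logger

  def csv_as_json
    { "collection" => processed_rows }.to_json
  end

  def processed_rows
    row_hashes.collect { |row| row_processor.call(row) }
  end

  def row_hashes
    hashes = csv_rows.collect(&:to_hash)
    logger.debug "Parsed #{hashes.size} rows" # no logger argument needed here
    hashes
  end

  def csv_rows
    CSV.parse(File.read(csv_path), headers: true)
  end

  def write_json_file json
    File.open(json_path, "w") { |file| file << json }
  end
end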
Testing also becomes much easier when there is just a class-method interface exposed to the calling code. Compare testing a call to ConvertCSVToJSON with testing our original design, the RevenueCSVFile.
Old way:
let(:revenue_csv_file) { instance_double('RevenueCSVFile') }
let(:csv_path) { 'input.csv' }
let(:json_path) { 'output.json' }

before do
  allow(RevenueCSVFile).to receive(:new).and_return(revenue_csv_file)
  allow(revenue_csv_file).to receive(:convert_to_json)
end

it "creates a RevenueCSVFile with the given path" do
  expect(RevenueCSVFile).to receive(:new).with(csv_path)
  do_something_that_converts_csv_to_json
end

it "converts the CSV to JSON" do
  expect(revenue_csv_file).to receive(:convert_to_json).with(json_path)
  do_something_that_converts_csv_to_json
end
New way:
let(:csv_path) { 'input.csv' }
let(:json_path) { 'output.json' }
it "converts the CSV to JSON" do
expect(ConvertCSVToJSON).to receive(:call).with(csv_path, json_path, ProcessRevenueRow)
do_something_that_converts_csv_to_json
end
Now, this is an overly simplified example, and there are a few things about the ConvertCSVToJSON class that could still be improved (passing around Files or IO streams instead of String paths, for one), but I hope it demonstrates the point that functions/operations/processes can be objects too. Don't forget you have this tool in your OO toolbox!
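As a rough sketch of that stream-based improvement (the interface shown here is illustrative, not part of the original code):

require 'csv'
require 'json'

class ConvertCSVToJSON
  def self.call csv_io, json_io, row_processor
    rows = CSV.parse(csv_io.read, headers: true).collect(&:to_hash)
    processed = rows.collect { |row| row_processor.call(row) }
    json_io << { "collection" => processed }.to_json
  end
end

File.open("input.csv") do |csv|
  File.open("output.json", "w") do |json|
    ConvertCSVToJSON.call(csv, json, ProcessRevenueRow)
  end
end

Accepting anything that responds to read and << also makes the class trivially testable with StringIO.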
A really good example of using objects as functions can be seen in Trailblazer's Operation classes: https://github.com/apotonick/trailblazer. Function classes are the perfect place to encapsulate reusable business logic.
* Hipster batch: microservices running on AWS instances that start up, read some data, process it, and put it in S3 for another microservice to process at its leisure.
** A little-known Ruby fact: you can invoke a call method (or a lambda or proc) using the syntax callable.(args), e.g. ConvertCSVToJSON.("input.csv", "output.json", ProcessRevenueRow). You'll get weird looks from your pair if you try it, though.
Reminds me of the strategy pattern, and also of case classes in Scala. apply is the closest thing to call in the Scala world, although you don't need to write it explicitly.