@bethesque
Last active May 2, 2016 07:02
A way to write a hipster batch microservice in Ruby
  1. Identify the steps in the high level process

    e.g.

    • Step 1: Download CSV files from Box
    • Step 2: Convert CSV to JSON
    • Step 3: Upload JSON and CSV files to S3
  2. Create modules for each of those steps. In our use case, the top level process matches a pattern called "ETL" or "Extract, Transform, Load", so we used those names.

    $ mkdir -p lib/extract
    $ mkdir -p lib/transform
    $ mkdir -p lib/load
    
  3. Create an obvious entry point into the codebase. I like to call the class "Run".

    $ touch lib/run.rb
    
  4. Create a spec for the Run class that has an "it" block for each of the top level steps.

    describe Run do
      it "downloads the CSV files from box"
      it "converts the CSV files to JSON"
      it "uploads the CSV and JSON files to S3"
    end

    You will probably need to add more "it" blocks later, but this is a good place to start.

  5. Write some rough code that outlines how you would like the whole thing to run. It should mirror the high level process you identified in step 1. Don't worry that it won't run yet. This code will change, and you will find complications. It is just a starting point to give you an idea of how the pieces might fit together.

    e.g.

    class Run
      def self.call
        csv_files = Extract::DownloadCSVFiles.call
        json_files = Transform::ConvertCSVToJSON.call csv_files
        Load::UploadFiles.call csv_files, json_files
      end
    end

    Now, comment out every line of the method body except the first. That first line is the one we are going to focus on.

    class Run
      def self.call
        csv_files = Extract::DownloadCSVFiles.call
        # json_files = Transform::ConvertCSVToJSON.call csv_files
        # Load::UploadFiles.call csv_files, json_files
      end
    end
  6. Repeat this design process to implement the first class: outline the high level steps, create a spec that has an "it" for each step, create a class that has some rough, high level code, then comment out all but the first line.

    module Extract
      describe DownloadCSVFiles do

        describe ".call" do
          it "gets a list of CSV files in the Box folder"
          it "downloads the CSV files"
          it "copies the files to a temp directory"
          it "returns a list of File objects"
        end
      end
    end

    module Extract
      class DownloadCSVFiles

        def self.call
          box_csv_files
          # .collect do | box_csv_file |
            # download_file_from_box box_csv_file
          # end
        end
      end
    end
  7. If you don't know what some code should look like (how do you download files from Box???), then do a spike here: a short, throwaway experiment to find out.

  8. When you know what the code should look like, write a spec for the first line, and implement it. Repeat until the class is finished! You may need to repeat the whole cycle again if this class needs to call out to other classes.

  9. Once this class is implemented, go back to the calling class and update it to pass in any arguments that it needs. Write the specs, then write the code, uncommenting each line as you implement it. Make sure you see each spec fail before you see it pass, so you know the test is really testing something.

  10. Refactoring! How to know when you need to refactor:

    • The amount of code that it takes to set up the mocks for your test is more than half a screen.
    • It is hard to test your code.
    • The class is more than 100 lines long.
    • The class has a mixture of "levels" in it - some high level stuff, and some low level stuff. Ideally, each class should operate on only one level. Classes should be like army ranks. A General class doesn't get his hands dirty with low level work, he just tells other classes what to do. He doesn't care how they do it, just that the work gets done. A Captain class does what a General class tells him, but he might delegate the fiddly little stuff to a Corporal class. A Corporal class does all the crappy little work. Classes should only be of one rank. If you have a General that is doing a Corporal's work, you need to refactor.
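
The army ranks idea can be sketched in plain Ruby, reusing the Run and Extract::DownloadCSVFiles names from the steps above. This is only a rough illustration under some assumptions: FakeBoxClient and its list_csv_files and download methods are hypothetical stand-ins (not a real Box API), CopyToTempFile is an invented helper, and passing the client into Run.call is one possible way of handing arguments down.

```ruby
require "tempfile"

# Hypothetical stand-in for a real Box client. The method names are
# assumptions made for this sketch, not a real Box API.
class FakeBoxClient
  def list_csv_files
    ["a.csv", "b.csv"]
  end

  def download(_name)
    "id,amount\n1,9.99\n"
  end
end

module Extract
  # Corporal: does the fiddly work of writing one download to a temp file.
  class CopyToTempFile
    def self.call(name, contents)
      file = Tempfile.new([File.basename(name, ".csv"), ".csv"])
      file.write(contents)
      file.rewind # leave the file ready to be read from the start
      file
    end
  end

  # Captain: coordinates the extract step and delegates the file handling.
  class DownloadCSVFiles
    def self.call(box_client)
      box_client.list_csv_files.collect do |name|
        CopyToTempFile.call(name, box_client.download(name))
      end
    end
  end
end

# General: only says what happens and in what order.
class Run
  def self.call(box_client)
    Extract::DownloadCSVFiles.call(box_client)
    # The Transform and Load steps would follow here.
  end
end
```

Calling Run.call(FakeBoxClient.new) returns a list of file-like Tempfile objects. Each class stays at one rank: Run never touches a file, and CopyToTempFile knows nothing about Box.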

Tips
  • Each class should do one thing and one thing only. Try using the "Functions as Objects" pattern.
  • Pass around File objects, not String paths. This allows you to attach metadata to files if need be, and means the classes that use the Files can treat them as Streams.
  • There's not much point rescuing errors in low level code, because if a problem happens, the whole process should probably stop. Just make sure the error is logged and re-raised in the Run class.
  • Don't rescue Exception, rescue StandardError. Rescuing Exception also catches Interrupt and SignalException (raised for SIGINT and SIGTERM) and SystemExit, which means your program won't respond to Ctrl+C.
  • Check all the environment variables at the start, and convert them to proper variables to pass around to the other classes. Don't call ENV from lower classes - this leaks implementation details about how that information is retrieved.
  • All dates should be in UTC, or should contain timezone information.
  • Serialize non-integer numeric amounts as Strings, not doubles, because binary floating point cannot represent most decimal fractions exactly. Using a String forces the calling code to parse the number deliberately, and means that a BigDecimal can be used instead.
  • Write unit tests for all classes. Write one end-to-end integration test that stubs out the external dependencies and checks the bare minimum to ensure that the overall business goals are being achieved. See this test as an example. Try to duplicate as little testing from the unit tests as possible, so the tests aren't brittle. For example, check that we're uploading a file to the data directory, but don't check the full path, as this would duplicate the unit test for another class. If there is complicated behaviour in a failure scenario, maybe write one end-to-end test for a failure, but really, this should be covered by unit tests.
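
A few of the tips above (read ENV once at the top, log and re-raise StandardError in Run, UTC timestamps, amounts as Strings) can be sketched together like this. It is a minimal sketch, not a definitive implementation: the Settings struct, the LoadConfig and SerializeRecord classes, the BOX_FOLDER_ID and S3_BUCKET variable names, and the keyword arguments on Run.call are all assumptions made for the example.

```ruby
require "bigdecimal"
require "json"
require "logger"
require "time"

# Hypothetical settings object; the field names are made up for this sketch.
Settings = Struct.new(:box_folder_id, :s3_bucket)

# Reads the environment once and fails fast if anything is missing.
class LoadConfig
  REQUIRED = %w[BOX_FOLDER_ID S3_BUCKET].freeze

  def self.call(env = ENV)
    missing = REQUIRED.reject { |name| env.key?(name) }
    unless missing.empty?
      raise ArgumentError, "Missing environment variables: #{missing.join(', ')}"
    end
    Settings.new(env.fetch("BOX_FOLDER_ID"), env.fetch("S3_BUCKET"))
  end
end

# Serializes an amount as a String and a time as UTC ISO 8601.
class SerializeRecord
  def self.call(amount, time)
    JSON.generate({
      "amount" => amount.to_s("F"),           # BigDecimal in plain decimal notation
      "recorded_at" => time.getutc.iso8601    # getutc leaves the original Time untouched
    })
  end
end

class Run
  def self.call(env: ENV, logger: Logger.new($stderr))
    settings = LoadConfig.call(env)
    # The Extract, Transform and Load classes would be called here,
    # receiving plain values from settings rather than reading ENV themselves.
    settings
  rescue StandardError => e
    # Rescue StandardError (not Exception) so Ctrl+C still works, then re-raise.
    logger.error("#{e.class}: #{e.message}")
    raise
  end
end
```

Passing env and logger in as arguments is a design choice for the sketch: it keeps ENV access in one place and lets a spec supply a plain Hash instead of touching the real environment.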