Skip to content

Instantly share code, notes, and snippets.

@dapangmao
Last active August 29, 2015 14:06
Show Gist options
  • Save dapangmao/23b03bf9dfcc2276401c to your computer and use it in GitHub Desktop.
Save dapangmao/23b03bf9dfcc2276401c to your computer and use it in GitHub Desktop.

One example of test-driven development in Python

Function or method is the most basic unit in Python programming. Test-driven development is a key for a developer to assure the code quality of those units. In his book, Harry Percival illustrated a few great examples about how to use TDD with Python and Django. It seems that for web development, TDD including unit testing and integration testing is the cornerstone for every success. For data analysis, coding mostly relies on built-in packages instead large framework like Django, which makes TDD easier. In my opnion, TDD in data analysis could have three steps.

  • Step 1: requirement analysis

    Before writing any code for data analysis, the programmer should seriously ask the customer or himself about the requirements.

    • What are the input parameter?
    • what if the input data doesn't fit the assumptions?
    • What is the purpose of this funtction or method? what are the desired outputs?
  • For example, there is a recent coding challenge called Maximum Product Subarray.

      > Find the contiguous subarray within an array (containing at least one number) which has the largest product.
      For example, given the array [2,3,-2,4],
      the contiguous subarray [2,3] has the largest product = 6.
    
  • OK, understanding this question is quite straight-forward. Given a array(or a list in Python), you return the integer that is the maximum product from a continuous subarry out of the input array.

    def maxProduct(A):
        """ A function to find the maximum product value for a continuous subarray.
        :param A: an array or list
        :type A: list
        """
        if A is None or not isinstance(A, list):
            return None
        if len(A) == 1:
            return A[0]
        pass
    • A production version of the codes above should be more like:
    class FunctionInputError(Exception):
        pass
        
    def maxProduct(A):
        """ A function to find the maximum product value for a continuous subarray.
        :param A: an array or list
        :type A: list
        """
        if A is None or not isinstance(A, list):
            raise FunctionInputError('must give a list as input')
        if len(A) == 1:
            return A[0]
        pass
  • Step 2: write test cases

    Given not a single line of logic codes has been writen yet, I call the current step as black-box testing, which means that I want this funtion to fail every test cases. Python has a built-in module doctest, which allows embedding the test cases within the docstring. I write six test cases, run the psedu-function below and arbitrarily specify the result to be -1. As expected, it fails all the six test cases with horrible red warnings under the Python shell. That is a good thing: it proves that the testing works.

    import doctest
    def maxProduct(A):
        """ A function to find the maximum product value for a continuous subarray.
        :param A: an array or list
        :type A: list
    
        - testcase1
            >>> maxProduct([0, 2, 3,-2,4, 1, -1])
            48
    
        - testcase2
            >>> maxProduct([2, 3, 0, 2, 4, 0, 3, 4])
            12
    
        - testcase3
            >>> maxProduct([0, 5, 3, -1, -2, 0, -2, 4, 0, 3, 4])
            30
    
        - testcase4
            >>> maxProduct([0, 1, -3, -4, -5, -1, 1, 2, 1, -1, -100, 0, -100000])
            12000
    
        - testcase5
            >>> maxProduct([0, -3000, -2, 0, -100, -100, 0, -9, -8, 1, 1, 2])
            10000
    
        - testcase6
            >>> maxProduct([0, -2, 0])
            0
        """
        if A is None or not isinstance(A, list):
            return None
        if len(A) == 1:
            return A[0]
        return -1
    
    doctest.testmod()
  • Step 3: implement the logic

    It's time to tackle the most difficult part: write the real function. Think about time complexity (it is best to use only one iteration around the input array which means O(n)), and space complexity (it is best not to use extra space). Run testmod() again and again to find mistakes and modify the codes accordingly. Finally I come with a solution with a helper shadow function _maxProduct. And it passes the six test cases. Althoug I am not sure that this function does not have any bug, at least it works now.

    import doctest
    from sys import maxint
    
    def maxProduct(A):
        """ A function to find the maximum product value for a continuous subarray.
        :param A: an array or list
        :type A: list
    
        - testcase1
            >>> maxProduct([0, 2, 3,-2,4, 1, -1])
            48
    
        - testcase2
            >>> maxProduct([2, 3, 0, 2, 4, 0, 3, 4])
            12
    
        - testcase3
            >>> maxProduct([0, 5, 3, -1, -2, 0, -2, 4, 0, 3, 4])
            30
    
        - testcase4
            >>> maxProduct([0, 1, -3, -4, -5, -1, 1, 2, 1, -1, -100, 0, -100000])
            12000
    
        - testcase5
            >>> maxProduct([0, -3000, -2, 0, -100, -100, 0, -9, -8, 1, 1, 2])
            10000
    
        - testcase6
            >>> maxProduct([0, -2, 0])
            0
        """
        if A is None or not isinstance(A, list):
            return None
        if len(A) == 1:
            return A[0]
        return max(_maxProduct(A), _maxProduct([a for a in reversed(A)]))
    
    def _maxProduct(A):
        max_val_forward = 1
        rst = -maxint
        for a in A:
            if a != 0:
                max_val_forward *= a
                rst = max(rst, max_val_forward)
            else:
                rst = max(0, rst)
                max_val_forward = 1
        return rst
    
    if __name__ == "__main__":
        doctest.testmod()

In conclusion, the most important thing about TDD in data analysis is writing test cases, which needs a lot of training and exercises.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment