Enormous Switch vs Large Group of Methods

A primer on benchmarking and O(1) vs O(n) lookups

When writing Hubbub, a PHP-IRC bot/BNC, the IRC parser is a large collection of specialized methods, for example:

private function parse_rpl_isupport($line) { 
    // do some parsing...
    return $line;
}

... and, the methods are called dynamically:

if(method_exists($this, 'parse_' . $parsed->cmd)) {
    $parsed = $this->{'parse_' . $parsed->cmd}($parsed);
} else {
    $this->logger->notice("I don't have a parser method for the command: " . $parsed->cmd);
}

A method exists for every IRC protocol reply for a grand total of just over 500 methods defined in a single class. This is a bit unorthodox, and it confuses my IDE a little bit to put a cherry on top. These 500+ methods were generated from an IRC protocol definition file. At the time, it seemed the best way to accomplish the task we'll be benchmarking here, which is the quickest way to lookup and execute code based on a static key.

The only other alternative, aside from an enormous if()...elseif()... block, was an enormous switch (or a group of closures, but this did not occur to me at that point: closures were very new to PHP at the time). Recollecting from my C-programming days of old, it felt that an enormous switch() might be the logical answer. I wanted to convince myself to convert these dynamic method calls to a large switch statement, so I did a benchmark.

I wrote a code generator, genCode.php that generated the bulk of the code, and I loaded it up in my IDE, ran it through some formatting, and added some additional supporting code to perform the actual benchmarking. Here is the overview:

Switch Benchmarking

<?php
$startTime = microtime(1);
for($i = 0; $i < 500000; $i++) {
    $x = mt_rand(0,3333);
    switch($x) {
        case '0':
            $x = $x * $x + $x;
        break;
        // .. all the way to case '2999'...no default switch
    }
}
$endTime = microtime(1);
$tookMs = ($endTime - $startTime) * 1000;
echo "Took: " . sprintf("%.2f", $tookMs) . " ms\n";

Dynamic Method Call Benchmarking

Done similar to the Hubbub IRC protocol parser above:

<?php

class bigClass {

    public function runStuff($x) {
        if(method_exists($this, 'get_' . $x)) {
            $x = $this->{'get_' . $x}($x);
        }

        return $x;
    }

    private function get_0($pass) {
        return $pass * $pass + $pass;
    }
    
    // .. all the way to get_2999($pass) { 
   
} 

// and to execute...

$c = new bigClass();
$startTime = microtime(1);
for($i = 0; $i < 500000; $i++) {
   $x = mt_rand(0,3333);
   $c->runStuff($x);
}
$endTime = microtime(1);
$tookMs = ($endTime - $startTime) * 1000;
echo "Took: " . sprintf("%.2f", $tookMs) . " ms\n";

Results

The results were just mind blowing. For 500,000 iterations:

Big Switch: 91,935.78 ms (0.18387 ms per compare)
Methods: 3167.25 ms (0.00633 ms per compare)

That is not a typo. Trying to reproduce the original parser code (at top) more closely, I moved the loop inside of a new class method so we would be calling $this instead of $c in the 'execution' portion of the above code:

public function runInternally() {
    for($i = 0; $i < 50000; $i++) {
        $x = mt_rand(0,3333);
        if(method_exists($this, 'get_' . $x)) {
            $x = $this->{'get_' . $x}($x);
        }
    }
}
// ... and $this->runInternally() instead of the loop to call $c->runStuff();

And, as somewhat as expected, the results were actually faster: only 2373.09 ms for 500,000 iterations. Putting on my thinking cap, I'm not entirely sure why it wasn't glaringly obvious beforehand: PHP switch statements are computed just like an if()...elseif()...else() block: sequentially, in the order they are defined. Methods, on the other hand, are stored in a hash table lookup with constant-time lookups.. So it does not matter if we have 1 or 5,000 methods: they will all be called in constant time, i.e. O(1), while, using it statically as such, switches are O(n).

Closures

Just for fun, I thought I'd try one more style, which is perhaps the most appropriate of them all: closures within an array. I already know that PHP arrays have a O(1) expected constant lookup time, but I was just a bit curious of closures would produce any surprising results. So adding one last bit to codeGen.php, I generated some code that looks like this:

$lookupTable = [
    '0'    => function ($x) {
        return $x * $x + $x;
    },
    // all the way to '2999' => function($x) { ... 
];
    
// and to execute/benchmark, .. just like before
$startTime = microtime(1);
for($i = 0; $i < 500000; $i++) {
    $x = mt_rand(0,3333);
    if(isset($lookupTable[$x])) {
        $lookupTable[$x]($x);
    }
}
$endTime = microtime(1);
$tookMs = ($endTime - $startTime) * 1000;
echo "Took: " . sprintf("%.2f", $tookMs) . " ms\n";

The result was 3336.54 ms for 500,000 iterations, which is right on par with dynamic method calling. Nothing surprising here except there was no surprising results.

Initialization Times

Finally, another thing to take notice is that in all of these cases I've used a fairly high iteration count count of 500,000 iterations. This isn't very realistic compared to the real-world application of web requests. It is doubtful, for example, in a web request, that you would need to iterate over a switch 500,000 times. Which brings something else into the picture: initialization time.

All of these benchmarks, to date, are spreading the cost of initializing these enormous structures and spreading it across 500,00 iterations. So, I wanted to see what the benchmarks would look like with one single iteration. This would give us an idea of how long PHP is taking to parse, and completely initialize each of the three different examples.

To test this, I did two separate tests: One test is using the Unix utility time(1) which will give me the absolute total time to completely initialize an entirely fresh PHP instance, tokenize, and run the scripts. I used chrt(1) as well that will help reduce variability from other unrelated processes on my PC from affecting the scores. The second test, is the same kind of benchmark we did above using PHP's microtime(1), which we will call the "in-PHP" benchmarks.

These are the results, using something like chrt -f 99 time -o out.log -f '...' php test.php and an in-PHP microtime() clock, over ~200 iterations:

Methods: in-PHP avg 0.07ms, time(1) reports avg 63ms (+5ms)
Big Switch: in-PHP avg 0.32ms, time(1) reports avg 58ms (+0ms)
Closures: in-PHP avg 2.497ms, time(1) reports avg 79ms (+21ms)

These numbers must be interpreted with some logic: The most important thing to note is, the second number reported by time(1) utility is the total time it took to initialize PHP, and as you can see, in all cases, it's only a few milliseconds difference once we add in the total PHP initialization time.

As you can see, the closures took significantly longer to initialize in both runtime (2.497ms) and in tokenization (+21ms over the big switch). I thought this was quite significant. I'm not aware of any way to double-check this, but I'd very much like to see "tokenization-only" times, pre-execution to see if it is really taking a full 21 milliseconds to tokenize. Luckily, in a real-world web environment, we only need to worry about the 2.497ms, as all modern PHP installations come with a built-in tokenizer (opcode) cache.

Summary

One last note, in defense of the switch() statement: for very small switches, they still win. One last benchmark I tried, was running the original switch test with only 10 case's: the time was a mere 1139.11 ms. At exactly 50 case's, we finally hit 3174.83 ms which is the same ballpark as the method-calling method.

To summarize:

Switches are fast for a low iteration count, or less than ~50 case items (your mileage may vary).
Defining, and subsequently calling methods from a pool of thousands, is much faster anyway you spin it.
Closures are fast too, and in an array, can also be looked up in constant time, just like methods. There may be a performance hit during initialization, however. It may even be noticeable or worthwhile if you aren't running an opcode cache.
At no point were method/function calls "expensive" or slow, as at least one anti-PHP articles I've read claimed.
At any rate, a single lookup is taking .006 ms to 0.18ms. So, unless these items will be iterated at least 10,000 times and expected to be sub-second response, the speed difference should be absolutely negligible. I cannot imagine the time I have spent doing benchmarks here being worthwhile for any project.
As always, Micro-optimization is rarely, if ever, worth it.

Thank you,

A.B. Carroll ([email protected])

Jan 7, 2016

abcarroll/README.md