This benchmark has been misleading for a while. It was originally made to demonstrate how JIT compilers, especially LuaJIT, can do all sorts of crazy stuff to your code, and it was meant to be a starting point for a discussion about what exactly LuaJIT does and how.
As a result, it's not indicative of how the same code would perform on more realistic data. Differences can be expected because:
- the text will not consist of hard-coded constants
- the number of words (and therefore the dictionary) would be larger, and JIT compilers for JS and Lua often have special optimizations for small dictionaries/tables
- the words won't be pre-split, and allocating new words adds a significant performance penalty (in that case a trie would probably outperform other approaches)
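
To make the last point concrete, here is a minimal C sketch of counting words straight from an unsplit buffer. It is not the benchmark's code: `WordTable`, `count_words`, `get_count` and `TABLE_SIZE` are hypothetical names, and it uses a small fixed-size hash table rather than the trie mentioned above, purely for brevity. Words are kept as pointer-plus-length slices of the input, so splitting does not allocate a new string per word.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024 /* hypothetical capacity; must be a power of two */

/* A word is a slice of the input text, which must outlive the table. */
typedef struct {
    const char *start; /* NULL marks an empty slot */
    size_t len;
    int count;
} Entry;

typedef struct {
    Entry slots[TABLE_SIZE];
    int used; /* number of distinct words stored */
} WordTable;

/* 32-bit FNV-1a hash over a slice. */
static uint32_t hash_slice(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Linear probing: returns the slot holding the word, or the empty slot
   where it would be inserted. */
static Entry *find_slot(WordTable *t, const char *s, size_t len) {
    size_t i = hash_slice(s, len) & (TABLE_SIZE - 1);
    while (t->slots[i].start != NULL &&
           !(t->slots[i].len == len &&
             memcmp(t->slots[i].start, s, len) == 0)) {
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return &t->slots[i];
}

/* Count whitespace-separated words without copying any of them. */
static void count_words(WordTable *t, const char *text) {
    const char *p = text;
    while (*p) {
        while (*p && isspace((unsigned char)*p)) p++;
        const char *start = p;
        while (*p && !isspace((unsigned char)*p)) p++;
        if (p == start) break; /* only trailing whitespace was left */
        size_t len = (size_t)(p - start);
        Entry *e = find_slot(t, start, len);
        if (e->start == NULL) {
            /* Keep one slot free so probing always terminates. */
            assert(t->used < TABLE_SIZE - 1);
            e->start = start;
            e->len = len;
            t->used++;
        }
        e->count++;
    }
}

/* Look up a word's count; returns 0 if it was never seen. */
static int get_count(WordTable *t, const char *word) {
    Entry *e = find_slot(t, word, strlen(word));
    return e->start ? e->count : 0;
}
```

A caller would zero-initialize a `WordTable`, pass it to `count_words` together with the raw text, and then query individual words with `get_count`. Whether this beats a trie on real input is something only a measurement can tell.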
The C code in this comparison is a total mess. Allocating memory for a single integer inside a loop is the craziest idea I've ever seen:
cnt = malloc(sizeof(int));
Besides, the C code doesn't replicate what the Lua code is doing.
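
To illustrate the allocation complaint, here is a rough sketch contrasting the two patterns. It is not the code from the benchmark: `tally_boxed`, `tally_inline` and `N_KEYS` are hypothetical names, and the keys are assumed to be small integers in the range 0 to `N_KEYS - 1` so that a plain array can stand in for the dictionary.

```c
#include <assert.h>
#include <stdlib.h>

#define N_KEYS 4 /* hypothetical number of distinct keys */

/* Boxed variant: each counter is its own heap allocation, created
   inside the loop the first time a key is seen.
   Returns the number of malloc calls made. */
static int tally_boxed(const int *keys, size_t n, int out[N_KEYS]) {
    int *boxes[N_KEYS] = {0};
    int allocations = 0;
    for (size_t i = 0; i < n; i++) {
        int k = keys[i];
        assert(k >= 0 && k < N_KEYS); /* assumption: keys are in range */
        if (boxes[k] == NULL) {
            boxes[k] = malloc(sizeof(int));
            if (boxes[k] == NULL) abort();
            *boxes[k] = 0;
            allocations++;
        }
        (*boxes[k])++;
    }
    /* Copy the results out and release every box. */
    for (int k = 0; k < N_KEYS; k++) {
        out[k] = boxes[k] ? *boxes[k] : 0;
        free(boxes[k]); /* free(NULL) is a no-op */
    }
    return allocations;
}

/* Inline variant: counters are stored directly in the array, so the
   loop never touches the heap. Returns the number of malloc calls (0). */
static int tally_inline(const int *keys, size_t n, int out[N_KEYS]) {
    for (int k = 0; k < N_KEYS; k++) out[k] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(keys[i] >= 0 && keys[i] < N_KEYS);
        out[keys[i]]++;
    }
    return 0;
}
```

Both functions fill `out` with the same counts; the only difference is that the boxed one pays for one `malloc` and one `free` per distinct key, plus a pointer dereference on every increment.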
The equivalent code in C would look like this:
The results:
It's about 25 times faster than Lua.
Please don't make such useless comparisons anymore. Thx
P.S. A hint: if you see C code running slower than any language other than assembler, then you are 100% doing something wrong. Any program in any language can be translated directly into C, so it cannot run faster than C, because C is simplified assembler and nothing else. Nobody can beat CPU instructions on the same CPU, by whatever method.