This benchmark has been misleading for a while. It was originally made to demonstrate how JIT compilers can do all sorts of crazy stuff to your code - especially LuaJIT - and was meant to be a starting point of discussion about what exactly LuaJIT does and how.
As a result, it's not indicative of how these implementations would perform on more realistic data. Differences can be expected because:
- the text will not consist of hard-coded constants
- the number of words (and therefore the dictionary) would be larger, and JIT compilers for JS and Lua often have special optimizations for small dictionaries/tables
- the words won't be pre-split, and allocating new words adds a significant performance penalty (in that case a trie would probably outperform other approaches)
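To make the trie point concrete, here is a rough C sketch (not taken from the benchmark) that counts words straight out of an unsplit text buffer. It never allocates a string per word; it only allocates a trie node the first time a new prefix is seen. The names `trie_new`, `count_words`, `trie_count` and `trie_free` are made up for illustration, and it assumes ASCII letters only, with everything else treated as a separator.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per distinct prefix; children indexed by letter 'a'..'z'. */
struct trie {
    struct trie *next[26];
    unsigned count;          /* how many words ended at this node */
};

/* calloc zero-fills; on mainstream platforms that yields NULL children. */
static struct trie *trie_new(void) {
    return calloc(1, sizeof(struct trie));
}

/* Walk the text once, descending the trie per letter and bumping the
 * count at each word boundary. Returns 0 on success, -1 if out of memory. */
static int count_words(struct trie *root, const char *text) {
    assert(root != NULL && text != NULL);
    struct trie *node = root;
    for (;; text++) {
        unsigned char c = (unsigned char)*text;
        if (c >= 'A' && c <= 'Z')
            c = (unsigned char)(c - 'A' + 'a');   /* fold ASCII case */
        if (c >= 'a' && c <= 'z') {
            int k = c - 'a';
            if (!node->next[k]) {
                node->next[k] = trie_new();
                if (!node->next[k]) return -1;
            }
            node = node->next[k];
        } else {
            if (node != root) node->count++;      /* a word just ended */
            node = root;
            if (c == '\0') break;
        }
    }
    return 0;
}

/* Look up the count for a lowercase word; 0 if it was never seen. */
static unsigned trie_count(const struct trie *root, const char *word) {
    const struct trie *node = root;
    for (; *word && node; word++) {
        if (*word < 'a' || *word > 'z') return 0;
        node = node->next[*word - 'a'];
    }
    return node ? node->count : 0;
}

/* Release the whole trie, children first. */
static void trie_free(struct trie *node) {
    if (!node) return;
    for (int i = 0; i < 26; i++) trie_free(node->next[i]);
    free(node);
}
```

Feeding it `"The fox, the dog."` and then asking `trie_count(root, "the")` should give 2. Whether this actually beats a hash map depends on the vocabulary and on cache behaviour, so treat it as something to measure, not a given.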
The last C version isn't doing a hash lookup; it's indexing a plain array. If you read the original code, the implementations were all doing hash map lookups.
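For anyone who hasn't looked at the code, the difference is roughly the following. This is a sketch under my own assumptions, not the benchmark's source: `count_by_index`, `count_by_hash`, `hash_count`, the FNV-1a hash and the fixed `SLOTS` table size are all illustrative choices, and the table is assumed to never fill up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 64   /* power of two, assumed larger than the vocabulary */

/* Plain array: the caller already knows a small integer id per word,
 * so counting is a single indexed increment. */
static void count_by_index(unsigned counts[], size_t n, size_t id) {
    assert(id < n);
    counts[id]++;
}

/* Hash map entry for open addressing with linear probing. */
struct entry {
    const char *key;   /* not copied; must outlive the table */
    unsigned count;
};

/* 32-bit FNV-1a over a NUL-terminated string. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Hash lookup: hash the whole key, then probe and compare strings
 * until the key or an empty slot is found. */
static void count_by_hash(struct entry table[], const char *word) {
    uint32_t i = fnv1a(word) & (SLOTS - 1);
    while (table[i].key && strcmp(table[i].key, word) != 0)
        i = (i + 1) & (SLOTS - 1);     /* collision: try the next slot */
    table[i].key = word;
    table[i].count++;
}

/* Read a count back; 0 if the word was never inserted. */
static unsigned hash_count(const struct entry table[], const char *word) {
    uint32_t i = fnv1a(word) & (SLOTS - 1);
    while (table[i].key) {
        if (strcmp(table[i].key, word) == 0) return table[i].count;
        i = (i + 1) & (SLOTS - 1);
    }
    return 0;
}
```

The array path does no hashing and no string comparison at all, which is why comparing it against the hash map versions is not apples to apples.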
This is something LuaJIT does very well, because tables are built in and the JIT optimises the hash lookups at runtime. As others have mentioned, you need to write platform-specific code in C to achieve the same thing, and it's a lot of work (essentially what LuaJIT does for you).
In commercial work, I can vouch for LuaJIT. I developed software for realtime guaranteed packet capture systems (fmad.io) running LuaJIT, and in many cases it was easily the best fit for purpose in both performance and ease of implementation. We often worked on multi-GB data sets, with some big network pipes that we needed to run guaranteed packet capture on.
The biggest benefit I found was using LuaJIT as a high-speed binder for C libraries. With essentially zero overhead on FFI calls into C, we could use existing C libraries and still use LuaJIT for building the app side. The interesting thing is how much LuaJIT is used in the network capture world.