Skip to content

Instantly share code, notes, and snippets.

@spion
Last active August 24, 2024 13:09
Show Gist options
  • Save spion/3049314 to your computer and use it in GitHub Desktop.
Save spion/3049314 to your computer and use it in GitHub Desktop.
C++ versus V8 versus luajit versus C benchmark - (hash) tables

Warning

This benchmark has been misleading for a while. It was originally made to demonstrate how JIT compilers can do all sorts of crazy stuff to your code - especially LuaJIT - and was meant to be a starting point of discussion about what exactly LuaJIT does and how.

As a result, its not indicative of what its performance may be on more realistic data. Differences can be expected because

  1. the text will not consist of hard-coded constants
  2. the number of words (and therefore the dictionary) would be larger, and JIT compilers for JS and Lua often have special optimizations for small dictionaries/tables
  3. the words wont be pre-split, and allocating new words adds significant performance penalty (in that case a trie would probably outperform other approaches)
spion@missperfect:~/Projects/testhash$ gcc -O3 test.c -o testc
spion@missperfect:~/Projects/testhash$ time ./testc
The 1000000
a 2000000
at 1000000
brown 1000000
dog 1000000
era 1000000
fox 1000000
jumped 1000000
lake 1000000
lazy 1000000
new 1000000
near 1000000
of 1000000
over 1000000
quick 1000000
restaurant 1000000
the 2000000
real 0m0.234s
user 0m0.228s
sys 0m0.004s
spion@missperfect:~/Projects/testhash$ g++ -O3 -std=gnu++0x test.cpp -o test
spion@missperfect:~/Projects/testhash$ time ./test
the 2000000
at 1000000
a 2000000
brown 1000000
dog 1000000
era 1000000
fox 1000000
jumped 1000000
The 1000000
lake 1000000
lazy 1000000
new 1000000
near 1000000
of 1000000
over 1000000
quick 1000000
restaurant 1000000
real 0m0.321s
user 0m0.320s
sys 0m0.000s
spion@missperfect:~/Projects/testhash$ time node test.js
The 1000000
quick 1000000
brown 1000000
fox 1000000
jumped 1000000
over 1000000
the 2000000
lazy 1000000
dog 1000000
at 1000000
a 2000000
restaurant 1000000
near 1000000
lake 1000000
of 1000000
new 1000000
era 1000000
real 0m0.317s
user 0m0.304s
sys 0m0.012s
spion@missperfect:~/Projects/testhash$ time luajit test.lua
new 1000000
lazy 1000000
The 1000000
era 1000000
of 1000000
at 1000000
fox 1000000
jumped 1000000
quick 1000000
brown 1000000
lake 1000000
over 1000000
near 1000000
the 2000000
dog 1000000
a 2000000
restaurant 1000000
real 0m0.161s
user 0m0.160s
sys 0m0.000s
// Like in the c++ version, we
// have a regular hash table
// with a custom hash function
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
struct fht_node {
char* key;
void* data;
struct fht_node * next;
};
typedef struct fht {
struct fht_node ** nodes;
size_t size;
} FHT;
FHT* fht_create() {
FHT *ht = malloc(sizeof(struct fht));
ht->size = 1 << (sizeof(char)*8);
ht->nodes = calloc(ht->size, sizeof(struct fht_node));
return ht;
}
fht_put(FHT* hashtbl, char* key, void* data) {
struct fht_node *node = hashtbl->nodes[key[0]];
while(node) {
if(!strcmp(node->key, key)) {
node->data=data;
return 0;
}
node=node->next;
}
node=malloc(sizeof(struct fht_node));
node->key= strdup(key);
node->data=data;
node->next=hashtbl->nodes[key[0]];
hashtbl->nodes[key[0]]=node;
}
void* fht_get(FHT* hashtbl, char* key) {
struct fht_node *node = hashtbl->nodes[key[0]];
while (node) {
if (!strcmp(node->key, key)) return node->data;
node = node->next;
}
return NULL;
}
int main() {
char* text[19] = {
"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a",
"restaurant", "near", "the", "lake", "of", "a", "new", "era"
};
FHT *hashtbl = fht_create();
int textLen = 19;
int times = 1000000;
int k, *cnt;
while (times--)
for (k = 0; k < textLen; ++k) {
cnt = fht_get(hashtbl, text[k]);
if (!cnt) {
cnt = malloc(sizeof(int));
*cnt = 1;
fht_put(hashtbl, text[k], cnt);
} else { *cnt += 1; }
}
for (k = 0; k < hashtbl->size; ++k) {
struct fht_node *n = hashtbl->nodes[k];
while (n) {
printf("%s %d\n", n->key, *((int *)n->data));
n = n->next;
}
}
}
#include <unordered_map>
#include <string>
#include <vector>
#include <iostream>
#include <string>
using namespace std;
// In C++ despite only accessing the hashtable
// once per loop, we also need a custom hash function
// otherwise the program will be 2 times slower.
struct CustomHash {
size_t operator() (const string& s) const { return (size_t)s[0]; }
};
int main() {
vector<string> text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"};
unordered_map<string, int, CustomHash> cnt;
int times = 1000000;
while (times--)
for (auto it = text.begin(); it != text.end(); ++it)
cnt[*it] += 1;
for (auto it = cnt.begin(); it != cnt.end(); ++it) {
cout << it->first << " " << it->second << endl;
}
}
var text = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"],
map = {};
times = 1000001;
while (--times)
for (var k = 0; k < text.length; ++k) {
// Unlike luajit, if we put if (map[text[k]]) map[text[k]] += 1
// in v8, this program will become 6 times slower. Instead we create a new object with a counter
// and this way we'll only need to access the hashtable once per loop.
var kth = map[text[k]];
if (kth) kth.c += 1;
else map[text[k]] = {c:1};
}
for (var key in map) {
console.log(key, map[key].c);
}
-- The lua program needs no optimizations.
-- It just works. Fast.
-- Note that luajit probably optimizes away
-- all dictionary lookups :)
local text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"}
local map = {}
local times = 1000000
while times > 0 do
for i = 1, table.getn(text), 1 do
if map[text[i]] then map[text[i]] = map[text[i]] + 1
else map[text[i]] = 1 end
end
times = times - 1;
end
for key, value in pairs(map) do
io.write(key, " ", value, "\n")
end
Copy link

ghost commented Feb 18, 2017

C code in a comparison is a total mess. Doing the memory allocation for a single integer inside of a cycle
is the most crazy idea I've ever seen:cnt = malloc(sizeof(int));
besides the C-code doesn't replicate what Lua code is doing.
The similar code in C would look like this:

#include <stdio.h>
int main() {
	const char *text[] = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"};
	const int textLen = sizeof(text)/sizeof(text[0]);
	int map[textLen];
	int times = 1000000;
	for (int k = 0; k < textLen; ++k)	map[k] = 0;
	while (times--)	  for (int k = 0; k < textLen; ++k)	map[k] += 1;
	for (int k = 0; k < textLen; ++k)	printf("%s %d\n", text[k], map[k]);
}

The results:

real	0m0.003s
user	0m0.000s
sys	0m0.000s

Its about 25x times faster than Lua.

Please don't make such useless comparison anymore. Thx

P.S. A hint. If you see the C-code is running slower, than any language besides Assembler then 100% you are doing something wrong. Absolutely any program in any language can be translated directly into C-code and cannot run faster than C, because the last one is simplified Assembler and nothing else. Nobody can top CPU instructions on the same CPU with what ever method.

@Dzejrou
Copy link

Dzejrou commented Mar 13, 2017

@Zack24 While I'm not gonna argue whether the C code is a mess or not, the C code you posted works with an array. The Lua code works with a hash table :) So yes, your C code will be faster than the Lua code because your code does basically nothing.

And to your hint, Luajit is a tracing JIT, which has a) information about the exact hardware it runs on (which a C compiler does not) and b) a tracing window of the past X instruction which allows it to perform more optimizations based on the runtime of the program. So yes, a Lua code can be faster than C code (given the right benchmarks), because C relies on static compilation.

@towc
Copy link

towc commented Sep 7, 2017

Over 3 years have passed, I wonder how these compare now. I'm a JS fanboy so I'm going to champion for it. It can be marginally more optimized, but that goes for all other programs. One thing that you need to note is this though:

$ time node -e "console.log(1)"
1

real	0m0.237s
user	0m0.232s
sys	0m0.004s

on my machine. If you want a fair comparison with JIT languages, it's inherent that you need a larger code structure, which is where run-time optimizations really shine, compared to compile-time optimizations

@thetrung
Copy link

Update on 2020 anyone ?

LuaJIT vs NodeJS is a really interesting competition that I want to see more benchmarks here.
They both are great scripting language that I'm comfortable mostly beside compiled .NET language.

@spion
Copy link
Author

spion commented May 26, 2020

Quick summary

  • nodejs: 0.5s
  • luajit: 0.08s
  • C: 0.173s
  • C++: 0.36s

Switching to regular maps with node is now better:

let text = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"],
    map = new Map(),
    times = 1000001;

while (--times > 0)
    for (let k = 0; k < text.length; ++k) {
        let key = text[k];
        let kth = map.get(key) // map[text[k]];
        if (kth != null) kth.c += 1;
        else map.set(key, {c:1})
    }

for (let [key, val] of map) {
    console.log(key, val.c);
}

and results with 0.33s run time.

@spion
Copy link
Author

spion commented May 26, 2020

To offset the effect of startup time we can multiply the times by 10x in which case we get (with the updated JS version):

luajit: 0.76s
C++: 3.35s
C: 1.44s
JS (with Map): 3.00s

It seems like only luajit is doing super-clever (unrealistic) optimizations at this point 😀

@thetrung
Copy link

JS faster than C++ is kinda funny.

Impressed with LuaJIT, must be the compiler doing the magic.

@kokizzu
Copy link

kokizzu commented Sep 18, 2020

from @codenoid https://play.golang.org/p/lwO4m6ZVTeQ

LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
luajit a.lua  0,05s user 0,00s system 98% cpu 0,052 total
go version go1.14.7 linux/amd64
go build a.go && ./a
./a  0,46s user 0,00s system 99% cpu 0,461 total

@iemelyanov
Copy link

Go code can be much faster without an extra check of contains key in the map.

package main

import (
	"fmt"
)

func main() {
	text := []string{"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"}
	dict := map[string]int{}
	times := 1000001

	for i := 0; i < times; i++ {
		for _, word := range text {
			dict[word]++
		}
	}

	for key, val := range dict {
		fmt.Println(key, val)
	}
}

0.24s user 0.00s system 100% cpu 0.245 total

@mbossX
Copy link

mbossX commented May 11, 2021

code

js

const text = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"];
const map = new Map();
const times = 100000000;

let c = 0;
for (var k = 0; k < text.length; k++) {
    c = 1;
    for (let i = 0; i < times; i++) {
        c++;
    }
    map.set(text[k], c);
}

for (var key of text) {
    console.log(key, map.get(key));
}

lua

-- The lua program needs no optimizations.
-- It just works. Fast.
-- Note that luajit probably optimizes away
-- all dictionary lookups :)

local text = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"}
local map = {}
local times = 100000000
local n = #text

for i = 1, n do
    map[text[i]] = 1
    for _ = 1, times do  
        map[text[i]] = map[text[i]] + 1
    end
end

for i = 1, n do
    io.write(text[i], " ", map[text[i]], "\n")
end

c

#include <stdio.h>
int main() {
	const char *text[] = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "at", "a", "restaurant", "near", "the", "lake", "of", "a", "new", "era"};
	const int textLen = sizeof(text)/sizeof(text[0]);
	int map[textLen];
	int times = 100000000;
	for (int k = 0; k < textLen; k++)	{
		map[k] = 1;
		for (int i=0;i<times;i++) {
			map[k] += 1;
		}
	}
	for (int k = 0; k < textLen; ++k)	printf("%s %d\n", text[k], map[k]);
}

gcc test.c -o test -O3

result

js

real    0m1.118s
user    0m1.118s
sys     0m0.000s

lua

real    0m1.774s
user    0m1.775s
sys     0m0.000s

c

real    0m0.001s
user    0m0.000s
sys     0m0.000s

PS

node -v
v15.12.0
luajit -v
LuaJIT 2.0.5 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
gcc version 8.3.0 (Debian 8.3.0-6)

@thetalmid
Copy link

In summary, languages are all so fast and benchmarking so unpredictable that it is impossible to determine which one to choose for your application based on performance only?

@thetrung
Copy link

thetrung commented Sep 6, 2021

when we can e JIT in LuaJIT, no other reason to not use it for scripting, beside written parts in C/C++ for bridge/performance

@LeandroPerrotta
Copy link

In little words, you need know way too much in C/C++ to get same results you get with regular knowledge in Lua/JS.

The times where C/C++ codes, even bad wrote ones, used to mean they were at least be fast than high level languages. Is gone.

@spion
Copy link
Author

spion commented Feb 14, 2022

The summary is that naive benchmarks give you wrong and unrealistic results. You would expect these to be strings, and you would expect the objects to be regular hash tables, but in reality scripting languages can switch data structures or hashing functions for these depending on the way the code is executed at runtime (i.e. JIT) so your test might not be representative of what would happen in a real program.

Still, its an interesting starter for a conversation on micro-benchmarks and JIT optimizations. I recommend mraleph's blog for a deep dive (mostly from a JS perspective), e.g. https://mrale.ph/blog/2014/02/23/the-black-cat-of-microbenchmarks.html

@thetrung
Copy link

Surely, it's always based on real use cases but playground benchmark like this, is really useful to have. There are alot of things happen behind every compiler optimization that programmers may not know or may not need to know, which hide away alot of complexity in development.

I hope people will keep coming back on this thread with their own benchmarks on every language that we are using today.

Productivity vs. Speed always will be fun to discuss.

@kokizzu
Copy link

kokizzu commented Feb 17, 2022

that last C code is cheating btw, it's comparing pointer to array of char, not comparing the array of char (string) itself
so it wouldn't give same result if the string is from user input / not constants.

@dlannan
Copy link

dlannan commented May 30, 2022

The last C code isnt doing a hash lookup. Its a plain array. Which if you read the original code, they were all doing hash map lookups.
This is something luajit does very well because its builtin and the jit optimises the hash lookups at runtime. As others have mentioned, you need to write platform specific code in C to be able to achieve this - and its alot of work (essentially what the lua jit does).
In commercial work, I can vouch for luajit. I developed software for realtime guaranteed packet capture systems (fmad.io) running luajit - and in many cases luajit was easily the best fit for purpose in performance and ease of implementation. We often worked on many GB data sets and with some big network pipes that we needed to run guaranteed packet capture on.
The biggest benefit I found was using luajit as a high speed binder for Clibs.. with zero overhead with ffi calls to c it meant we could utilise C libraries, as well as still use luajit for building the app side. The interesting thing, is how much luajit is used in the network capture world.

@FrankHB
Copy link

FrankHB commented Sep 6, 2023

As a common sense of such micro benchmarks, the init (and deinit) time counts. For example, in hosted implementation of C and C++ on x86_64 Linux, typically the C runtime will have an overhead of less than 100k of insns and C++ will have 3M. This still does not indicate that which one is definitely more efficint though: both can have freestanding implementations with less dynamic initialization required (e.g. getting rid of the iostream initialization mandated by the standard on a C++ hosted implementation). For typical manner of hosted implementations, if the kernel mode runtime/loader is implemented in C++, processes in userland can even share with pre-initized environments with no overhead at all. That said, why benchmarks of userland programs should rely on such stuff? Because this really makes differences when your benchmark runs sufficiently fast.

OTOH, there is nothing comparable on Lua or JS. There are no specs on what exactly mean to implement freestanding environments for them. The runtime initialization do incur some non-deterministic overhead which typically in turn rely on the native init overhead. Then for userland programs, you will compare "impure" implementations: Lua+C vs. JS+C++ instead of Lua vs. JS, for sure. This will not change before you can implement a hosted environment runtime (e.g. the OS kernel and loader) in your language being compared. Well, in this sense, Linux C++ is actually implemented in C (Linux kernel + e.g. glibc ld.so) + C++, but at least there are standard-conforming ways to replace non-C++ parts as long as you can replace Linux and libc, in reality. And where is the native loader implemented in JS/Lua?

I'm a bit surprised that nobody has pointed out this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment