I have a very simple benchmark that is run via Catch2 and compiled with -O3 using Emscripten 3.1.37:
```cpp
BENCHMARK("cpp sin only")
{
    double sum = 1.0;
    for (int t = 0; t < 2000000; ++t) {
        sum += sin(double(t));
    }
    return sum;
};
#ifdef __EMSCRIPTEN__
BENCHMARK("js sin only")
{
    EM_ASM_DOUBLE({
        let sum = 1;
        for (let i = 0; i < 2000000; i++) {
            sum = sum + Math.sin(i);
        }
        return sum;
    });
};
#endif
```
I would expect there to be no large difference between JavaScript and WebAssembly, but there is:
Chrome:
```
benchmark name     samples      iterations   est run time
                   mean         low mean     high mean
                   std dev      low std dev  high std dev
-------------------------------------------------------------------------------
cpp sin only       100          1            7.93775 s
                   79.3856 ms   79.147 ms    79.7195 ms
                   1.43061 ms   1.10437 ms   1.97222 ms
js sin only        100          1            2.21506 s
                   22.1354 ms   22.0064 ms   22.3 ms
                   742.138 us   614.746 us   901.128 us
```
Natively, compiled with GCC 12.3.0, I get 24.2 ms.
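For comparing against the native number outside of Catch2, a minimal timing sketch of the same loop could look like the following. The helper names sin_loop and time_sin_loop_ms are hypothetical (not from the linked pastebin), and a single timed run is noisier than Catch2's 100 samples, so treat it only as a rough cross-check.

```cpp
#include <chrono>
#include <cmath>

// Same loop as the "cpp sin only" benchmark (hypothetical helper name).
double sin_loop(int n) {
    double sum = 1.0;
    for (int t = 0; t < n; ++t) {
        sum += std::sin(double(t));
    }
    return sum;
}

// Runs sin_loop once and returns the elapsed wall-clock time in milliseconds.
// The sum is written through `out` so the loop has an observable result and
// the optimizer cannot discard it.
double time_sin_loop_ms(int n, double* out) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    *out = sin_loop(n);
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_sin_loop_ms(2000000, &sum) and printing both the time and the sum gives a single-run figure comparable in spirit to the 24.2 ms above.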
- To my understanding, JavaScript uses double-precision floats for all numbers, so the comparison should be fair. When I use float in the C++ version, it gets down to 12 ms in Chrome, but that is still slower (and less precise). Firefox sits at around 30 ms.
- Maybe JavaScript uses a less precise but faster implementation of sin and sqrt? Adding -ffast-math doesn't improve performance for double. With float and -ffast-math, Chrome becomes as fast as JavaScript; in Firefox it's still at around 30 ms.
- Is it that WebAssembly isn't given as much time in the optimiser? That could explain why it's so much slower in Firefox. But shouldn't Emscripten take care of most of the optimisation?
- Could it be some sort of protection against Meltdown/Spectre?
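The float variant mentioned in the first point can be sketched as a template, so the identical loop is instantiated once for float and once for double. This is an assumption about how the float version was written, not the author's code; float_vs_double_error is a hypothetical helper that shows how far the single-precision sum drifts from the double one.

```cpp
#include <cmath>

// Hypothetical generic version of the benchmark loop. For T = float,
// std::sin(float) selects the single-precision overload.
template <typename T>
T sin_loop_generic(int n) {
    T sum = T(1);
    for (int t = 0; t < n; ++t) {
        sum += std::sin(T(t));
    }
    return sum;
}

// Absolute difference between the float and double results for n terms.
double float_vs_double_error(int n) {
    const double d = sin_loop_generic<double>(n);
    const double f = static_cast<double>(sin_loop_generic<float>(n));
    return std::fabs(d - f);
}
```

Integers up to 2,000,000 are exactly representable in float, so the difference comes from sin and the accumulation, not from converting t.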
Update (I ran several additional benchmarks at the request of commenters):
- g++ 12 is a bit faster than clang 15, but the difference is within 10%.
- The performance of sqrt is almost equal between the versions (the example previously contained both a sqrt and a sin).
- Most of the time is spent in sin.
- Increasing the iteration count to 2 million makes JS about 4x faster; increasing it to 5 million extends the lead to 10x. JS is still about as fast as native C++.
- Note that Catch2 executes the benchmark 100 times. The runtime of the above code is around 1 s.
- I verified that it's not as simple as JS using float: the WebAssembly C++ result matches the JS result exactly.
- Using 123456789042+1000000 increased the runtime by about 3-4x for native GCC and Clang, WebAssembly C++, and JS (the relative performance of WebAssembly vs. JS stayed about the same).
- For reference, this is the code I used: https://pastebin.com/Mu2barB6, and here are the results for Chrome: https://pastebin.com/Hbte7yRj
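To confirm that the C++ and JS results match exactly, rather than only to the printed precision, one option is to compare the raw bit patterns once the JS value is back in C++ (for example as the double returned by EM_ASM_DOUBLE). The helpers bits_of and bit_equal below are hypothetical, not taken from the pastebin code.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw IEEE-754 bit pattern of a double.
std::uint64_t bits_of(double x) {
    std::uint64_t u = 0;
    std::memcpy(&u, &x, sizeof x);
    return u;
}

// True only if both values have identical bit patterns. Unlike ==,
// this distinguishes +0.0 from -0.0 and treats identical NaNs as equal.
bool bit_equal(double a, double b) {
    return bits_of(a) == bits_of(b);
}
```

Usage would be something like bit_equal(cpp_sum, js_sum), where both names are placeholders for the two benchmark results.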
Update 2:
After a comment by user21489919, I reported the issue to Emscripten.
You might double check (ha!) that the long double version of sin() is not being invoked for some reason. It's not obvious from your code sample why it would be, but the C++ standard library provides long double overloads of sin() (128-bit in clang).
You could be double double sure that regular double is used by compiling it as .c instead of .cpp and including <math.h>.
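Another way to check this without renaming the file is a compile-time assertion on the type of the call expression. This is a sketch, not part of the original comment: if the benchmark's sin(double(t)) resolved to the long double overload, the expression type would be long double and the first static_assert would fail to compile. benchmark_sin is a hypothetical wrapper.

```cpp
#include <cmath>       // std::sin overloads for float, double, long double
#include <math.h>      // guarantees ::sin(double) in the global namespace
#include <type_traits>

// The benchmark calls unqualified sin(double(t)). For a double argument this
// should resolve to the double overload, so the expression type is double.
static_assert(std::is_same<decltype(sin(double(0))), double>::value,
              "sin(double) resolved to a non-double overload");

// For contrast: a long double argument selects the long double overload.
static_assert(std::is_same<decltype(std::sin(0.0L)), long double>::value,
              "std::sin(long double) should return long double");

// Same call as in the benchmark loop body.
double benchmark_sin(int t) { return sin(double(t)); }
```

This only verifies overload resolution at compile time; it says nothing about how the library implements sin(double) internally.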