Nuclex Signal/Slot Library: Benchmarks
When you’re writing code that needs to notify code in other parts of the program, your weapon of choice is the "signal / slot" concept. A signal is a connection point where any interested party can register a callback function (a "slot") to be invoked when the signal is emitted (fired).
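For reference, here is a deliberately naive sketch of the concept built on std::function and std::vector. The class name SimpleSignal and its methods are made up for illustration, and the sketch makes no attempt at the performance goals listed below.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Naive signal: a list of callbacks that are all invoked when the signal fires.
// Hypothetical names; this shows the concept only, not the Nuclex API.
template<typename... TArguments>
class SimpleSignal {
public:
  // Registers a callback (a "slot") to be invoked whenever the signal fires
  void Subscribe(std::function<void(TArguments...)> callback) {
    this->callbacks.push_back(std::move(callback));
  }

  // Fires the signal, invoking all subscribed callbacks in subscription order
  void Emit(TArguments... arguments) const {
    for(const auto &callback : this->callbacks) {
      callback(arguments...);
    }
  }

private:
  std::vector<std::function<void(TArguments...)>> callbacks;
};

// Usage: two independent listeners react to the same signal
inline int SumOfNotifications() {
  int total = 0;
  SimpleSignal<int> valueChanged;
  valueChanged.Subscribe([&total](int value) { total += value; });
  valueChanged.Subscribe([&total](int value) { total += value * 10; });
  valueChanged.Emit(3);
  return total; // 3 + 30
}
```

Every subscription here may allocate (inside std::vector and possibly std::function), which is exactly the kind of overhead a granular, per-class signal wants to avoid.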
There’s already an ocean of libraries out there providing this functionality to C++, but as you will see in this article, they all suffer from performance issues in one way or another. Plus, most don’t compile without warnings, have inconvenient syntax or lack unit tests.
So here’s the signal/slot "library" (it’s just three headers) I wrote to fix those issues for me, together with a summary of my design goals and a comprehensive benchmark on different compilers and CPUs.
Goals / Mandatory Use Cases
I started with a short laundry list of my use cases to guide the design:
- Optimize granular usage (many small individual signals rather than a big multi-purpose one)
  - minimal memory footprint embedded in classes
  - fast construction / destruction
- Support GCC, clang and MSVC (with maximum warning levels)
- Performance should be near a vanilla virtual method call
- Able to collect return values from subscribers / slots
  - without allocating memory
- Binary (executable) size should stay small
- Callbacks must be able to unregister themselves while being called back
- Callbacks must be able to subscribe other callbacks while being called back
- Reliable (unit tests for every valid use and for every error case)
- Unsubscribe with function/method pointer + instance pointer pair, no "connection" objects or ids
I then set out to find the leanest, fastest implementation that covers these requirements, ignoring everything else.
Compiler options: the fastest possible code that runs on a generic x86-64 (amd64) CPU.
MSVC: /TP /GF /utf-8 /W4 /GS- /fp:fast /EHsc /std:c++17 /GR /O2 /Oy /Oi /Gy /GL /MD /Gw
GCC: -fvisibility=hidden -fvisibility-inlines-hidden -Wpedantic -Wall -Wextra -Wno-unknown-pragmas -shared-libgcc -fpic -funsafe-math-optimizations -std=c++17 -fpermissive -O3 -flto -fpie
clang: -fvisibility=hidden -fvisibility-inlines-hidden -Wpedantic -Wall -Wextra -Wno-unknown-pragmas -fpic -funsafe-math-optimizations -std=c++17 -fpermissive -O3 -flto -fpie
Scores are CPU cycles per action. Each benchmark run repeats an action between 10,000,000 and 500,000,000 times and measures the total time in seconds, then computes cycles_per_action = (cpu_speed_ghz x 1,000,000,000) x (total_time / number_of_repeats).
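Cycles per action is the clock rate in cycles per second multiplied by the average time one action takes in seconds. A small sketch of that arithmetic, assuming total_time is measured in seconds and the clock speed stays constant during the run (a varying clock would skew the result); the function name CyclesPerAction is made up:

```cpp
#include <cstdint>

// Converts one benchmark run into CPU cycles per action.
// Assumes total_time_seconds is wall-clock time in seconds and that the CPU
// ran at a constant cpu_speed_ghz for the whole run.
constexpr double CyclesPerAction(
  double cpu_speed_ghz, double total_time_seconds, std::uint64_t number_of_repeats
) {
  // (cycles per second) x (seconds per action) = cycles per action
  return (cpu_speed_ghz * 1'000'000'000.0) *
         (total_time_seconds / static_cast<double>(number_of_repeats));
}

// Example: 500,000,000 repeats taking 2 seconds on a 3 GHz CPU
// work out to 3e9 cycles/s x 4e-9 s = 12 cycles per action.
static_assert(
  CyclesPerAction(3.0, 2.0, 500'000'000) > 11.999 &&
  CyclesPerAction(3.0, 2.0, 500'000'000) < 12.001,
  "3 GHz x (2 s / 500M repeats) is about 12 cycles per action"
);
```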
Chart: CPU cycles per action (lower = better). The "Overall" tab shows the average of all data.
optimizes for removing the oldest or newest callback. My benchmark removes callbacks in reverse order of subscription. If the removal order is randomized, the result is a lot worse (but at 50 callbacks it still beats all competitors).
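The note above does not say how the removal is implemented. Purely as an assumption for illustration, one way to make removing the newest or oldest entry cheap is to check both ends of the list before falling back to a linear scan; with a randomized removal order, that scan dominates. RemoveFavoringEnds is a hypothetical helper, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption, not the library's code): removes one
// matching entry, checking the newest and oldest entries before scanning.
template<typename TItem>
bool RemoveFavoringEnds(std::vector<TItem> &items, const TItem &item) {
  if(items.empty()) {
    return false;
  }
  if(items.back() == item) {
    items.pop_back(); // Newest entry: found at once, nothing has to shift
    return true;
  }
  if(items.front() == item) {
    items.erase(items.begin()); // Oldest entry: found at once, the rest shifts down
    return true;
  }
  // Any other position requires a linear scan of the middle entries
  for(std::size_t index = 1; index + 1 < items.size(); ++index) {
    if(items[index] == item) {
      items.erase(items.begin() + static_cast<std::ptrdiff_t>(index));
      return true;
    }
  }
  return false;
}
```

Removing callbacks in reverse order of subscription, as the benchmark does, always takes the first branch.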
Nano: I included it because I wanted to compare to one of the fastest libraries around. However, it doesn’t support callbacks unsubscribing themselves while being called back and therefore doesn’t actually meet my requirements.
libsigc++ is interesting in terms of construction time. I assume it doesn’t initialize a thing until the first subscription, so the price is paid later.
Boost.Signals2 is known to be slow, but the results are just ridiculous.
Fairness notice: many of the libraries tested are thread-safe and thus handicap their performance with mutexes. This is silly, imho, since I can add cheap mutex-based thread safety in about a minute with a wrapper class around an event. I’m working on a lock-free, thread-safe event, but this will take time.
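A minimal sketch of such a mutex-based wrapper, assuming a non-thread-safe event type underneath. The stand-in Event and the wrapper LockedEvent are hypothetical and not part of any library tested here; every public method takes the same mutex before forwarding.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for any single-threaded event implementation (hypothetical)
class Event {
public:
  void Subscribe(std::function<void(int)> callback) {
    this->callbacks.push_back(std::move(callback));
  }
  void Emit(int value) const {
    for(const auto &callback : this->callbacks) {
      callback(value);
    }
  }
private:
  std::vector<std::function<void(int)>> callbacks;
};

// Thread-safe wrapper: every operation locks the same mutex before forwarding
// to the wrapped event, so concurrent Subscribe()/Emit() calls are serialized.
class LockedEvent {
public:
  void Subscribe(std::function<void(int)> callback) {
    std::lock_guard<std::recursive_mutex> lock(this->mutex);
    this->event.Subscribe(std::move(callback));
  }
  void Emit(int value) {
    std::lock_guard<std::recursive_mutex> lock(this->mutex);
    this->event.Emit(value);
  }
private:
  std::recursive_mutex mutex;
  Event event;
};
```

A std::recursive_mutex is used so that a callback running on the emitting thread does not deadlock if it calls back into the wrapper; that re-entry is only safe if the wrapped event itself tolerates subscriptions during emission, which the simple stand-in here does not. Holding the lock while callbacks run also means a callback that waits on another thread using the same event can deadlock.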
How much larger does a class become per event it embeds?
Note: The Nuclex implementation has a built-in buffer for 2 subscribed callbacks. It can be configured with a buffer for only 1 callback, which reduces its size to 32 bytes.
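A built-in buffer like this is a small-buffer optimization: the first few subscriptions are stored inline in the event object, and only further ones need heap memory. The sketch below illustrates that trade-off under stated assumptions: SmallBufferEvent, HasOverflow and the plain function-pointer-plus-context callback type are hypothetical, and the resulting object sizes will not match the Nuclex figures. The inline capacity is a template parameter, so a smaller buffer gives a smaller object.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical small-buffer event: the first BuiltInCount callbacks live
// inside the object itself, further ones spill into a heap-allocated vector.
template<std::size_t BuiltInCount>
class SmallBufferEvent {
public:
  using Callback = void (*)(void *context, int value);

  void Subscribe(Callback callback, void *context) {
    Entry entry{callback, context};
    if(this->builtInUsed < BuiltInCount) {
      this->builtIn[this->builtInUsed] = entry; // No allocation
      ++this->builtInUsed;
    } else {
      this->overflow.push_back(entry); // Heap allocation only past the buffer
    }
  }

  // Invokes the inline callbacks first, then any that spilled to the heap
  void Emit(int value) const {
    for(std::size_t index = 0; index < this->builtInUsed; ++index) {
      this->builtIn[index].Invoke(this->builtIn[index].Context, value);
    }
    for(const Entry &entry : this->overflow) {
      entry.Invoke(entry.Context, value);
    }
  }

  // True once more callbacks are subscribed than the built-in buffer holds
  bool HasOverflow() const { return !this->overflow.empty(); }

private:
  struct Entry { Callback Invoke; void *Context; };
  std::array<Entry, BuiltInCount> builtIn{};
  std::size_t builtInUsed = 0;
  std::vector<Entry> overflow;
};
```

Each inline slot adds one entry's worth of bytes to every class that embeds the event, so the right buffer size depends on how many subscribers a typical signal has.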
Nano Signals 11: https://github.com/FrankHB/nano-signal-slot
Nano Signals 17: https://github.com/NoAvailableAlias/nano-signal-slot