10 Stack Benchmarking DOs and DON’Ts

An interesting question came up on the mechanical-sympathy list about how best to benchmark a stack of different queue (Aeron/Agrona, JCTools, DPDK, Pony) and transport (Aeron, DPDK, Seastar) options.
Who better to answer than Gil Tene, Vice President of Technology, CTO, and co-founder of Azul Systems? Here's his usual insightful and helpful response:
If you are looking at the set of "stacks" (all of which are queues/transports), I would strongly encourage you to avoid repeating the mistakes of testing methodologies that focus entirely on max achievable throughput and then report some (usually bogus) latency stats at those max throughput modes.
The TechEmpower numbers are a classic example of this in play, and while they do provide some basis for comparing a small aspect of behavior (what I call the "how fast can this thing drive off a cliff" comparison, or "pedal to the metal" testing), those results are not very useful for comparing load-carrying capacities for anything that actually needs to maintain some form of responsiveness SLA or latency spectrum requirements.
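To make the point concrete, here is a minimal sketch (not from Gil's reply) of why latency figures taken at saturation tend to say little. It is a toy model under stated assumptions: a single FIFO server with a fixed service time, requests arriving at a constant rate, and latency measured from each request's intended arrival time. The function names and all the numbers are hypothetical and chosen only for illustration.

```python
import math


def simulate_latencies(arrival_rate, service_time, n_requests):
    """Return per-request latency in seconds for one FIFO server.

    Toy model (an assumption, not a real stack): requests arrive every
    1/arrival_rate seconds and each needs service_time seconds of work.
    Latency runs from the intended arrival time to completion, so any
    time spent waiting in the queue is included.
    """
    interval = 1.0 / arrival_rate
    server_free_at = 0.0
    latencies = []
    for i in range(n_requests):
        arrival = i * interval
        start = max(arrival, server_free_at)  # wait if the server is busy
        server_free_at = start + service_time
        latencies.append(server_free_at - arrival)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    service_time = 0.001  # hypothetical: 1 ms per request, so ~1000 req/s capacity

    # Half of capacity: the queue never builds up.
    moderate = simulate_latencies(500, service_time, 10_000)
    # Driven slightly past capacity, for a short and a long run.
    saturated_short = simulate_latencies(1100, service_time, 10_000)
    saturated_long = simulate_latencies(1100, service_time, 100_000)

    for name, data in [
        ("500/s, 10k requests", moderate),
        ("1100/s, 10k requests", saturated_short),
        ("1100/s, 100k requests", saturated_long),
    ]:
        print(f"{name}: p99 = {percentile(data, 99) * 1000:.1f} ms")
```

In this model the 99th percentile at half load stays at the service time, while at saturation it grows with the length of the run, so the reported number mostly reflects how long the test lasted rather than how the system responds. A comparison at a fixed offered load below capacity is closer to what a responsiveness SLA actually asks about.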


