I’ve been getting into RISC-V optimization recently, partly because I
got my StarFive VisionFive 2, and partly because, unlike x86, RISC-V
has a small enough set of instructions that I may actually have a
chance at beating the compiler.
I’m optimizing the inner loops of GNURadio, or in other
words the volk library. Depending on the function, I’ve been getting
up to about double the speed of the compiled C code.
But it also got me thinking about how far I could get by tweaking the
compiler and its options.
Yes, I should have done this much sooner.
Many years ago now I built some data processing thing in C++, and
thought it ran too slowly. Sure, I did a debug build, but how much
slower could that be? Half speed? Nope. 20x slower.
Of course this time I never compared to a debug build, so don’t expect
that kind of difference. Don’t expect it to reach my hand-optimized
assembly either, imperfect as that may be.
The test code
This may look like a synthetic benchmark, in simplified C++:
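#include <complex>
#include <vector>
using complex = std::complex<float>;
complex volk_32fc_x2_dot_prod_32fc_generic(const std::vector<complex> &in1,
                                           const std::vector<complex> &in2)
{
  complex res;
  for (unsigned int i = 0; i < in1.size(); i++) { res += in1[i] * in2[i]; }
  return res;
}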
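To compare compiler options on a loop like this, you need a repeatable way to time it. Below is a minimal sketch, not the harness used for the numbers in this post. The names dot_prod_generic and time_per_call_ns are hypothetical. It assumes a plain, non-conjugated complex dot product and measures the average wall-clock time per call with std::chrono::steady_clock.

```cpp
#include <cassert>
#include <chrono>
#include <complex>
#include <cstddef>
#include <vector>

using complex = std::complex<float>;

// Hypothetical stand-in for the generic loop: a plain, non-conjugated
// complex dot product over two equally sized vectors.
complex dot_prod_generic(const std::vector<complex>& in1,
                         const std::vector<complex>& in2)
{
    assert(in1.size() == in2.size());
    complex res;
    for (std::size_t i = 0; i < in1.size(); i++) {
        res += in1[i] * in2[i];
    }
    return res;
}

// Hypothetical timing helper: calls fn `iterations` times and returns the
// average wall-clock time per call, in nanoseconds.
template <typename Fn>
double time_per_call_ns(Fn fn, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++) {
        fn();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = end - start;
    return elapsed.count() / iterations;
}
```

For example, fill two vectors with a few thousand samples each. Then pass time_per_call_ns a lambda that calls dot_prod_generic and writes the result to a volatile variable, so the compiler can't drop the call. Build the same file once for each set of options you want to compare, such as -O2 versus -O3.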