RISC-V optimization and -mtune
I’ve been getting into RISC-V optimization recently. Partly because I got my StarFive VisionFive 2, and partly because, unlike x86, the number of RISC-V instructions is so manageable that I may actually have a chance at beating the compiler.
I’m optimizing the inner loops of GNU Radio, or in other words the volk library. I’ve been getting up to about a doubling of the speed compared to the compiled C code, depending on the function.
But it got me thinking how far I could tweak the compiler and its options, too.
Yes, I should have done this much sooner.
Many years ago now I built some data processing thing in C++, and thought it ran too slowly. Sure, I did a debug build, but how much slower could that be? Half speed? Nope. 20x slower.
Of course this time I never compared to a debug build, so don’t expect that kind of difference. Don’t expect it to reach my hand-optimized assembly either, imperfect as it may be.
The test code
This may look like a synthetic benchmark, in simplified C++:
complex volk_32fc_x2_dot_prod_32fc_generic(const vector<complex> &in1,
                                           const vector<complex> &in2)
{
  complex res;
  for (unsigned int i = 0; i < in1.size(); i++) {
    res += in1[i] * in2[i];
  }
  return res;
}
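For anyone who wants to compile the loop on its own and try different compiler options against it, here is a minimal self-contained sketch. It assumes that `complex` means `std::complex<float>` (which is what the "32fc" in the volk name suggests), and the names `complex_f` and `dot_prod_generic` are made up for this example. They are not volk's API.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Assumption: single-precision complex samples, as "32fc" suggests.
using complex_f = std::complex<float>;

// Hypothetical stand-alone name, not the real volk entry point.
// Plain (non-conjugated) complex dot product: sum of in1[i] * in2[i].
complex_f dot_prod_generic(const std::vector<complex_f>& in1,
                           const std::vector<complex_f>& in2)
{
    // Both inputs must have the same number of samples.
    assert(in1.size() == in2.size());

    complex_f res{0.0f, 0.0f};
    for (std::size_t i = 0; i < in1.size(); i++) {
        res += in1[i] * in2[i];
    }
    return res;
}
```

Calling it with {1+2i, 3+4i} and {5+6i, 7+8i} gives -18+68i, which is an easy sanity check to keep around while changing optimization flags.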