Doubling the speed of jpegtran with SIMD
It is no secret that at CloudFlare we put a great deal of effort into accelerating our customers' websites. One way to do that is to reduce the size of the images on a website. This is what our Polish product is for. It takes various images and makes them smaller using open source tools, such as jpegtran, gifsicle and pngcrush.
However, those tools are computationally expensive. Making them faster makes our servers faster and, in turn, our customers' websites as well.
Recently, I noticed that we spend ten times as much time "polishing" JPEG images as we do polishing PNGs.
We had already improved the performance of pngcrush by using our supercharged version of zlib, so it was time to look at what could be done for jpegtran (part of the libjpeg distribution).
Quick profiling
To get fast results, I usually use the Linux perf utility. It gives a nice, if simple, view of the hotspots in the code. I used this image for my benchmark.
perf record ./jpegtran -outfile /dev/null -progressive -optimise -copy none test.jpeg
And we get:
perf report
54.90% lt-jpegtran libjpeg.so.9.1.0 [.] encode_mcu_AC_refine
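When comparing several profiling runs, it can help to pull the hottest symbol out of the report programmatically. The following is only a minimal sketch in Python, assuming rows shaped like the one above (overhead, command, shared object, symbol type, symbol), for example as captured from `perf report --stdio`. The helper names `parse_perf_rows` and `hottest` are hypothetical and not part of perf or libjpeg.

```python
import re

# Matches one row of perf report output in the shape shown above:
#   overhead%  command  shared-object  [type]  symbol
# This layout is an assumption based on the single row in this post.
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S.*?)\s*$")


def parse_perf_rows(text):
    """Return (overhead_percent, command, shared_object, symbol) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # lines that do not look like a report row are skipped
            overhead, command, dso, _kind, symbol = m.groups()
            rows.append((float(overhead), command, dso, symbol))
    return rows


def hottest(text):
    """Return the row with the largest overhead, or None if nothing parsed."""
    rows = parse_perf_rows(text)
    return max(rows, key=lambda r: r[0]) if rows else None


sample = "54.90% lt-jpegtran libjpeg.so.9.1.0 [.] encode_mcu_AC_refine"
print(hottest(sample))
```

On the row above, this picks out `encode_mcu_AC_refine` in `libjpeg.so.9.1.0` as the function accounting for more than half of the samples.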