
Optimizations for fftw package
Open, Normal, Public

Description

I have been looking at optimizing the fftw package, but I don't have the time and I don't know of a good way to benchmark it. So I figured I would share my findings in the hope that someone will have a go at it if they have the time and feel like doing it. Here's what I found:

Event Timeline

Jacalz created this task. Aug 8 2019, 8:53 AM
Jacalz updated the task description. Aug 8 2019, 8:54 AM
Jacalz edited projects, added Software; removed Lacks Project. Aug 8 2019, 8:56 AM

If you wish to take a look at optimizing it, you can look at the Phoronix Test Suite test for fftw, which runs fftw-source/tests/bench --time-repeat 100 -opatient libc64. There are many other ranges you can throw at it, from ibc4096x4096 to libc32. You won't be able to use the Phoronix test itself, as it compiles its own fftw and doesn't test the system version.

You can also download the fftw-bench collection. These programs are more comprehensive and take a lot longer to run.
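For anyone picking this up, here is a minimal Python sketch of how the tests/bench invocation quoted above could be looped over several problem strings. The bench path, flags and problem strings are copied verbatim from this thread; the function names and the default list are just my own placeholders, and the script only captures raw output rather than parsing it, since I'm not assuming anything about the output format.

```python
import shlex
import subprocess
from pathlib import Path

# Path as quoted in the thread; adjust to wherever your fftw tree lives.
BENCH = Path("fftw-source/tests/bench")

# Problem strings copied verbatim from the thread; extend as needed.
PROBLEMS = ["libc64", "ibc4096x4096", "libc32"]


def build_command(bench, problem, repeats=100):
    """Return the argv list for one bench run, using the flags from the thread."""
    return [str(bench), "--time-repeat", str(repeats), "-opatient", problem]


def run_all(bench=BENCH, problems=PROBLEMS):
    """Run each problem and return {problem: raw stdout}.

    Returns an empty dict if the bench binary is not present.
    """
    results = {}
    if not Path(bench).exists():
        print(f"{bench} not found; build fftw first")
        return results
    for problem in problems:
        cmd = build_command(bench, problem)
        print("running:", shlex.join(cmd))
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        results[problem] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    for problem, output in run_all().items():
        print(problem, "->", output)
```

Running it once against the current package build and once against a candidate build gives two sets of raw output to compare by hand.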

I imagine the biggest win would come from enabling AVX2, although O3 and LTO could still make a difference. You can specify avx2 : yes in package.yml, but you'll only want to enable AVX2 for the single and double variants, not the long-double one. For the AVX2 build it is probably also worth adding --enable-fma in addition to --enable-avx2.

Have a look at the glibc and openblas packages for examples of AVX2 enablement. Also take a look at the clearlinux package for fftw here

You need to be careful that, after enabling AVX2, CPUs which do not have AVX2 instructions can still compile fftw and run the non-AVX2 libraries as normal. You'll also need to make sure that the AVX2 libs are in /usr/lib64/haswell/ and that fftw and programs which use fftw successfully load the libs from /usr/lib64/haswell/. You can use strace to verify this.
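As a complement to strace (this is a different technique, not what the comment above describes), a rough Python sketch could check whether the CPU reports AVX2 and which libfftw3 files a running process has mapped. It assumes a Linux /proc filesystem; the function names are made up for illustration, and only the /usr/lib64/haswell/ directory comes from this thread.

```python
import sys
from pathlib import Path

# Directory named in the thread for the AVX2 builds.
HASWELL_DIR = "/usr/lib64/haswell/"


def cpu_has_avx2(cpuinfo_text):
    """True if any 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if "avx2" in flags.split():
                return True
    return False


def fftw_libs_in_maps(maps_text):
    """Return the sorted libfftw3* paths found in /proc/<pid>/maps-style text."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split()
        # The pathname, when present, is the sixth whitespace-separated field.
        if len(fields) >= 6 and "libfftw3" in fields[5]:
            paths.add(fields[5])
    return sorted(paths)


def uses_haswell_libs(maps_text):
    """True only if fftw is mapped and every mapped copy is under HASWELL_DIR."""
    libs = fftw_libs_in_maps(maps_text)
    return bool(libs) and all(p.startswith(HASWELL_DIR) for p in libs)


if __name__ == "__main__":
    # Optional argument: PID of a running program that links against fftw.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        print("CPU has AVX2:", cpu_has_avx2(Path("/proc/cpuinfo").read_text()))
        maps = Path(f"/proc/{pid}/maps").read_text()
        print("fftw libs mapped:", fftw_libs_in_maps(maps))
        print("all from haswell dir:", uses_haswell_libs(maps))
    except OSError as exc:
        print("could not read /proc:", exc)
```

On a machine without AVX2 the expectation would be that the first check is false and the mapped libs come from the regular library directory instead.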

Lastly, you'll need to demonstrate it actually makes a difference within fftw itself and, if you can, programs which use fftw if they have a benchmark you can run.
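One simple way to express "makes a difference" is a ratio of median timings between the existing build and the candidate build. The helper below is only a sketch: the function name is hypothetical, and it assumes you have already collected per-run timings (in any consistent unit) for both builds yourself.

```python
from statistics import median


def speedup(baseline_times, candidate_times):
    """Return median(baseline) / median(candidate).

    A value above 1.0 means the candidate build was faster.
    """
    if not baseline_times or not candidate_times:
        raise ValueError("need at least one timing per build")
    cand = median(candidate_times)
    if cand <= 0:
        raise ValueError("timings must be positive")
    return median(baseline_times) / cand
```

Using the median rather than the mean keeps a single noisy run from skewing the comparison.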

I would look at this myself, but sadly I do not have any systems that have AVX2 at the moment, and I'd imagine O3 and LTO would only provide slight speedups.

So that's a lot, but feel free to start testing if it's something you want to get into, and use it as a learning experience if nothing else ;)

Thanks a lot for the information @joeboneichie! It is greatly appreciated, but as it currently stands none of my Linux machines have support for AVX2 either :I

oh well! Hopefully it's still useful info for somebody.

DataDrake triaged this task as Normal priority. Fri, Sep 6, 2:36 PM
DataDrake moved this task from Backlog to Improvement on the Software board.
DataDrake added a subscriber: DataDrake.

We should probably enable the Haswell libs at least. Unfortunately Haswell is the closest match for modern architectures because of weirdness in the ISAs between Haswell and Core2.