I have been looking at optimizing the fftw package, but I don't have the time and I don't know of any good way to benchmark it. So I figured I would share my findings in the hope that someone will have a go at it if they have the time and feel like doing it. Here's what I found:
- Clear Linux builds with its default flags plus -ffunction-sections -falign-functions=32 -O3 -flto -fno-semantic-interposition -ffast-math, and also with the avx2/avx512 extensions. https://github.com/clearlinux-pkgs/fftw/blob/master/fftw.spec
- Arch builds with its default flags plus what they call the upstream defaults, -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math, along with some avx and sse2 extension stuff. https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/fftw
- From some benchmarks done by Phoronix we can see that the -O3 -flto build is the fastest. Just compare the -march=znver1 results with each other and so on, since Solus only builds for generic arches.
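
On the benchmarking side, here is a rough sketch of how two builds could be compared. It is not something any of the distros above use. It just times an arbitrary command several times and reports the minimum and median wall-clock time. The names time_command and summarize are made up for this example, and the command you pass in is assumed to be some FFT-heavy test program linked against the fftw build under test.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build is not
        # silently reported as a "fast" one.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to the minimum and the median."""
    return {"min": min(durations), "median": statistics.median(durations)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to time.
    print(summarize(time_command(sys.argv[1:])))
```

Run it once per build with the same command and compare the medians. Wall-clock timing of a whole process is noisy and includes startup and planning time, so it only makes sense for workloads that run long enough for the transform itself to dominate.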