Our compiler flags haven't changed in a long time, and there are a few things worth considering. Overall, though, our flags are quite good and avoid most of the "bad defaults" on x86.
This initial draft is a bit of a brain dump, but the items need to be tracked.
Our current flags are:
cflags = -mtune=generic -march=x86-64 -g2 -O2 -pipe -fno-plt -fPIC -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=32 -fasynchronous-unwind-tables -ftree-vectorize -feliminate-unused-debug-types -Wall -Wno-error -Wp,-D_REENTRANT
cxxflags = -mtune=generic -march=x86-64 -g2 -O2 -pipe -fno-plt -fPIC -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=32 -fasynchronous-unwind-tables -ftree-vectorize -feliminate-unused-debug-types -Wall -Wno-error -Wp,-D_REENTRANT
ldflags = -Wl,--copy-dt-needed-entries -Wl,-O1 -Wl,-z,relro -Wl,-z,now -Wl,-z,max-page-size=0x1000 -Wl,-Bsymbolic-functions -Wl,--sort-common
- In GCC 12, -ftree-vectorize will become the default at -O2. In Clang it is already the default. However, GCC 12 also introduces a new default cost model for -O2, very-cheap. We can set -fvect-cost-model=cheap to retain the old behaviour, or simply allow the new default to pass through. Reportedly, very-cheap gives faster compile times and smaller generated code overall, although there are a few cases where cheap provides significant speed-ups. Most likely, we will allow the new default to pass through for smaller binaries and add -fvect-cost-model=cheap to the ypkg speed flags for GCC, or offer it as a separate option.
- PIE by default. We already have full RELRO, PIC and stack protection enabled by default, so the next logical step is to enable PIE. GCC has a build-time option (--enable-default-pie) that enables PIE by default, and LLVM/Clang could be patched to do the same thing, which avoids most of the headaches. The performance impact is negligible on x86_64.
- Consider -fno-plt by default. With full RELRO we already avoid the PLT in effect, so why not go one step further and remove it entirely? It will make programs slightly quicker and smaller. We will need an opt-out in YPKG for problematic programs that do screwy things (mostly xorg packages, glibc, wine). A good write-up of the cost-benefit of no PLT on x86_64 can be found here. It should be noted that Arch has had no-PLT as its default for a while now with minimal issues.
- -Bsymbolic-non-weak-functions is a slightly safer version of -Bsymbolic-functions. That said, we only need to disable symbolic functions in a few of our packages, so the practical benefit may be small. Still worth considering.
- Support DT_RELR when it becomes available. It will make shared objects and PIEs smaller and potentially quicker (as more of the program can fit in the CPU cache). The linker flag is -Wl,-z,pack-relative-relocs.
- GCC 12 introduces a new option, -mrelax-cmpxchg-loop, which reportedly gives better pthread initialization and fewer cache misses when the cmpxchg instruction is used in a spinlock: https://github.com/gcc-mirror/gcc/commit/0435b978f95971e139882549f5a1765c50682216
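
To judge whether -fvect-cost-model=cheap is worth carrying for a particular package, a small test loop can be compiled under both cost models and the vectorizer reports compared. The following is a minimal sketch: the function name saxpy and the file name saxpy.c are illustrative and not taken from any of our packages. The GCC documentation describes very-cheap as only permitting vectorization when the vector code entirely replaces the scalar code, so a loop with a runtime trip count like this one is a plausible candidate for the two models to differ, but the actual result depends on the GCC version and target.

```c
/* saxpy.c (hypothetical file name)
 *
 * Compare the vectorizer's decisions, assuming a GCC that knows the
 * very-cheap model (GCC 11 or later):
 *   gcc -O2 -ftree-vectorize -fvect-cost-model=cheap      -fopt-info-vec -c saxpy.c
 *   gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -fopt-info-vec -c saxpy.c
 */
#include <stddef.h>

/* Illustrative kernel: y[i] += a * x[i] for i in [0, n).
 * `restrict` promises that x and y do not overlap, so the compiler
 * does not need a runtime alias check before vectorizing. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

With -fopt-info-vec, GCC prints a note for each loop it vectorizes, so a loop that is reported under one model and not the other shows where the two settings diverge.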
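
For the -mrelax-cmpxchg-loop item, the code affected is an atomic operation that has no single native instruction and is therefore expanded into a compare-and-swap retry loop. The sketch below is one such case, under the assumption that the target is x86-64: an atomic AND whose previous value is used. The helper name clear_flags and the file name relax.c are hypothetical. Whether the generated loop changes as described should be checked in the assembly output rather than taken from this example.

```c
/* relax.c (hypothetical file name)
 *
 * Inspect the generated loop with and without the option
 * (the option requires GCC 12 or later):
 *   gcc -O2 -S relax.c
 *   gcc -O2 -mrelax-cmpxchg-loop -S relax.c
 */
#include <stdatomic.h>

/* Illustrative helper: atomically clear the bits set in `mask` and
 * return the value the word held before the update. Because the old
 * value is needed, a plain `lock and` is not enough on x86-64 and the
 * compiler is expected to emit a cmpxchg retry loop instead. */
unsigned clear_flags(atomic_uint *word, unsigned mask)
{
    return atomic_fetch_and(word, ~mask);
}
```

According to the GCC documentation, the option makes such loops issue a plain atomic load before the cmpxchg and execute pause when restarting, which is aimed at the highly contended case.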