A few changes will have to be made to our tooling to switch to the new glibc-hwcaps mechanism.
AVX2/Haswell Hardware Capabilities
glibc has changed the install path from /usr/lib64/haswell to /usr/lib64/glibc-hwcaps/x86-64-v3/ (in our case) so that it is vendor agnostic, i.e. it also covers AMD. The old haswell path hasn't been removed yet, but it will be at some point (see the NEWS note below).
A future version of glibc will stop loading shared objects from the "tls" subdirectories on the library search path, the subdirectory that corresponds to the AT_PLATFORM system name, and also stop employing the legacy AT_HWCAP search mechanism. Applications should switch to the new glibc-hwcaps mechanism instead; if they do not do that, only the baseline version (directly from the search path directory) will be loaded.
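As a quick way to see which glibc-hwcaps subdirectories the dynamic loader will actually search on a given machine, the sketch below filters the output of `ld.so --help`. It assumes glibc 2.33 or newer (where `--help` lists the subdirectories) and assumes the output layout shown in the comments; the loader path in the usage line and the function name `searched_hwcaps` are illustrative, not something our tooling already provides.

```shell
#!/bin/sh
# Sketch: print the glibc-hwcaps subdirectories that the loader reports as
# searched. Reads `ld.so --help` output on stdin.
#
# Assumed layout of the relevant part of that output:
#   Subdirectories of glibc-hwcaps directories, in priority order:
#     x86-64-v4
#     x86-64-v3 (supported, searched)
#     x86-64-v2 (supported, searched)
searched_hwcaps() {
    awk '
        # Start of the glibc-hwcaps section.
        /^Subdirectories of glibc-hwcaps directories/ { in_section = 1; next }
        # Any later unindented line is a new heading and ends the section.
        in_section && /^[^ ]/ { in_section = 0 }
        # Inside the section, keep only entries marked as searched.
        in_section && /searched/ { print $1 }
    '
}

# Usage (the loader path is an assumption and may differ per distro):
#   /usr/lib64/ld-linux-x86-64.so.2 --help | searched_hwcaps
```

On a machine with AVX2 the list should contain x86-64-v3; if it does not, libraries installed under /usr/lib64/glibc-hwcaps/x86-64-v3/ will not be picked up there.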
gcc has introduced new -march definitions to match the glibc definitions, e.g. -march=x86-64-v3: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
The closest match to x86-64-v3 is bdver4 (Excavator). This is actually a step down from haswell, and we lose the following extensions: FSGSBASE, PCLMUL, RDRAND, XSAVEOPT. This could potentially lead to a performance regression, but it does mean bdver4 processors are covered. However, as far as I remember, Excavator's AVX2 implementation has a bad reputation.
The relevant commit with everything nicely spelt out for us: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9a6b9396884b67c7c
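To see for ourselves what is lost when moving from -march=haswell to -march=x86-64-v3, the predefined macros of both targets can be compared. The following is a minimal sketch, assuming a gcc that accepts -march=x86-64-v3 (v11 or newer); the function names and the output file names in the usage lines are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the predefined macros of two -march values.

# Dump the predefined macros for one -march value (requires gcc).
dump_macros() {
    gcc -dM -E -march="$1" - < /dev/null
}

# Print the macro names defined in the first dump file but not in the second.
macros_only_in_first() {
    mof_a=$(mktemp)
    mof_b=$(mktemp)
    # Keep only the macro name of each "#define NAME VALUE" line.
    awk '$1 == "#define" { print $2 }' "$1" | sort > "$mof_a"
    awk '$1 == "#define" { print $2 }' "$2" | sort > "$mof_b"
    # comm -23 keeps the lines that are unique to the first file.
    comm -23 "$mof_a" "$mof_b"
    rm -f "$mof_a" "$mof_b"
}

# Usage:
#   dump_macros haswell   > haswell.txt
#   dump_macros x86-64-v3 > v3.txt
#   macros_only_in_first haswell.txt v3.txt
```

If the list of lost extensions above is right, the result should include the feature macros for FSGSBASE, PCLMUL, RDRAND and XSAVEOPT, alongside the haswell-specific arch and tune macros.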
TODO:
- Update gcc to v11
- Update LLVM stack to v12
- YPKG2: when building with avx2 on, change the %installlib% directory to /usr/lib64/glibc-hwcaps/x86-64-v3/
- YPKG2: when building with avx2 on, change the march to -march=x86-64-v3. Keep -mtune=haswell for now.
- YPKG2: whilst we are here, it may be a good idea to not run the check step on avx2 builds, to avoid confusing segfaults on PCs without avx2.
- dracut: Also ignore /usr/lib64/glibc-hwcaps/x86-64-v3/ dir.
- snapd: Allow libs from /usr/lib64/glibc-hwcaps/x86-64-v3/ to be loaded.
- Rebuild every package that has avx2 libs enabled to the new paths:
  - glibc
  - openblas
  - gromacs
  - libflac
  - libwebp
  - pytorch
  - fftw3
  - graphene
  - openimagedenoise
  - openvdb
  - xxhash
  - ...more?
- Verify (with ldd) that libs from the new paths are being loaded on Intel >= haswell and AMD >= zen.
- Verify there isn't any performance regression with the base glibc libs. Use /usr/lib64/glibc/benchmarks from glibc-devel (or phoronix's pts/glibc-bench).
- Verify there isn't any performance regression with the slightly different ARCH definitions for end applications (run echo | gcc -dM -E - -march=haswell and again with -march=x86-64-v3, then compare the output to check).
- Benchmark R via rbenchmark to verify there isn't any performance regression. It sees a decent uplift from openblas' avx2 libs.
- Verify that the 32-bit AVX2 stuff (/usr/lib32/glibc-hwcaps/x86-64-v3/) works as expected. This doesn't seem to be mentioned in the x86-64 psABI.
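The path verification steps above could be scripted roughly as follows. This is only a sketch: it assumes the usual Linux /proc/cpuinfo flag names for the x86-64-v3 feature set (with LZCNT reported as abm) and the usual `name => path (address)` layout of ldd output. The function names and the binary path in the usage lines are placeholders.

```shell
#!/bin/sh
# Sketch: helpers for checking that x86-64-v3 libs are usable and loaded.

# Succeed if the given /proc/cpuinfo "flags" line lists every flag that
# x86-64-v3 adds on top of x86-64-v2 (the v2 baseline is assumed present).
cpu_has_v3() {
    for flag in avx avx2 bmi1 bmi2 f16c fma abm movbe xsave; do
        case " $1 " in
            *" $flag "*) ;;      # flag present, check the next one
            *) return 1 ;;       # flag missing, not a v3 CPU
        esac
    done
}

# Read ldd output on stdin and print the libraries that were resolved
# from a glibc-hwcaps/x86-64-v3 directory, as "name path".
hwcaps_libs() {
    awk '$3 ~ /\/glibc-hwcaps\/x86-64-v3\// { print $1, $3 }'
}

# Usage:
#   cpu_has_v3 "$(grep -m1 '^flags' /proc/cpuinfo)" && echo "CPU supports x86-64-v3"
#   ldd /path/to/binary | hwcaps_libs
```

On a haswell-or-newer Intel or zen-or-newer AMD machine, `hwcaps_libs` should list the rebuilt libraries; on older hardware it should print nothing, and the baseline libs should be loaded instead.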