Optimise all the stuff we flopped on in https://www.phoronix.com/scan.php?page=article&item=8700k-linux-distros&num=1
Numpy is lame, take it out back and shoot it in the head. Put in a replacement, nobody'll know
R420 bash: R420:b4e1f141ec10 Remove all compiler flags overrides from the environment
Thoughts to self:
Further thoughts:
Clear Linux will use the performance CPU governor, whereas Solus will use on-demand settings, which would skew some of the tests.
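A quick way to check which governor a box is actually running before comparing numbers. This is only a sketch: it reads the standard Linux cpufreq sysfs files, and the `read_governors` helper and its `base` parameter are made up for illustration, not something from our tooling.

```python
from collections import Counter
from pathlib import Path

# Standard Linux cpufreq sysfs location. The base argument exists only so the
# function can be pointed at another directory (an assumption for testing).
CPU_SYSFS = Path("/sys/devices/system/cpu")


def read_governors(base=CPU_SYSFS):
    """Count how many CPUs use each scaling governor under `base`."""
    governors = Counter()
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(base).glob(pattern)):
        try:
            governors[gov_file.read_text().strip()] += 1
        except OSError:
            # Offline CPUs or unreadable files are skipped.
            continue
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq information found (VM, container, or non-Linux?)")
    for name, count in found.items():
        print(f"{name}: {count} CPU(s)")
```

If the two distros report different governors on the same hardware, the benchmark delta is at least partly a settings difference rather than a packaging one.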
This is always going to be a challenge when we enforce security flags and debugging: test builds will in most cases perform worse than what we ship (they can't get PGO or speed optimizations from ypkg, or stripped binaries!). Taking the fftw test as an example, I compared what a distro ships in their repo with what is built in the test (quite a difference):
Repo:
-Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -O2
Test:
-pthread -O3 -lm -fomit-frame-pointer -mtune=native -malign-double -fstrict-aliasing -fno-schedule-insns -ffast-math
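To make the difference easier to eyeball, here is a rough sketch that splits the two flag strings quoted above and reports what is unique to each. The flag strings are copied from this comment; the `unique_flags` and `diff_flags` helpers are hypothetical names, and the comparison is a plain string match (it does not know that -O2 and -O3 are variants of the same option).

```python
REPO_FLAGS = (
    "-Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong "
    "-Wformat -Werror=format-security -O2"
)
TEST_FLAGS = (
    "-pthread -O3 -lm -fomit-frame-pointer -mtune=native -malign-double "
    "-fstrict-aliasing -fno-schedule-insns -ffast-math"
)


def unique_flags(flags):
    """Split a flag string and drop repeats, keeping first-seen order."""
    return list(dict.fromkeys(flags.split()))


def diff_flags(a, b):
    """Return (only in a, only in b, in both) as ordered lists."""
    a_flags, b_flags = unique_flags(a), unique_flags(b)
    only_a = [f for f in a_flags if f not in b_flags]
    only_b = [f for f in b_flags if f not in a_flags]
    common = [f for f in a_flags if f in b_flags]
    return only_a, only_b, common


if __name__ == "__main__":
    only_repo, only_test, common = diff_flags(REPO_FLAGS, TEST_FLAGS)
    print("repo only:", " ".join(only_repo))
    print("test only:", " ".join(only_test))
    print("shared:   ", " ".join(common) or "(none)")
```

The same helper could be pointed at any two flag strings, for example our environment defaults against what an upstream Makefile sets.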
How can we make our results faster? (Note that these don't actually make Solus faster, except the 1st one...)
In terms of numpy, I'd say we are largely looking at openblas (which is something that needs to be fixed). It used to build with 40 max CPUs, until our best build server (40 threads) was retired (RIP). Now it's -DMAX_CPU_NUMBER=8, which, given the huge jump in threads, is a bit of a regression, even for home loads. I imagine it will perform great provided you have 8 or fewer threads. Configure flags to look at (it would be good to test this on Ryzen and on low thread counts [I can do that one xD] before and after the changes):
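For the before/after testing, something like the following could time the same workload at several thread counts. It is a minimal sketch: `OPENBLAS_NUM_THREADS` is the environment variable OpenBLAS reads at runtime for its thread count, while `time_with_threads` and its parameters are hypothetical names for illustration, and wall-clock timing of a subprocess is only a rough measure.

```python
import os
import subprocess
import sys
import time


def time_with_threads(cmd, thread_counts, repeats=3):
    """Run `cmd` once per thread count and keep the best wall-clock time.

    Returns a dict mapping thread count to seconds. The command is run
    with OPENBLAS_NUM_THREADS set in its environment.
    """
    results = {}
    for n in thread_counts:
        env = dict(os.environ, OPENBLAS_NUM_THREADS=str(n))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            # check=True so a failing workload is not silently timed.
            subprocess.run(cmd, env=env, check=True)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results
```

A hypothetical usage, assuming a numpy linked against openblas is installed, would be to pass `[sys.executable, "-c", "import numpy as np; a = np.random.rand(2000, 2000); a @ a"]` as `cmd` with thread counts such as `[1, 2, 4, 8, 16]`, once with the current package and once with the rebuilt one.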
It does also highlight that it would be good to see what a package ships as its default flags, as upstream may add some that are advantageous to their particular code. I'm not sure if there's an easy way to do that, though, outside of running a build with the flags unset.
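One low-tech way to run a build with unset flags is to hand the build a copy of the environment with the usual flag variables stripped out. This is a sketch only: the variable list is the conventional set that make and autotools-style builds honour, and `env_without_flag_overrides` is a made-up helper name, not anything ypkg provides.

```python
import os

# Conventional compiler/linker flag variables. Assumed list; a given build
# system may read others as well.
FLAG_VARS = ("CFLAGS", "CXXFLAGS", "CPPFLAGS", "LDFLAGS", "FFLAGS", "FCFLAGS")


def env_without_flag_overrides(env=None):
    """Return a copy of the environment with flag override variables removed."""
    source = os.environ if env is None else env
    return {key: value for key, value in source.items() if key not in FLAG_VARS}
```

The result can be passed as `env=` to `subprocess.run` for whatever build command the package uses, and the compiler lines in the build log then show only the flags the package itself chose.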
I'd argue we've done the relevant portions for this already, and numpy/openblas/etc all got a kick up the arse