
Make stuff suck less
Closed, Resolved · Public


Optimise all the stuff we flopped on in

Numpy is lame, take it out back and shoot it in the head. Put in a replacement, nobody'll know

Event Timeline

ikey created this task. Oct 12 2017, 9:24 PM
Herald removed ikey as the assignee of this task. Oct 12 2017, 9:24 PM
ikey triaged this task as Unbreak Now! priority. Oct 12 2017, 9:24 PM

Also lame is lame.

feskyde added a subscriber: feskyde.
gromez added a subscriber: gromez. Oct 13 2017, 11:10 AM
kydros added a subscriber: kydros. Oct 13 2017, 4:06 PM
Tearow added a subscriber: Tearow. Oct 16 2017, 9:20 PM

Thoughts to self:

  • numpy is really bad, do before & after benchmarks and make it not suck
  • our flags aren't propagating properly for those tests
  • some tests are even stripping the optimisation level while other distros see the -O2, so we insta lose
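For the before & after benchmarks, something along these lines might do as a starting point. It is only a sketch: `bench`, `speedup` and `workload` are made-up names, and the workload is a pure-Python stand-in that should be replaced by whichever numpy call is actually being measured.

```python
import statistics
import timeit


def bench(func, repeat=5, number=3):
    """Return the median wall-clock seconds per call of func()."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return statistics.median(runs) / number


def speedup(before_s, after_s):
    """How many times faster 'after' is than 'before' (>1.0 means faster)."""
    return before_s / after_s


def workload():
    # Stand-in workload (assumption): swap in the numpy call under test,
    # e.g. a matrix multiply, on a box where numpy is installed.
    return sum(i * i for i in range(50_000))


if __name__ == "__main__":
    t = bench(workload)
    print(f"median: {t * 1e3:.3f} ms per call")
```

Run it once on the old build and once on the new one, then feed the two medians to `speedup`. Taking the median over several repeats should keep a single noisy run from skewing the comparison.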

Further thoughts:

Clear Linux will use a performance governor, whereas Solus will use on-demand settings, which would throw some of the tests.
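Before comparing numbers it may be worth recording which governor each box was actually on. A rough sketch, assuming the usual Linux cpufreq sysfs layout (`cpuN/cpufreq/scaling_governor` under `/sys/devices/system/cpu`); the function names are invented for illustration:

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each cpuN directory name to its current cpufreq scaling governor."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as fh:
            governors[cpu] = fh.read().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and every one uses 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    found = read_governors()
    print(found)
    print("all on performance:", all_performance(found))
```

If the two systems report different governors, the results are not really comparable until one of them is switched over.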

sunnyflunk added a subscriber: sunnyflunk. Edited Oct 19 2017, 3:53 AM

This is always going to be a challenge while we enforce security flags and debugging in the environment: anything built in the test will in most cases perform worse than what we ship (it can't get PGO or speed optimizations from ypkg, or stripped binaries!). Taking the fftw test as an example, I looked at what a distro ships in their repo vs what is built in the test (quite a difference):

-Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -O2

-pthread -O3 -lm -fomit-frame-pointer -mtune=native -malign-double -fstrict-aliasing -fno-schedule-insns -ffast-math
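To make the difference easier to eyeball, the two flag sets can be diffed mechanically. This is just a sketch: it reads the first line as the distro's packaged flags and the second as what the test builds with (following the order of the sentence above), and the names `flag_diff` and `opt_level` are made up for the example.

```python
import shlex

DISTRO_FLAGS = ("-Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong "
                "-Wformat -Werror=format-security -O2")
TEST_FLAGS = ("-pthread -O3 -lm -fomit-frame-pointer -mtune=native -malign-double "
              "-fstrict-aliasing -fno-schedule-insns -ffast-math")


def flag_diff(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of unique flags."""
    set_a, set_b = set(shlex.split(a)), set(shlex.split(b))
    return sorted(set_a - set_b), sorted(set_b - set_a)


def opt_level(flags):
    """Return the last -O<level> flag (GCC honours the last one given), or None."""
    levels = [f for f in shlex.split(flags) if f.startswith("-O")]
    return levels[-1] if levels else None


if __name__ == "__main__":
    only_distro, only_test = flag_diff(DISTRO_FLAGS, TEST_FLAGS)
    print("only in distro build:", " ".join(only_distro))
    print("only in test build:  ", " ".join(only_test))
    print("effective -O:", opt_level(DISTRO_FLAGS), "vs", opt_level(TEST_FLAGS))
```

The two lines share no flags at all, and the effective optimisation level differs (-O2 against -O3).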

How can we make our results faster? (Note that these don't actually make Solus faster, except the 1st one...)

  • optimize the crap out of glibc, as that's the only part of Solus the test uses outside of the compiler
  • unset our FLAGS so we build with the same flags. The cost is that all user-compiled programs on Solus would no longer be built with proper security and debugging [which is standard on Linux]; the benefit is no incompatibilities with cross compilations or programs not liking full relro
  • add -march=native to the environment CFLAGS, since most code can use CPU extensions (the resulting binaries are then not transferable to other machines)
  • drop the frame pointer and -g2 from the environment flags. -g2 creates a much larger binary unless it is stripped
    • Test builds went from 1.75MB without -g2 to 14MB just by adding -g2 (with our full flags it was actually 24MB, but 1.5MB stripped).

In terms of numpy, I'd say we are largely looking at openblas (which is something that needs to be fixed). It used to build with 40 max CPUs, but since our best build server with 40 threads was retired (RIP), it's now -DMAX_CPU_NUMBER=8, which given the huge jump in thread counts is a bit of a regression, even for home loads. I imagine it will perform great provided you have 8 or fewer threads. Configure flags to look at (it would be good to test this on Ryzen and on low thread counts [I can do that one xD] before and after the changes):

  • NUM_THREADS=? (maybe 64 is sensible. Or we just go 128 if it doesn't impact performance so we don't run into the cap again!)
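A quick way to see who the cap hurts, as a sketch only. It assumes the build-time cap acts as a hard ceiling on the threads the library will use, which should be confirmed against openblas itself; both helper names are invented.

```python
import os


def usable_threads(compiled_cap, logical_cpus=None):
    """Threads the library could use, assuming the build-time cap is a hard ceiling."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return min(compiled_cap, logical_cpus)


def idle_threads(compiled_cap, logical_cpus=None):
    """Hardware threads left unused purely because of the cap."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return max(0, logical_cpus - compiled_cap)


if __name__ == "__main__":
    for cap in (8, 64, 128):
        print(f"cap={cap}: usable={usable_threads(cap)}, idle={idle_threads(cap)}")
```

Under that assumption, a 16-thread machine with a cap of 8 leaves half its threads idle, while a cap of 64 or 128 leaves none.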

It does also highlight that it would be good to see what a package ships as its default flags, as they may add some that are advantageous to their particular code. I'm not sure if there's an easy way to do that, though, outside of running a build with unset flags.
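One possible approach, if a build with unset flags has been run anyway, is to scrape the compiler command lines out of its log. The sketch below assumes the log contains full command lines (many build systems need a verbose switch for that) and that each compile line starts with the compiler name; `flags_from_log` and the `COMPILERS` set are invented for illustration.

```python
import shlex
from collections import Counter

# Assumed set of compiler names to look for at the start of a log line.
COMPILERS = {"cc", "gcc", "g++", "c++", "clang", "clang++", "gfortran"}


def flags_from_log(log_text):
    """Count -O/-f/-m/-D flags seen on lines whose first word is a compiler."""
    counts = Counter()
    for line in log_text.splitlines():
        try:
            words = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: not a command line we can parse
        if not words or words[0].rsplit("/", 1)[-1] not in COMPILERS:
            continue
        counts.update(w for w in words[1:] if w.startswith(("-O", "-f", "-m", "-D")))
    return counts


if __name__ == "__main__":
    sample = "gcc -O3 -ffast-math -c a.c -o a.o\n/usr/bin/gcc -O3 -mtune=native -c b.c\n"
    for flag, n in flags_from_log(sample).most_common():
        print(n, flag)
```

Whatever shows up there with our own flags unset is what upstream chose for itself.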

pokgak added a subscriber: pokgak. Oct 22 2017, 11:27 AM
W-Floyd added a subscriber: W-Floyd. Nov 2 2017, 8:30 AM
aszrul added a subscriber: aszrul.
ikey closed this task as Resolved. Nov 30 2017, 10:49 PM
ikey claimed this task.

I'd argue we've done the relevant portions for this already, and numpy/openblas/etc all got a kick up the arse