
A compilation of slightly totally insane GCC ricer flags


#1

Quite a while ago, I went through the full GCC documentation of the --param settings, hoping to find some magical setting to make a certain program go faster. I ended up with the ridiculous set you see below, recompiled the project, and the speedup was a big fat zero percent.
Ricer settings indeed.
But at least in my case nothing broke, so here they are; most of them share the theme of letting the compiler use all the RAM and CPU time it wants.
Maybe there is some program out there that would benefit from these, so if you are in a “why the hell not?” mood, you are welcome to try them.

--param max-crossjump-edges=100000 --param max-delay-slot-insn-search=100000 --param max-delay-slot-live-search=100000 --param max-gcse-memory=2000000000 --param max-pending-list-length=100000 --param max-modulo-backtrack-attempts=100000 --param large-function-growth=100000 --param inline-unit-growth=100000 --param ipcp-unit-growth=100000 --param large-stack-frame-growth=1000000 --param max-early-inliner-iterations=1000 --param max-hoist-depth=1000 --param max-tail-merge-comparisons=1000 --param max-tail-merge-iterations=1000 --param iv-consider-all-candidates-bound=1000000 --param iv-max-considered-uses=1000000 --param scev-max-expr-size=100000 --param scev-max-expr-complexity=100000 --param max-iterations-to-track=1000 --param max-cse-path-length=100000 --param max-cse-insns=1000000 --param max-reload-search-insns=100000 --param max-cselib-memory-locations=100000 --param max-sched-ready-insns=100000 --param max-sched-region-blocks=100000 --param max-pipeline-region-blocks=100000 --param max-sched-region-insns=100000 --param max-pipeline-region-insns=100000 --param selsched-max-lookahead=100000 --param selsched-max-sched-times=10000 --param selsched-insns-to-rename=1000 --param max-partial-antic-length=1000000000 --param sccvn-max-scc-size=10000000 --param sccvn-max-alias-queries-per-access=100000 --param ira-max-loops-num=10000 --param ira-max-conflict-table-size=10000 --param loop-invariant-max-bbs-in-loop=10000000 --param loop-max-datarefs-for-datadeps=100000 --param max-vartrack-size=100000 --param max-vartrack-expr-depth=100000 --param ipa-cp-value-list-size=10000 --param ipa-max-agg-items=10000 --param max-slsr-cand-scan=10000
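If you want to actually try a handful of them, they just get appended to the compile line like any other flag. A minimal sketch (the file name and the particular subset of params here are only illustrative; pick whatever you like from the list above):

    // ricer_test.cpp - throwaway file, only here to have something to compile.
    // Illustrative build line; any subset of the params above is used the same way:
    //
    //   g++ -O3 -march=native \
    //       --param max-gcse-memory=2000000000 \
    //       --param inline-unit-growth=100000 \
    //       --param large-function-growth=100000 \
    //       ricer_test.cpp -o ricer_test
    //
    #include <cstdio>

    int main() {
        // Nothing clever here; the point is only where the --param options go.
        std::puts("built with very patient compiler settings");
        return 0;
    }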


#2

@AdminDev @Kat

Get in here. GCC stuff.


#3

You want execution/run time to go faster or you want compilation to go faster?


#4

The goal was generating the fastest possible executable, without any regard for compilation time.


#5

:joy:

Using non-default flags in GCC is like running around naked, high on drugs, in a minefield. The further you go from the defaults, the less tested everything is and the more subtle bugs you can expect to blow up in your face. Wouldn’t recommend lol


#6

Yeah, for regular usage I use something like this:
g++ -O3 -std=c++14 -march=<whatever> -flto -fno-exceptions -fno-rtti -fno-unwind-tables
Fairly standard, except for disabling RTTI and exception handling.
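In case anyone hasn’t built with those two switches before, here is roughly what they take away (the class names are made up, and the quoted errors are approximately what current GCC prints):

    // demo.cpp - what -fno-exceptions / -fno-rtti rule out.
    // Illustrative build line: g++ -O3 -std=c++14 -fno-exceptions -fno-rtti -c demo.cpp

    struct Base    { virtual ~Base() = default; };
    struct Derived : Base {};

    Derived* downcast(Base* b) {
        // With -fno-rtti the usual checked downcast is rejected, roughly:
        //   error: 'dynamic_cast' not permitted with '-fno-rtti'
        // return dynamic_cast<Derived*>(b);
        return static_cast<Derived*>(b);  // so the caller must guarantee the type
    }

    int parse(const char* s) {
        // With -fno-exceptions a throw fails to compile, roughly:
        //   error: exception handling disabled, use '-fexceptions' to enable it
        // so errors have to travel as return codes instead.
        if (!s) return -1;
        return 0;
    }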


#7

gcc -Wall -Wextra -Wshadow -pedantic -std=c11 is my default

I add -O2 when doing “release” builds
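Those warning flags earn their keep on stuff like this (C++ shown here to match the rest of the thread, but -Wshadow fires the same way under -std=c11):

    // shadow.cpp - the kind of bug -Wshadow exists to catch.
    // Illustrative build line: g++ -Wall -Wextra -Wshadow -pedantic shadow.cpp
    //                     (or: gcc -Wall -Wextra -Wshadow -pedantic -std=c11 shadow.c)
    #include <cstdio>

    int total = 0;

    void accumulate(const int* v, int n) {
        for (int i = 0; i < n; ++i) {
            int total = v[i];                  // -Wshadow: declaration shadows the global 'total'
            std::printf("adding %d\n", total); // looks like it works...
        }
        // ...but the global 'total' was never updated.
    }

    int main() {
        int data[] = {1, 2, 3};
        accumulate(data, 3);
        std::printf("total = %d\n", total);    // prints 0, probably not what was intended
        return 0;
    }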


#8

Oh god… WHY?!


#9

why what?


#10

If you’re so desperate for performance consider writing some of your program in assembly :stuck_out_tongue:


#11

I’ve heard that so many times from people who should know better that I get Vietnam flashbacks when someone says it…


#12

Yeah, I looked up the inline ASM syntax for GCC once, recoiled in horror and nausea, and decided to never touch that. The AT&T style is garbage by itself, but nooo, they had to make it even worse.
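For anyone who hasn’t seen it, this is roughly the extended-asm syntax in question (x86-64, AT&T flavour; the snippet just adds two ints):

    // inline_asm.cpp - a small taste of GCC extended asm.
    // Illustrative build line: g++ -O2 inline_asm.cpp   (x86-64 only)
    #include <cstdio>

    int add(int a, int b) {
        // AT&T order is "addl source, destination", i.e. reversed vs Intel syntax.
        // %0 and %1 refer to the operands after the colons; the constraint strings
        // describe them: "+r" = register that is read and written, "r" = register input.
        asm("addl %1, %0" : "+r"(a) : "r"(b));
        return a;
    }

    int main() {
        std::printf("%d\n", add(2, 3));  // 5
        return 0;
    }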


#13

So much this.

With ricer flags you may get 1-2 percent if you’re lucky, at the cost of throwing the well-known behaviour of the code under “standard” settings out of the window.

The time you spend fucking with ricer compiler flags will not net you anything.

Use the flags the developer put in their makefile. They probably know what works for their code better than you do.


#14

No love for -Os?


#15

-Os is usually far worse than -O2/-O3; it disables way too many optimizations. There might be a couple of rare cases where it is faster because the entire hot path fits into the caches (L1i and µop), but by and large it is for highly memory/storage-constrained situations, like AVRs and tiny ARM SoCs.
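If you want to see the trade-off on a concrete translation unit, compiling the same file both ways and comparing the .text size is enough (the file here is made up; size(1) is the binutils tool):

    // size_demo.cpp - compile both ways and compare:
    //
    //   g++ -O3 -c size_demo.cpp -o o3.o && size o3.o
    //   g++ -Os -c size_demo.cpp -o os.o && size os.o
    //
    // -O3 will typically unroll/vectorize the loop into a noticeably larger .text
    // section; -Os keeps it small, which only pays off when the hot code is
    // fighting for L1i / µop cache space, or for flash on a tiny MCU.
    int sum(const int* v, int n) {
        int s = 0;
        for (int i = 0; i < n; ++i)
            s += v[i];
        return s;
    }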


#16

Oh snap! That will go right into my cringe-comp CFLAGS and CXXFLAGS in make.conf!