| ~ | | Changeset 9086:faeddb9fb678 |
| | ~ | Changeset 9240:e47a0faa1ad0 |
| | |
|
| | | gcc: Enable Link-Time Optimization for gcc >= 4.6 |
| | |
|
| ~ | | This patch adds a scons flag to indicate that compilation and linking
|
| ~ | | should be done using LTO. No check is performed to guarantee that the
|
| ~ | | linker supports LTO and use of the linker plugin, so the user has to
|
| ~ | | ensure that binutils GNU ld >= 2.21 or the gold linker is available. |
| | ~ | This patch adds Link-Time Optimization when building the fast target
|
| | ~ | using gcc >= 4.6, and adds a scons flag to disable it (--no-lto). No
|
| | ~ | check is performed to guarantee that the linker supports LTO and use
|
| | ~ | of the linker plugin, so the user has to ensure that binutils GNU ld
|
| | + | >= 2.21 or the gold linker is available. Typically, if gcc >= 4.6 is
|
| | + | available, the latter should not be a problem. Currently the LTO
|
| | + | option is only useful for gcc >= 4.6, due to the limited support on
|
| | + | clang and earlier versions of gcc. The intention is to also add
|
| | + | support for clang once the LTO integration matures. |
|
| | |
|
| | | The same number of jobs is used for the parallel phase of LTO as the
|
| | | jobs specified on the scons command line, using the -flto=n flag that
|
| ~ | | was introduced with gcc 4.6. Supposedly the gold linker also supports
|
| ~ | | concurrent and incremental linking, but this is not used at this
|
| | ~ | was introduced with gcc 4.6. The gold linker also supports concurrent
|
| | ~ | and incremental linking, but this is not used at this point. |
| - | | point. |
| - | |
|
| - | | Currently the LTO option is only useful for gcc >= 4.6, due to the
|
| - | | limited support on clang and earlier versions of gcc. The intention is
|
| - | | to also add support for clang once the LTO integration matures. The
|
| - | | use of LTO is independent of the target, i.e. debug, opt, fast and
|
| - | | prof, although opt and fast are the most likely candidates. |
| | |
|
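The coupling between the scons job count and the parallel phase of LTO can be sketched in Python, the language of SCons build scripts. This is a minimal illustration under stated assumptions, not the code from the patch: `lto_flags` is a hypothetical helper, and the only compiler option it relies on is gcc's `-flto=n`.

```python
def lto_flags(num_jobs):
    """Return gcc flags that enable LTO with num_jobs parallel LTO jobs.

    Hypothetical helper for illustration; not taken from the patch.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    # -flto=n (introduced with gcc 4.6) runs the parallel phase of LTO
    # with n jobs.
    return ["-flto=%d" % num_jobs]


# Example: a build started with "scons -j8" would pass 8 here.
flags = lto_flags(8)
```

In an SConstruct the job count would typically come from `GetOption('num_jobs')`, and the resulting flags would need to reach both the compile and the link step, since LTO involves both.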
| | | The compilation and linking time is increased by almost 50% on
|
| | | average, although ARM seems to be particularly demanding with an
|
| | | increase of almost 100%. Also beware when using this as gcc uses a
|
| | | tremendous amount of memory and temp space in the process. You have
|
| | | been warned. |
| | |
|
| | + | After careful consideration and plenty of discussion, the flag is
|
| | + | only added to the fast target, and the warning that was issued in an
|
| | + | earlier version of this patch is now removed. Similarly, LTO is now
|
| | + | used by default, and the flag that previously enabled it has been
|
| | + | changed to disable it. The rationale behind this decision is that
|
| | + | opt is used for development, whereas fast is only used for long runs,
|
| | + | e.g. regressions or more elaborate experiments where the additional
|
| | + | compile and link time is amortized by a much larger run time. |
| | + |
|
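The resulting policy (LTO on by default, for the fast target only, with gcc >= 4.6, unless explicitly disabled) can be summarized as a small decision function. This is a sketch under stated assumptions rather than the patch's own logic: `parse_gcc_version`, `use_lto`, and the `no_lto` parameter are hypothetical names, and the version string is assumed to be a bare version such as the output of `gcc -dumpversion`.

```python
import re

# Minimum gcc version for which the description considers LTO useful.
MIN_GCC_FOR_LTO = (4, 6)


def parse_gcc_version(text):
    """Extract (major, minor) from a bare version string such as '4.6.3'."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    # A missing minor component (e.g. '12') is treated as 0.
    return int(match.group(1)), int(match.group(2) or 0)


def use_lto(target, gcc_version, no_lto=False):
    """Return True if LTO should be used for this build.

    LTO applies only to the 'fast' target, only with gcc >= 4.6,
    and only when the user has not asked to disable it.
    """
    if no_lto or target != "fast":
        return False
    return parse_gcc_version(gcc_version) >= MIN_GCC_FOR_LTO
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "4.10" would sort before "4.6".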
| | | When it comes to the return on investment, the regression seems to be
|
| | | roughly 15% faster with LTO. For a bit more detail, I ran twolf on
|
| | | ARM.fast, with three repeated runs, and they all finish within 42
|
| | | minutes (+- 25 seconds) without LTO and 31 minutes (+- 25 seconds)
|
| | | with LTO, i.e. LTO gives an impressive >25% speed-up for this case. |
| | |
|
| | | Without LTO (ARM.fast twolf) |
| | |
|
| | | real 42m37.632s
|
| | | user 42m34.448s
|
| | | sys 0m0.390s |
| | |
|
| | | real 41m51.793s
|
| | | user 41m50.384s
|
| | | sys 0m0.131s |
| | |
|
| | | real 41m45.491s
|
| | | user 41m39.791s
|
| | | sys 0m0.139s |
| | |
|
| | | With LTO (ARM.fast twolf) |
| | |
|
| | | real 30m33.588s
|
| | | user 30m5.701s
|
| | | sys 0m0.141s |
| | |
|
| | | real 31m27.791s
|
| | | user 31m24.674s
|
| | | sys 0m0.111s |
| | |
|
| | | real 31m25.500s
|
| | | user 31m16.731s
|
| | | sys 0m0.106s |
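
The speed-up quoted above can be checked directly from the six `real` times. The short Python sketch below is an illustration added for verification, not part of the patch; `to_seconds` and `mean_seconds` are hypothetical helper names.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '42m37.632s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognized time stamp: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


def mean_seconds(stamps):
    """Average a list of `time`-style stamps, in seconds."""
    values = [to_seconds(s) for s in stamps]
    return sum(values) / len(values)


# 'real' times from the ARM.fast twolf runs listed above.
without_lto = ["42m37.632s", "41m51.793s", "41m45.491s"]
with_lto = ["30m33.588s", "31m27.791s", "31m25.500s"]

base = mean_seconds(without_lto)
lto = mean_seconds(with_lto)

reduction = 1.0 - lto / base  # fraction of wall-clock time saved
speedup = base / lto          # ratio of run times

print("time saved: %.1f%%, speed-up: %.2fx" % (100 * reduction, speedup))
```

With these inputs the mean run time drops by roughly 26%, which corresponds to a speed-up factor of about 1.35 and is consistent with the ">25%" figure in the description.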