This will hopefully give a broader idea of what kind of setup and portability issues one can hit. I will also show some benchmarks. Today I will, however, start with a short review of the history and current status of the implementation of link-time optimization in GCC. Update: See also Martin Li.

The compiler parses the source, produces a representation in the intermediate language (IL), optimizes it, and eventually generates assembly. The assembly file is turned into an object file by the assembler, and the resulting binary is produced by the linker. At runtime the dynamic linker links multiple DSOs together and executes the program; finally, the dynamic linker may be used again via dlopen to load more libraries/plugins. Practically all non-trivial optimization happens in the compiler, while some limited tricks are performed by both the assembler and the linker.

Early simple compilers internally worked on a statement basis; it was often the case that the whole program would not fit into memory (I will call this compilation scheme statement-at-a-time). Statement-at-a-time compilation greatly limits the optimizations one can perform. Generally one can do only those transformations that are obvious from considering a single statement in isolation (such as expression simplification), though some basic inter-statement optimization is possible too, such as basic common subexpression elimination or peephole optimization. Such optimizations can be implemented, at least in limited form, as a single pass through the instruction stream without ever looking back.
Extending the scope of compilation from a statement to a function (function-at-a-time compilation) permits most classical optimizations, including constant propagation, common subexpression elimination, loop optimization, register allocation, advanced instruction selection and instruction scheduling.

GCC started with a mixture of the statement-at-a-time and function-at-a-time approaches. Front ends generated intermediate code on a per-statement basis and immediately lowered it to a very low-level (close to assembly) intermediate language called RTL (Register Transfer Language). According to Richard Stallman this was mostly done to deal with limitations of the stack size and memory of the machines GCC was developed on. The rest of the compiler worked function-at-a-time. Even early GCCs were thus able to do basic function-wide optimizations, and by the late GCC 2 series the optimizers had developed into a strong low-level optimization engine that was more portable to new architectures than any other compiler available at that time. The optimization queue consisted primarily of a strengthened form of common subexpression elimination (global value numbering), loop optimization (loop unrolling, strength reduction), jump optimization (including conditional jump generation), instruction combining, register allocation, and peephole optimization.
Cygnus was a pioneering free software company that made its living from porting GCC to new architectures. The main focus in the late 90's was to make the CISC-centric optimizers work well with RISC and later VLIW instruction sets and to generally improve the set of intra-procedural optimizations. An important development change happened in 1997 with the EGCS fork, which enabled some of the deeper changes in the compiler. This brought challenges not only for the developers of the C++ front end, library and runtime (that is a separate topic I do not know enough about to write on), but soon enough for the back-end developers, too.

In the late 90's GCC was converted to function-at-a-time parsing. This was motivated mostly by the need for a reasonable implementation of C++ templates. Here one wants to keep a close-to-source representation of templates (generally called an Abstract Syntax Tree, or AST) until instantiation happens. Just like a preprocessor macro, a template alone has no semantics that may easily be represented by a classical compiler IL. This is hard to implement within a front end that has no internal representation of the whole function body. It became essential to reliably inline functions, optimize out unreachable function bodies and do higher-level transformations on function bodies (such as replacing data stored in instances of objects by scalar variables after all methods were inlined). C++ programmers started to make strong assumptions about the compiler's ability to clean up their constructs, which are often not very well thought out from a compiler developer's perspective. As one of the main offenders, function inlining was not the strongest point of GCC, which still worked in function-at-a-time mode: the GCC back end was organized as a library that was used by the front end whenever it needed to produce something into the assembly file. Functions declared inline were remembered if their body did not exceed a certain size limit.
Obviously such an inliner was limited to inlining only in the forwarding direction (the function must have been defined before it is used) and it was also believed to consume a lot of memory, because it remembered the expanded function bodies with all inline decisions already applied. In the new implementation, inlining was moved into the front end (in fact, both inliners coexisted for a few years before the new inliner was ported to all front ends). At that time, the C++ front end already saved bodies of functions in an AST-like representation to handle templates. It also did not output most functions just after it finished parsing them. The majority of functions were remembered in high-level form till the very end of compilation, when basic reachability analysis was performed to eliminate dead functions and to avoid the unnecessary work of lowering them into the RTL representation. This was done because C++ include files tend to contain a lot of unused inline functions.

Adding inlining was conceptually simple. At the end of compilation, the compiler started with the functions that were clearly necessary. For every function the following steps were applied:

1. It was decided whether the function is inlinable and, if so, its body was saved for later use.
2. Calls to all small enough inlinable functions were inlined, recursively including all such calls within the inlined function bodies.
3. The function was optimized and output to assembly, and the expanded function body was released from memory.
4. Every function that got mentioned in the assembly output was added to the list of necessary functions.

The initial implementation simply counted every statement as having a cost of 1, and functions were auto-inlined if their size was below a fixed limit. One of the more famous examples at that time was bug PR8 (reported by Gerald Pfeifer), which exposed an exponential explosion in the memory usage of the compiler while building a template-heavy C++ program. This bug defeated solutions using simple hacks to GCC and nicely exposed the problems of the new design.
The first problem is the use of inline in C++ code. Functions declared in headers are often implicitly inline only because of developer laziness, and the inline keyword in general is not as useful as one would expect. This means that the compiler cannot honor all inline functions in the program - inlining those alone would make the compiler explode on a vast majority of modern C++ sources. We had some relatively heated discussions about this at the time. A Ph.D research project implemented using POOMA serves till today as a very useful benchmark for new inliner patches, and SUSE's benchmarking server nicely shows how the performance was steadily improving. At that time it was already obvious that function-at-a-time optimization brings limitations we did not like.

The new inliner used a simple greedy algorithm that inlined all small enough functions in the order given by a badness metric. The algorithm was optimizing for the number of functions fully inlined into their callees. The limits were basically a security precaution to avoid exploding on inline bombs and exposing the non-linearity of the compiler on very large function bodies. A bit more detail about the early design appears in two GCC summit papers. The inliner heuristics were then incrementally improved into their present form, and the callgraph module has grown into a full inter-procedural optimization framework with additional optimizations implemented, such as constant propagation, function specialization, partial inlining, inter-procedural points-to analysis (still not enabled by default) and others. The inliner still remains its most important component, as probably in every optimizing compiler.

Since the late 90's, better commercial compilers and, more recently, the open-sourced Open64 have supported link-time optimization; here GCC was held back by both technical and political problems. Classical link-time optimization is implemented by storing the intermediate language into the object files.
To make this possible, one needs a well-defined representation of the whole compilation unit and of the whole program. Because the back end had grown up from per-function compilation, it did not really have this. At the same time, the Free Software Foundation was concerned about the consequences of adding a well-defined on-disk representation of GCC's IL that could be used to hook GCC into proprietary front ends and back ends without violating GPL 2.

The first attempt at intermodule optimization was implemented by Apple developer Geoff Keating. His implementation extended the C front end to allow parsing of multiple source files at once, presenting them to the back end as one translation unit. While this feature allowed some early experiments with whole-program optimization, it also had serious limitations: it worked for C only, and plans to extend it to C++ were never realized. It did, however, work well for many simpler programs, and it was also able to detect programming errors that would go unnoticed with the traditional scheme. For this reason a new middle end was implemented. The tree-SSA project was probably the most important change that happened to GCC recently, but that is a topic for a different article ;). At the time the tree-SSA merge was being prepared, Chris Lattner presented LLVM at the 2003 GCC summit.