| The Problem
Compiler front end generates expressions in arbitrary order
• Some orders (or shapes) may cost less to evaluate
• Sometimes “better” is a local property
• Sometimes “better” is a non-local property
Compiler should reorder ex pressions to fit their context
Old Problem
| |
 |
|
• Recognized in
1961 by Floyd
• Scarborough & Kolsky did it manually in Fortran H Enhanced
• PL.8 and HP compilers claimed to line -help/adding-exponents/solving- cubic -equations.html">solve it , without publishing
• Briggs & Cooper published one algorithm in 1994 |
 |
| • Eckhardt has another idea in the pipeline |
|
Need an efficient & effective way to rearrange expressions |
| Example |
Addition is commutative &
associative for integers |
Consider a simple three operand add
 |
 |
 |
 |
|
•
What if x is 2 and z is 3?* |
|
|
•
What if y+z is evaluated earlier?* |
A purely local issue
A non-local issue |
The “best” shape for the x+y+z depends on contextual knowledge
> There may be several conflicting options |
| Relating it to Lazy Code Motion
Lazy code motion makes significant improvements
• Sometimes, it misses opportunities
• Can only find textual subexpressions
• Array subscripts are a particular concern (Fortran H )
LCM has its limitations
• Requires strict naming scheme
> Can only run it once, early in optimization
> Other optimizations will destroy name space
• Relies on lexical identity (not value identity )
Would like version of LCM that fixes these problems
Should be fast, easy to implement, & simple to teach … |
| Plan of Attack 1.
Reassociation
> Discover facts about global code shape
> Reorder subexpressions based on that knowledge
2. Renaming
> Use redundancy elimination to find equivalences
> Rename virtual registers to reflect equivalences, and to
conform to the code shape constraints for LCM
> Encode value equality into the name space
3. LCM
> Run LCM unchanged on the result
> Performs code placement, partial redundancy elimination
> Run it anywhere, anytime, on any code
This lecture focuses on reassociation & renaming |
| Reassociation Simple
Idea
• Use algebraic properties to rearrange expressions
• Hard part is to choose one shape quickly
Our approach
1. Compute a rank for each expression
2. Propagate expressions forward to their uses
3. Reorder by sorting operands into rank order
Need a guiding principle
Order subscripts to improve code
motion (& constant propagation) |
| Computing Ranks The
intuitions
• Each expression & subexpression assigned a rank
• Loop-invariant’s rank < loop-variant’s rank
• Deeper nesting => higher rank
• Invariant in 2 loops < invariant in 1 loop
• All constants assigned the same rank
• Constants should sort together
 |
| Computing Ranks The
algorithm
1. Build pruned SSA form & fold copies into Ø-functions
2. Traverse CFG in reverse postorder (RPO)
a. Assign each block a rank number as visited
b. Each expression in block is ranked
i. x is constant => rank(x) is 0
ii. result of Ø-function has block’s RPO number
iii. x <op> y has rank max(rank(x), rank(y))
This numbering produces the “right” intuitive properties |
| Forward Propagation
The intuition
• Copy expressions forward to their uses
• Build up large expression trees from small ones
The implementation
• Replace Ø-functions with copies in predecessor blocks
• Trace back from copy to build expression tree
• Split critical edges to create appropriate predecessors
| |
 |
Split them here, not during ranking |
Notes
• Forward propagation is not an improvement
• Addresses limitation in PRE (expr live across > 1 block)
• Eliminates some partially-dead expressions |
| Reordering Expressions
The intuition
• Rank shows how far LCM can move an expression
• Sort subexpressions into ascending rank order
• Lets LCM move each as far as possible
The implementation
• Rewrite x -y + z as x + (-y) + z [Frailey 1970]
• Sort operands of associative ops by rank
• Distribute ope rations where both legal & profitable
| Distribution |
 |
Room for more work on this
issue |
| |
• Sometimes pays off, sometimes does not A(i,j,k)
• We explored one strategy: low rank x over high-rank + |
| Making it work What
have we done?
• Rewritten every expression based on global ranks
> and local concerns of constant propagation ...
• Tailored order of evaluation for LCM
• Broken the name space that LCM needs
> so, we cannot possibly run LCM
Undoing the damage
• Must systematically rename values to create LCM name space
• Can improve on the original name space, if we try
• Need a global renaming phase |
| Renaming |
 |
AWZ uses Hopcroft’s partitioning
algorithm
(read DFA minimization) to create equal-
valued equivalence classes of expressions |
| The intuition |
|
| • Use Alpern et al.’s
partitioning method |
| • Rename every value to expose congruences found by AWZ |
The implementation
|
• x, y ∈ same congruence class # use same name |
|
Reconstruct the 4
magic naming rules |
|
• Use hash table to regenerate consistent names |
 |
|
• Reserve variable names & insert copies |
Notes
• Clever implementation might eliminate some stores
• Variables become obvious from conflicting definitions |
| Results What do we
gain from all this manipulation?
| • Can run LCM (or PRE) at any point in
the optimizer |
 |
Important in our
adaptive work |
| > Can reconstruct the name space |
|
|
| > Makes results independent of choices
made in front end |
|
|
• More effective redundancy elimination
> Measured with PRE (not LCM)
> Reductions of up to 40% in total operations (over PRE)
| • Sometimes, code runs more slowly |
 |
Stronger methods can
remove them, but this
is a minor effect and ... |
| > Forward propagation moves code into
loop |
| > PRE cannot move it back out of the loop |
|
| Other Issues Code
Size
• Forward propagation is potentially exponential in space
• Measured results
> Average was 1.27x; maximum was 2.488; 1 of 50 was ≥2
• Stronger LCM methods avoid this problem by cloning, which has
its own code size problems, so …
Distribution
• Can destroy common subexpressions
• Has choice of shapes & can pick less profitable one
Interaction with other transformations
• Shouldn’t turn multiplies into shifts until later
• Reassociation should let OSR find fewer induction variables |