Optimization
Introduction
The goal of this assignment is to implement a series of optimizations for the L3 compiler.
To whet your appetite, once your optimizer is complete, it should be able to optimize the following L3 example program:
(def byte-write (fun (c) (@byte-write c))) (def compose (fun (f g) (fun (x) (f (g x))))) (def + (fun (x y) (@+ x y))) (def succ (fun (x) (+ x 1))) (def twice (fun (x) (+ x x))) (byte-write ((compose succ twice) 39)) (byte-write ((compose succ succ) 73)) (byte-write ((compose twice succ) 4))
to the following CPS/L3 code:
(let* ((v_2 (@byte-write 79)) (v_3 (@byte-write 75)) (v_4 (@byte-write 10))) (halt 0))
which, as you can see, is pretty optimal.
Initial setup
To start working on this assignment, you should download the Zip archive containing our skeleton.
Overview
After downloading our skeleton and extracting it into your project, you will have access to the following new files:
Statistics.scala
, containing a class that can collect statistics during CPS interpretation, which is useful to evaluate your optimizer (see Testing and submission),CPSOptimizer.scala
, containing a skeleton of the optimizer.
Your goal is to complete the second of these files, which contains three top-level entities:
CPSOptimizer
, an abstract class containing the generic CPS optimization framework that works for all versions of the CPS language,HighCPSOptimizer
, an object inheriting fromCPSOptimizer
and specializing it to the high-level version of the CPS language (before values representation),FlatCPSOptimizer
, an object inheriting fromCPSOptimizer
and specializing it to the flat, low-level version of the CPS language (after values representation and hoisting).
The last one is already complete, and your task consists first in finishing the CPSOptimizer
class, and then writing the HighCPSOptimizer
object. You are expected to implement the following optimizations:
- dead code elimination (DCE),
- common subexpression elimination (CSE),
- constant folding,
- simplifications due to neutral/absorbing elements,
- shrinking inlining,
- general inlining, using the heuristic described below,
- additional optimizations if you wish, e.g. those related to blocks.
Notice that we do not ask you to implement η-reduction, or contification, the latter being very tricky to get right and typically done in a separate phase of its own.
In CPSOptimizer
, optimizations are performed by two separate functions: shrink
, which does all the shrinking optimizations, and inline
, which does general (non-shrinking) inlining. Remember that shrinking optimizations can be applied at will, while general inlining cannot as it can make the code grow arbitrarily.
Optimization is performed by combining the shrink
and inline
functions as follows:
- first, the incoming tree is shrunk,
- then, a maximum size for the optimized program is computed, currently as 3/2 the size of the shrunk tree,
- then, inlining and shrinking are performed iteratively until either:
- a fixed point is reached, or
- a maximum number of iterations is reached, or
- the program grows above the maximum size.
At each iteration of the inlining/shrinking step, all functions or continuations whose size is below a limit are inlined systematically. That maximum size grows linearly with the iteration count for continuations, and according to the Fibonacci sequence for functions.
In other words, during the first inlining round, all functions and continuations of size 1 are inlined systematically. During the second round, all those of size 2 or less are inlined systematically. And so on until inlining stops for one of the reasons described above.
Both shrink
and inline
use a value of type State
to track information about the program and guide optimizations. This state must be updated as the tree is traversed, using the methods of class State
. Make sure that you understand when to invoke them.
Once your optimizer is complete, you can add it to the back-end of the compiler by modifying the definition of backEnd
in Main.scala
. Here is how it should look when both optimization phases are included:
val backEnd: Tree => TerminalPhaseResult = ( CL3ToCPSTranslator andThen HighCPSOptimizer andThen CPSValueRepresenter andThen CPSHoister andThen FlatCPSOptimizer andThen FlatCPSInterpreter )
Testing and submission
The updated tests now contain a new back-end (#4) that includes the two optimization phases as above. Therefore, running sbt’s test
command as usual should test your optimizer.
In addition to evaluating the correctness of your optimizer using the tests, you should evaluate its completeness by collecting statistics. To that end, we have provided you with the Statistics
class, which can be used to count the number and kinds of nodes that are visited during the evaluation of a program.
To collect statistics during interpretation, you should modify your Main.scala
file to create an instance of Statistics
, pass it to the CPS interpreter, and then print the result at the end, as follows:
val stats = new Statistics() val backEnd: Tree => TerminalPhaseResult = ( CL3ToCPSTranslator andThen HighCPSOptimizer andThen CPSValueRepresenter andThen CPSHoister andThen FlatCPSOptimizer andThen (new FlatCPSInterpreter(stats.log)) ) /* … further down */ match { case Right((retCode, maybeMsg)) => maybeMsg foreach println println(stats) // add this to print statistics sys.exit(retCode) }
To give you an idea of the kind of numbers you should get, here is what we obtain when running the maze
example with a size and a seed of 10, using our version of the L3 compiler:
Nodes ===== 280 541 LetP 269 833 LetC 214 229 If 82 566 AppF 50 327 AppC 1 Halt 1 LetF Value primitives ================ 227 911 block-get 15 241 block-set! 7 655 or 6 761 block-alloc 6 637 shift-right 6 231 xor 2 585 + 2 434 and 1 699 - 1 627 shift-left 1 099 * 474 byte-write 179 % 6 byte-read 2 block-length Test primitives =============== 210 655 = 1 916 <= 1 658 < Functions defined: 31 Continuations defined: 484 245
You should of course try to beat these numbers!
Notice that it is very unlikely that you will get exactly the same numbers as ours, as they also depend on how you implemented values representation. In our version, for example, we often use the exclusive or primitive to untag values, which explains the high number of xor
primitives that are executed above.
Submission is done as usual through the submission server, using the token(s) available on your private page.
Grading
A total of 100 points are assigned to this project part, out of a total of 500 for the whole semester. Your submission will be graded based on:
- its correctness, i.e. whether it passes all the functional tests that were made available to you, and additional ones that we might have,
- its completeness, i.e. whether you implemented all the optimizations listed above as being required,
- its style, which will be worth 5 points for this assignment, so please pay attention to the feedback you received for the previous assignments.