Optimization
Introduction
The goal of this assignment is to implement a series of optimizations for the L3 compiler.
To whet your appetite, once your optimizer is complete, it should be able to optimize the following L3 example program:
(def byte-write (fun (c) (@byte-write c))) (def compose (fun (f g) (fun (x) (f (g x))))) (def + (fun (x y) (@+ x y))) (def succ (fun (x) (+ x 1))) (def twice (fun (x) (+ x x))) (byte-write ((compose succ twice) 39)) (byte-write ((compose succ succ) 73)) (byte-write ((compose twice succ) 4))
to the following CPS/L3 code:
(let* ((v (@byte-write 79)) (v_1 (@byte-write 75)) (v_2 (@byte-write 10))) (halt 0))
which, as you can see, is pretty optimal.
Initial setup
To start working on the assignment, you should once again fetch the new code from our git repository, and then rebase your branch on ours. As a reminder, this can be done using the following two commands:
$ git fetch origin $ git rebase origin/master
Overview
After rebasing your branch on ours, you will have access to the following new files:
Statistics.scala
, containing a class that can collect statistics during CPS interpretation, which is useful to evaluate your optimizer (see Testing and submission),CPSOptimizer.scala
, containing a skeleton of the optimizer.
Your goal is to complete the second of these files, which contains three top-level entities:
CPSOptimizer
, an abstract class containing the generic CPS optimization framework that works for all versions of the CPS language,CPSOptimizerHigh
, an object inheriting fromCPSOptimizer
and specializing it to the high-level version of the CPS language (before values representation),CPSOptimizerLow
, an object inheriting fromCPSOptimizer
and specializing it to the low-level version of the CPS language (after values representation).
The last one is already complete, and your task consists first in finishing the CPSOptimizer
class, and then writing the CPSOptimizerHigh
object. You are expected to implement the following optimizations:
- dead code elimination (DCE),
- common subexpression elimination (CSE),
- constant folding,
- simplifications due to neutral/absorbing elements,
- shrinking inlining,
- general inlining, using the heuristic described below,
- additional optimizations if you wish, e.g. those related to blocks.
Notice that we do not ask you to implement η-reduction, or contification, the latter being very tricky to get right and typically done in a separate phase of its own.
In CPSOptimizer
, optimizations are performed by two separate functions: shrink
, which does all the shrinking optimizations, and inline
, which does general (non-shrinking) inlining. Remember that shrinking optimizations can be applied at will, while general inlining cannot as it can make the code grow arbitrarily.
Optimization is performed by combining the shrink
and inline
functions as follows:
- first, the incoming tree is shrunk,
- then, a maximum size for the optimized program is computed, currently as 3/2 the size of the shrunk tree,
- then, inlining and shrinking are performed iteratively until either:
- a fixed point is reached, or
- a maximum number of iterations is reached, or
- the program grows above the maximum size.
At each iteration of the inlining/shrinking step, all functions or continuations whose size is below a limit are inlined systematically. That maximum size grows linearly with the iteration count for continuations, and according to the Fibonacci sequence for functions.
In other words, during the first inlining round, all functions and continuations of size 1 are inlined systematically. During the second round, all those of size 2 or less are inlined systematically. And so on until inlining stops for one of the reasons described above.
Both shrink
and inline
use a value of type State
to track information about the program and guide optimizations. This state must be updated as the tree is traversed, using the methods of class State
. Make sure that you understand when to invoke them.
Once your optimizer is complete, you can add it to the back-end of the compiler by modifying the definition of backEnd
in Main.scala
. Here is how it should look when both optimization phases are included:
val backEnd: Tree => TerminalPhaseResult = ( CL3ToCPSTranslator andThen CPSOptimizerHigh andThen CPSValueRepresenter andThen CPSHoister andThen CPSOptimizerLow andThen CPSInterpreterLow )
Testing and submission
The updated tests now contain a new back-end (#4) that includes the two optimization phases as above. Therefore, running sbt's test
command as usual should test your optimizer.
In addition to evaluating the correctness of your optimizer using the tests, you should evaluate its completeness by collecting statistics. To that end, we have provided you with the Statistics
class, which can be used to count the number and kinds of nodes that are visited during the evaluation of a program.
To collect statistics during interpretation, you should modify your Main.scala
file to create an instance of Statistics
, pass it to the CPS interpreter, and then print the result at the end, as follows:
val stats = new Statistics() val backEnd: Tree => TerminalPhaseResult = ( CL3ToCPSTranslator andThen CPSOptimizerHigh andThen CPSValueRepresenter andThen CPSHoister andThen CPSOptimizerLow andThen (new CPSInterpreterLow(stats.log _)) ) /* … further down */ match { case Right((retCode, maybeMsg)) => maybeMsg foreach println println(stats) // add this to print statistics sys.exit(retCode) }
To give you an idea of the kind of numbers you should get, here is what we obtain when running the maze
example with a size and a seed of 10, using our version of the L3 compiler:
Nodes ===== 590 457 LetP 484 233 LetC 202 735 If 79 710 AppF 42 751 AppC 1 Halt 1 LetF Value primitives ================ 212 144 CPSBlockGet 118 393 CPSXOr 110 053 CPSShiftLeft 108 416 CPSBlockTag 14 924 CPSBlockSet 6 761 CPSBlockAlloc 6 631 CPSShiftRight 4 756 CPSOr 2 476 CPSAdd 2 434 CPSAnd 1 699 CPSSub 1 099 CPSMul 474 CPSByteWrite 179 CPSMod 12 CPSBlockLength 6 CPSByteRead Test primitives =============== 199 159 CPSEq 1 916 CPSLe 1 660 CPSLt Functions defined: 32 Continuations defined: 484 233
You should of course try to beat these numbers!
Notice that it is very unlikely that you will get exactly the same numbers as ours, as they also depend on how you implemented values representation. In our version, for example, we systematically use the exclusive or primitive to (un)tag values, which explains the high number of CPSXOr
primitives that are executed above.
Submission is done as usual through the submission server, using the token(s) available on your private page.
Grading
A total of 100 points are assigned to this project part, out of a total of 500 for the whole semester. Your submission will be graded based on:
- its correctness, i.e. whether it passes all the functional tests that were made available to you, and additional ones that we might have,
- its completeness, i.e. whether you implemented all the optimizations listed above as being required,
- its style, which will be worth 5 points for this assignment, so please pay attention to the feedback you received for the previous assignments.