Optimization
Introduction
The goal of this assignment is to implement a series of optimizations for the L3 compiler.
To whet your appetite, once your optimizer is complete, it should be able to optimize the following L3 example program:
(def byte-write (fun (c) (@byte-write c)))
(def compose
(fun (f g)
(fun (x) (f (g x)))))
(def + (fun (x y) (@+ x y)))
(def succ (fun (x) (+ x 1)))
(def twice (fun (x) (+ x x)))
(byte-write ((compose succ twice) 39))
(byte-write ((compose succ succ) 73))
(byte-write ((compose twice succ) 4))
to the following CPS/L3 code:
(let* ((v (@byte-write 79))
(v_1 (@byte-write 75))
(v_2 (@byte-write 10)))
(halt 0))
which, as you can see, is pretty optimal.
Initial setup
To start working on the assignment, you should once again fetch the new code from our git repository, and then rebase your branch on ours. As a reminder, this can be done using the following two commands:
$ git fetch origin $ git rebase origin/master
Overview
After rebasing your branch on ours, you will have access to the following new files:
Statistics.scala, containing a class that can collect statistics during CPS interpretation, which is useful to evaluate your optimizer (see Testing and submission),CPSOptimizer.scala, containing a skeleton of the optimizer.
Your goal is to complete the second of these files, which contains three top-level entities:
CPSOptimizer, an abstract class containing the generic CPS optimization framework that works for all versions of the CPS language,CPSOptimizerHigh, an object inheriting fromCPSOptimizerand specializing it to the high-level version of the CPS language (before values representation),CPSOptimizerLow, an object inheriting fromCPSOptimizerand specializing it to the low-level version of the CPS language (after values representation).
The last one is already complete, and your task consists first in finishing the CPSOptimizer class, and then writing the CPSOptimizerHigh object. You are expected to implement the following optimizations:
- dead code elimination (DCE),
- common subexpression elimination (CSE),
- constant folding,
- simplifications due to neutral/absorbing elements,
- shrinking inlining,
- general inlining, using the heuristic described below,
- additional optimizations if you wish, e.g. those related to blocks.
Notice that we do not ask you to implement η-reduction, or contification, the latter being very tricky to get right and typically done in a separate phase of its own.
In CPSOptimizer, optimizations are performed by two separate functions: shrink, which does all the shrinking optimizations, and inline, which does general (non-shrinking) inlining. Remember that shrinking optimizations can be applied at will, while general inlining cannot as it can make the code grow arbitrarily.
Optimization is performed by combining the shrink and inline functions as follows:
- first, the incoming tree is shrunk,
- then, a maximum size for the optimized program is computed, currently as 3/2 the size of the shrunk tree,
- then, inlining and shrinking are performed iteratively until either:
- a fixed point is reached, or
- a maximum number of iterations is reached, or
- the program grows above the maximum size.
At each iteration of the inlining/shrinking step, all functions or continuations whose size is below a limit are inlined systematically. That maximum size grows linearly with the iteration count for continuations, and according to the Fibonacci sequence for functions.
In other words, during the first inlining round, all functions and continuations of size 1 are inlined systematically. During the second round, all those of size 2 or less are inlined systematically. And so on until inlining stops for one of the reasons described above.
Both shrink and inline use a value of type State to track information about the program and guide optimizations. This state must be updated as the tree is traversed, using the methods of class State. Make sure that you understand when to invoke them.
Once your optimizer is complete, you can add it to the back-end of the compiler by modifying the definition of backEnd in Main.scala. Here is how it should look when both optimization phases are included:
val backEnd: Tree => TerminalPhaseResult = (
CL3ToCPSTranslator
andThen CPSOptimizerHigh
andThen CPSValueRepresenter
andThen CPSHoister
andThen CPSOptimizerLow
andThen CPSInterpreterLow
)
Testing and submission
The updated tests now contain a new back-end (#4) that includes the two optimization phases as above. Therefore, running sbt's test command as usual should test your optimizer.
In addition to evaluating the correctness of your optimizer using the tests, you should evaluate its completeness by collecting statistics. To that end, we have provided you with the Statistics class, which can be used to count the number and kinds of nodes that are visited during the evaluation of a program.
To collect statistics during interpretation, you should modify your Main.scala file to create an instance of Statistics, pass it to the CPS interpreter, and then print the result at the end, as follows:
val stats = new Statistics()
val backEnd: Tree => TerminalPhaseResult = (
CL3ToCPSTranslator
andThen CPSOptimizerHigh
andThen CPSValueRepresenter
andThen CPSHoister
andThen CPSOptimizerLow
andThen (new CPSInterpreterLow(stats.log _))
)
/* … further down */ match {
case Right((retCode, maybeMsg)) =>
maybeMsg foreach println
println(stats) // add this to print statistics
sys.exit(retCode)
}
To give you an idea of the kind of numbers you should get, here is what we obtain when running the maze example with a size and a seed of 10, using our version of the L3 compiler:
Nodes
=====
590 457 LetP
484 233 LetC
202 735 If
79 710 AppF
42 751 AppC
1 Halt
1 LetF
Value primitives
================
212 144 CPSBlockGet
118 393 CPSXOr
110 053 CPSShiftLeft
108 416 CPSBlockTag
14 924 CPSBlockSet
6 761 CPSBlockAlloc
6 631 CPSShiftRight
4 756 CPSOr
2 476 CPSAdd
2 434 CPSAnd
1 699 CPSSub
1 099 CPSMul
474 CPSByteWrite
179 CPSMod
12 CPSBlockLength
6 CPSByteRead
Test primitives
===============
199 159 CPSEq
1 916 CPSLe
1 660 CPSLt
Functions defined: 32
Continuations defined: 484 233
You should of course try to beat these numbers!
Notice that it is very unlikely that you will get exactly the same numbers as ours, as they also depend on how you implemented values representation. In our version, for example, we systematically use the exclusive or primitive to (un)tag values, which explains the high number of CPSXOr primitives that are executed above.
Submission is done as usual through the submission server, using the token(s) available on your private page.
Grading
A total of 100 points are assigned to this project part, out of a total of 500 for the whole semester. Your submission will be graded based on:
- its correctness, i.e. whether it passes all the functional tests that were made available to you, and additional ones that we might have,
- its completeness, i.e. whether you implemented all the optimizations listed above as being required,
- its style, which will be worth 5 points for this assignment, so please pay attention to the feedback you received for the previous assignments.