Optimization

Introduction

The goal of this assignment is to implement a series of optimizations for the L₃ compiler.

To whet your appetite, once your optimizer is complete, it should be able to optimize the following L₃ example program:

(def byte-write (fun (c) (@byte-write c)))
(def compose
     (fun (f g)
	  (fun (x) (f (g x)))))
(def + (fun (x y) (@+ x y)))
(def succ (fun (x) (+ x 1)))
(def twice (fun (x) (+ x x)))
(byte-write ((compose succ twice) 39))
(byte-write ((compose succ succ) 73))
(byte-write ((compose twice succ) 4))

to the following CPS/L₃ code:

(let* ((v (@byte-write 79))
       (v_1 (@byte-write 75))
       (v_2 (@byte-write 10)))
  (halt 0))

which, as you can see, is pretty optimal.

Initial setup

To start working on the assignment, you should once again fetch the new code from our git repository, and then rebase your branch on ours. As a reminder, this can be done using the following two commands:

$ git fetch origin
$ git rebase origin/master

Overview

After rebasing your branch on ours, you will have access to the following new files:

Statistics.scala, containing a class that can collect statistics during CPS interpretation, which is useful to evaluate your optimizer (see Testing and submission),
CPSOptimizer.scala, containing a skeleton of the optimizer.

Your goal is to complete the second of these files, which contains three top-level entities:

CPSOptimizer, an abstract class containing the generic CPS optimization framework that works for all versions of the CPS language,
CPSOptimizerHigh, an object inheriting from CPSOptimizer and specializing it to the high-level version of the CPS language (before values representation),
CPSOptimizerLow, an object inheriting from CPSOptimizer and specializing it to the low-level version of the CPS language (after values representation).

The last one is already complete, and your task consists first in finishing the CPSOptimizer class, and then writing the CPSOptimizerHigh object. You are expected to implement the following optimizations:

dead code elimination (DCE),
common subexpression elimination (CSE),
constant folding,
simplifications due to neutral/absorbing elements,
shrinking inlining,
general inlining, using the heuristic described below,
additional optimizations if you wish, e.g. those related to blocks.

Notice that we do not ask you to implement η-reduction, or contification, the latter being very tricky to get right and typically done in a separate phase of its own.

In CPSOptimizer, optimizations are performed by two separate functions: shrink, which does all the shrinking optimizations, and inline, which does general (non-shrinking) inlining. Remember that shrinking optimizations can be applied at will, while general inlining cannot as it can make the code grow arbitrarily.

Optimization is performed by combining the shrink and inline functions as follows:

first, the incoming tree is shrunk,
then, a maximum size for the optimized program is computed, currently as 3/2 the size of the shrunk tree,
then, inlining and shrinking are performed iteratively until either:
- a fixed point is reached, or
- a maximum number of iterations is reached, or
- the program grows above the maximum size.

At each iteration of the inlining/shrinking step, all functions or continuations whose size is below a limit are inlined systematically. That maximum size grows linearly with the iteration count for continuations, and according to the Fibonacci sequence for functions.

In other words, during the first inlining round, all functions and continuations of size 1 are inlined systematically. During the second round, all those of size 2 or less are inlined systematically. And so on until inlining stops for one of the reasons described above.

Both shrink and inline use a value of type State to track information about the program and guide optimizations. This state must be updated as the tree is traversed, using the methods of class State. Make sure that you understand when to invoke them.

Once your optimizer is complete, you can add it to the back-end of the compiler by modifying the definition of backEnd in Main.scala. Here is how it should look when both optimization phases are included:

val backEnd: Tree => TerminalPhaseResult = (
  CL3ToCPSTranslator
    andThen CPSOptimizerHigh
    andThen CPSValueRepresenter
    andThen CPSHoister
    andThen CPSOptimizerLow
    andThen CPSInterpreterLow
)

Testing and submission

The updated tests now contain a new back-end (#4) that includes the two optimization phases as above. Therefore, running sbt's test command as usual should test your optimizer.

In addition to evaluating the correctness of your optimizer using the tests, you should evaluate its completeness by collecting statistics. To that end, we have provided you with the Statistics class, which can be used to count the number and kinds of nodes that are visited during the evaluation of a program.

To collect statistics during interpretation, you should modify your Main.scala file to create an instance of Statistics, pass it to the CPS interpreter, and then print the result at the end, as follows:

val stats = new Statistics()
val backEnd: Tree => TerminalPhaseResult = (
  CL3ToCPSTranslator
    andThen CPSOptimizerHigh
    andThen CPSValueRepresenter
    andThen CPSHoister
    andThen CPSOptimizerLow
    andThen (new CPSInterpreterLow(stats.log _))
)

/* … further down */ match {
  case Right((retCode, maybeMsg)) =>
    maybeMsg foreach println
    println(stats) // add this to print statistics
    sys.exit(retCode)
}

To give you an idea of the kind of numbers you should get, here is what we obtain when running the maze example with a size and a seed of 10, using our version of the L₃ compiler:

Nodes
=====
  590 457  LetP
  484 233  LetC
  202 735  If
   79 710  AppF
   42 751  AppC
        1  Halt
        1  LetF

Value primitives
================
  212 144  CPSBlockGet
  118 393  CPSXOr
  110 053  CPSShiftLeft
  108 416  CPSBlockTag
   14 924  CPSBlockSet
    6 761  CPSBlockAlloc
    6 631  CPSShiftRight
    4 756  CPSOr
    2 476  CPSAdd
    2 434  CPSAnd
    1 699  CPSSub
    1 099  CPSMul
      474  CPSByteWrite
      179  CPSMod
       12  CPSBlockLength
        6  CPSByteRead

Test primitives
===============
  199 159  CPSEq
    1 916  CPSLe
    1 660  CPSLt

    Functions defined:      32
Continuations defined: 484 233

You should of course try to beat these numbers!

Notice that it is very unlikely that you will get exactly the same numbers as ours, as they also depend on how you implemented values representation. In our version, for example, we systematically use the exclusive or primitive to (un)tag values, which explains the high number of CPSXOr primitives that are executed above.

Submission is done as usual through the submission server, using the token(s) available on your private page.

Grading

A total of 100 points are assigned to this project part, out of a total of 500 for the whole semester. Your submission will be graded based on:

its correctness, i.e. whether it passes all the functional tests that were made available to you, and additional ones that we might have,
its completeness, i.e. whether you implemented all the optimizations listed above as being required,
its style, which will be worth 5 points for this assignment, so please pay attention to the feedback you received for the previous assignments.