Optimization

Introduction

The goal of this assignment is to implement a series of optimizations for the L3 compiler.

To whet your appetite, once your optimizer is complete, it should be able to optimize the following L3 example program:

(def byte-write (fun (c) (@byte-write c)))
(def compose
     (fun (f g)
          (fun (x) (f (g x)))))
(def + (fun (x y) (@+ x y)))
(def succ (fun (x) (+ x 1)))
(def twice (fun (x) (+ x x)))
(byte-write ((compose succ twice) 39))
(byte-write ((compose succ succ) 73))
(byte-write ((compose twice succ) 4))

to the following CPS/L3 code:

(let* ((v_2 (@byte-write 79))
       (v_3 (@byte-write 75))
       (v_4 (@byte-write 10)))
  (halt 0))

which, as you can see, is pretty optimal.

Initial setup

To start working on this assignment, you should download the Zip archive containing our skeleton.

Overview

After downloading our skeleton and extracting it into your project, you will have access to the following new files:

  • Statistics.scala, containing a class that can collect statistics during CPS interpretation, which is useful to evaluate your optimizer (see Testing and submission),
  • CPSOptimizer.scala, containing a skeleton of the optimizer.

Your goal is to complete the second of these files, which contains three top-level entities:

  1. CPSOptimizer, an abstract class containing the generic CPS optimization framework that works for all versions of the CPS language,
  2. HighCPSOptimizer, an object inheriting from CPSOptimizer and specializing it to the high-level version of the CPS language (before values representation),
  3. FlatCPSOptimizer, an object inheriting from CPSOptimizer and specializing it to the flat, low-level version of the CPS language (after values representation and hoisting).

The last one is already complete, and your task consists first in finishing the CPSOptimizer class, and then writing the HighCPSOptimizer object. You are expected to implement the following optimizations:

  • dead code elimination (DCE),
  • common subexpression elimination (CSE),
  • constant folding,
  • simplifications due to neutral/absorbing elements,
  • shrinking inlining,
  • general inlining, using the heuristic described below,
  • additional optimizations if you wish, e.g. those related to blocks.

Notice that we do not ask you to implement η-reduction, or contification, the latter being very tricky to get right and typically done in a separate phase of its own.

In CPSOptimizer, optimizations are performed by two separate functions: shrink, which does all the shrinking optimizations, and inline, which does general (non-shrinking) inlining. Remember that shrinking optimizations can be applied at will, while general inlining cannot as it can make the code grow arbitrarily.

Optimization is performed by combining the shrink and inline functions as follows:

  • first, the incoming tree is shrunk,
  • then, a maximum size for the optimized program is computed, currently as 3/2 the size of the shrunk tree,
  • then, inlining and shrinking are performed iteratively until either:
    • a fixed point is reached, or
    • a maximum number of iterations is reached, or
    • the program grows above the maximum size.

At each iteration of the inlining/shrinking step, all functions or continuations whose size is below a limit are inlined systematically. That maximum size grows linearly with the iteration count for continuations, and according to the Fibonacci sequence for functions.

In other words, during the first inlining round, all functions and continuations of size 1 are inlined systematically. During the second round, all those of size 2 or less are inlined systematically. And so on until inlining stops for one of the reasons described above.

Both shrink and inline use a value of type State to track information about the program and guide optimizations. This state must be updated as the tree is traversed, using the methods of class State. Make sure that you understand when to invoke them.

Once your optimizer is complete, you can add it to the back-end of the compiler by modifying the definition of backEnd in Main.scala. Here is how it should look when both optimization phases are included:

val backEnd: Tree => TerminalPhaseResult = (
  CL3ToCPSTranslator
    andThen HighCPSOptimizer
    andThen CPSValueRepresenter
    andThen CPSHoister
    andThen FlatCPSOptimizer
    andThen FlatCPSInterpreter
)

Testing and submission

The updated tests now contain a new back-end (#4) that includes the two optimization phases as above. Therefore, running sbt’s test command as usual should test your optimizer.

In addition to evaluating the correctness of your optimizer using the tests, you should evaluate its completeness by collecting statistics. To that end, we have provided you with the Statistics class, which can be used to count the number and kinds of nodes that are visited during the evaluation of a program.

To collect statistics during interpretation, you should modify your Main.scala file to create an instance of Statistics, pass it to the CPS interpreter, and then print the result at the end, as follows:

val stats = new Statistics()
val backEnd: Tree => TerminalPhaseResult = (
  CL3ToCPSTranslator
    andThen HighCPSOptimizer
    andThen CPSValueRepresenter
    andThen CPSHoister
    andThen FlatCPSOptimizer
    andThen (new FlatCPSInterpreter(stats.log))
)

/* … further down */ match {
  case Right((retCode, maybeMsg)) =>
    maybeMsg foreach println
    println(stats) // add this to print statistics
    sys.exit(retCode)
}

To give you an idea of the kind of numbers you should get, here is what we obtain when running the maze example with a size and a seed of 10, using our version of the L3 compiler:

Nodes
=====
  280 541  LetP
  269 833  LetC
  214 229  If
   82 566  AppF
   50 327  AppC
        1  Halt
        1  LetF

Value primitives
================
  227 911  block-get
   15 241  block-set!
    7 655  or
    6 761  block-alloc
    6 637  shift-right
    6 231  xor
    2 585  +
    2 434  and
    1 699  -
    1 627  shift-left
    1 099  *
      474  byte-write
      179  %
        6  byte-read
        2  block-length

Test primitives
===============
  210 655  =
    1 916  <=
    1 658  <

    Functions defined:      31
Continuations defined: 484 245

You should of course try to beat these numbers!

Notice that it is very unlikely that you will get exactly the same numbers as ours, as they also depend on how you implemented values representation. In our version, for example, we often use the exclusive or primitive to untag values, which explains the high number of xor primitives that are executed above.

Submission is done as usual through the submission server, using the token(s) available on your private page.

Grading

A total of 100 points are assigned to this project part, out of a total of 500 for the whole semester. Your submission will be graded based on:

  1. its correctness, i.e. whether it passes all the functional tests that were made available to you, and additional ones that we might have,
  2. its completeness, i.e. whether you implemented all the optimizations listed above as being required,
  3. its style, which will be worth 5 points for this assignment, so please pay attention to the feedback you received for the previous assignments.