|
|
MLRISC Intermediate Representation
|
MLRISC Intermediate Representation
| |
Examples
|
The MLRISC intermediate language is called
MLTREE At the lowest level, the core of MLTREE is a
Register Transfer Language (RTL)
but represented in tree form. The tree
form makes it convenient to use tree pattern matching tools like
BURG (where appropriate) to do target instruction selection. Thus a
tree such as:
MV(32, t,
ADDT(32, MULT(32, REG(32, b), REG(32, b)),
MULT(32, MULT(REG(32, a), LI(4)), REG(32, c))))
computes t := b*b + 4*a*c to 32-bit precision.
The nodes ADDT and
MULT are the trapping form of addition and multiplication,
and LI is used for integer constants. An infinite number
of registers are assumed by the model, however depending on the
target machine the first 0..K registers map onto the first
K registers on the target machine. Everything else is
assumed to be a pseudo-register. The REG node is used to
indicate a general purpose register.
The core MLTREE language makes no assumptions about instructions or
calling convections of the target architecture. Trees can be
created and combined in almost any form, with certain meaningless
trees such as LOAD(32, FLOAD(64, LI 0)) being forbidden by the
MLTREE type structure.
Such pure trees are nice but inadequate in real compilers. One
needs to be able to propagate front end specific information, such
as frame sizes and frame offsets where the actual values are only
available after register allocation and spilling. One could add
support for frames in MLRISC, however this becomes a slippery slope
because some compilers (e.g. SML/NJ) do not have a conventional
notion of frames --- indeed there is no runtime stack in the
execution of SML/NJ. A frame organization for one person may not
meet the needs for another, and so on. In MLRISC, the special
requirements of different compilers is communicated into the MLTREE
language, and subsequently into the optimizations phases, by
specializing the MLTREE data structure with client specific
information. There are currently five dimensions over
which one could specialize the MLTREE language.
- Constants
- Constants are an
abstraction for integer literals whose value is known after
certain phases of code generation. Frame sizes and offsets are an
example.
- Regions
- While the data
dependencies between arithmetic operations is implicit in the
instruction, the data dependencies between memory operations is
not. Regions are an abstract view of memory that make this
dependence explicit and is specially useful for instruction
reordering.
- Pseudo-ops
- Pseudo-ops are
intended to correspond to pseudo-op directives provided by native
assemblers to lay out data, jump tables, and perform alignment.
- Annotations
-
Annotations are used
for injecting semantics and other program information from the front-end
into the backend. For example, a probability annotation can be
attached to a branch instruction. Similarly, line number annotations
can be attached to basic blocks to aid debugging.
In many language implementations function local variables are
spilled to activation frames on the stack. Spill slots contribute
to the size of a function's frame. When an instruction produces a
spill, we may need to update the frame associated to that
instruction (increase the size of its spilling area). The frame
for the current function can be injected in an annotation, which
can be later examined by the spill callback during register allocation.
Annotations are
implemented as an universal type and can be arbitrarily extended.
Individual annotations can be associated
with compiler objects of varying granularity,
from compilation units, to regions, basic blocks, flow edges,
and down to the instructions.
- User Defined Extensions
-
In the most extreme case, the basic constructors defined in the MLTREE
language may be inadequate for the task at hand.
MLTREE allows the client to arbitrarily extend
the set of statements and expressions to more closely match the
source language and the target architecture(s).
For example, when using MLRISC for the backend of a DSP compiler
it may be useful to extend the set of MLRISC operators to include
fix point and saturated arithmetic.
Similarly, when developing a language for loop parallelization, it may
be useful to extend the MLTREE language with higher-level loop
constructs.
Examples
In the SML/NJ compiler, an encoding of a list of registers
is passed to the garbage collector as the roots of live
variables. This encoding cannot be computed until register
allocation has been performed, therefore the integer literal
encoding is represented as an abstract
constant.
Again, in the SML/NJ compiler, most stores are for initializing
records in the allocation space, therefore representing every slot in
the allocation space as a unique region allows one to commute
most store instructions. Similarly, most loads are from
immutable records, and a simple analysis marks these are
being accesses to read-only memory. Read-only memory is
characterized as having multiple uses but no
definitions.
In the TIL compiler, a trace table is generated for
every call site that records the set of live variables, their
location (register or stack offset), and the type associated with
the variable. This table is integrated into the program using the
abstract pseudo-op mechanism. An interesting aspect of these tables
is that they may need adjustment based on the results of register
spilling.
The more convention use of the psuedo-op abstraction is to
propagate function prologue and epilogue information.
The constants abstraction are created by a tree node called
CONST. In the SML/NJ compiler, the tree that communicates
garbage collection information looks like:
MV(32, maskReg, CONST{r110,r200,r300,r400 ...})
where maskReg is a dedicated register. On the DEC Alpha,
this would get translated to:
LDA maskReg, {encode(r110,r200,r300,r400, ...)}
which indicates that the alpha instruction set (and optimization
suite) know about these types of values. Further, after
register allocation, the LDA instruction may not be
sufficient as the encoding may result in a value that is too large
as an operand to LDA. Two instructions may ultimately be
required to load the encoding into the maskReg
register. This expansion is done during
span-dependency resolution.
All these examples are intended to indicate that one
intermediate representation and optimization suite does not fit
all, but that the intermediate representation and optimization
suite needs to be specialized to the needs of the client.
|
|
Generated by
mltex2html
|
Last modified: Mon Jun 8 14:18:05 UTC 2009 by buildd@vernadsky
|
|