|
|
The MLRISC IR
Introduction
In this section we will describe the MLRISC intermediate representation.
Control Flow Graph
The control flow graph is the main view of the IR.
A control flow graph satisfies the following signature:
signature CONTROL_FLOW_GRAPH = sig
structure I : INSTRUCTIONS
structure P : PSEUDO_OPS
structure C : CELLS
structure W : FIXED_POINT
sharing I.C = C
definitions
end
The following structures nested within a CFG:
- I : INSTRUCTIONS is the instruction structure.
- P : PSEUDO_OPS is the structure with the definition
of pseudo ops.
- C : CELLS is the cells structure describing the
register conventions of the architecture.
- W : FIXED_POINT is a structure that contains
a fixed point type used in execution frequency annotations.
The type weight below is used in execution frequency annotations:
type weight = W.fixed_point
There are a few different kinds of basic blocks, described
by the type block_kind below:
datatype block_kind =
START
| STOP
| FUNCTION_ENTRY
| NORMAL
| HYPERBLOCK
A basic block is defined as the datatype block, defined below:
and data = LABEL of Label.label
| PSEUDO of P.pseudo_op
and block =
BLOCK of
{ id : int,
kind : block_kind,
name : B.name,
freq : weight ref,
data : data list ref,
labels : Label.label list ref,
insns : I.instruction list ref,
annotations : Annotations.annotations ref
}
Edges in a CFG are annotated with the type edge_info,
defined below:
and edge_kind = ENTRY
| EXIT
| JUMP
| FALLSTHRU
| BRANCH of bool
| SWITCH of int
| SIDEEXIT of int
and edge_info =
EDGE of { k : edge_kind,
w : weight ref,
a : Annotations.annotations ref
}
Type cfg below defines a control flow graph:
type edge = edge_info edge
type node = block node
datatype info =
INFO of { regmap : C.regmap,
annotations : Annotations.annotations ref,
firstBlock : int ref,
reorder : bool ref
}
type cfg = (block,edge_info,info) graph
Low-level Interface
The following subsection describes the low-level interface to a CFG.
These functions should be used with care since they do not
always maintain high-level structural invariants imposed on
the representation. In general, higher level interfaces exist
so knowledge of this interface is usually not necessary for customizing
MLRISC.
Various kinds of annotations on basic blocks are defined below:
exception LIVEOUT of C.cellset
exception CHANGED of unit -> unit
exception CHANGEDONCE of unit -> unit
The annotation LIVEOUT is used record live-out information
on an escaping block.
The annotations CHANGED and CHANGEDONCE are used
internally for maintaining views on a CFG. These should not be used
directly.
The following are low-level functions for building new basic blocks.
The functions newXXX build empty basic blocks of a specific
type. The function defineLabel returns a label to a basic block;
and if one does not exist then a new label will be generated automatically.
The functions emit and show_block are low-level
routines for displaying a basic block.
val newBlock : int * B.name -> block
val newStart : int -> block
val newStop : int -> block
val newFunctionEntry : int -> block
val copyBlock : int * block -> block
val defineLabel : block -> Label.label
val emit : C.regmap -> block -> unit
val show_block : C.regmap -> block -> string
Methods for building a CFG are listed as follows:
val cfg : info -> cfg
val new : C.regmap -> cfg
val subgraph : cfg -> cfg
val init : cfg -> unit
val changed : cfg -> unit
val removeEdge : cfg -> edge -> unit
Again, these methods should be used only with care.
The following functions allow the user to extract low-level information
from a flowgraph. Function regmap returns the current register map.
Function regmap returns a function that lookups the current register
map. Function liveOut returns liveOut information from a block;
it returns the empty cellset if the block is not an escaping block.
Function fallsThruFrom takes a node id and locates the
block (if any) that flows into without going through a branch
instruction. Similarly, the function fallsThruTo takes
a node id and locates the block (if any) that flows into
with going through a branch instruction. If falls through to
in any feasible code layout must preceed .
val regmap : cfg -> C.regmap
val reglookup : cfg -> C.register -> C.register
val liveOut : block -> C.cellset
val fallsThruFrom : cfg * node_id -> node_id option
val fallsThruTo : cfg * node_id -> node_id option
To support graph viewing of a CFG, the following low-level
primitives are provided:
val viewStyle : cfg -> (block,edge_info,info) GraphLayout.style
val viewLayout : cfg -> GraphLayout.layout
val headerText : block -> string
val footerText : block -> string
val subgraphLayout : { cfg : cfg, subgraph : cfg } -> GraphLayout.layout
Finally, a miscellany function for control dependence graph building.
val cdgEdge : edge_info -> bool
IR
The MLRISC intermediate representation is a composite
view of various compiler data structures, including the control
flow graph, (post-)dominator trees, control dependence graph, and
loop nesting tree. Basic compiler optimizations in MLRISC
operate on this data structure; advance optimizations
operate on more complex representations which use this
representation as the base layer.
This IR provides a few additional functionalities:
- Edge frequencies -- execution frequencies
are maintained on all control flow edges.
- Extensible annotations -- semantics information can be
represented as annotations on the graph.
- Multiple facets --
Facets are high-level views that automatically keep themselves
up-to-date. Computed facets are cached and out-of-date facets
are recomputed by demand.
The IR defines a mechanism to attach multiple facets to the IR.
The signature of the IR is listed below
signature MLRISC_IR = sig
structure I : INSTRUCTIONS
structure CFG : CONTROL_FLOW_GRAPH
structure Dom : DOMINATOR_TREE
structure CDG : CONTROL_DEPENDENCE_GRAPH
structure Loop : LOOP_STRUCTURE
structure Util : CFG_UTIL
sharing Util.CFG = CFG
sharing CFG.I = I
sharing Loop.Dom = CDG.Dom = Dom
type cfg = CFG.cfg
type IR = CFG.cfg
type dom = (CFG.block,CFG.edge_info,CFG.info) Dom.dominator_tree
type pdom = (CFG.block,CFG.edge_info,CFG.info) Dom.postdominator_tree
type cdg = (CFG.block,CFG.edge_info,CFG.info) CDG.cdg
type loop = (CFG.block,CFG.edge_info,CFG.info) Loop.loop_structure
val dom : IR -> dom
val pdom : IR -> pdom
val cdg : IR -> cdg
val loop : IR -> loop
val changed : IR -> unit
val memo : (IR -> 'facet) -> IR -> 'facet
val addLayout : string -> (IR -> GraphLayout.layout) -> unit
val view : string -> IR -> unit
val views : string list -> IR -> unit
val viewSubgraph : IR -> cfg -> unit
end
The following facets are predefined: dominator, post-dominator tree,
control dependence graph and loop nesting structure.
The functions dom, pdom, cdg, loop
are facet extraction methods that
compute up-to-date views of these facets.
The following protocol is used for facets:
- When the IR is changed,
the function changed should be called to
signal that all facets attached to the IR should be updated.
- To add a new facet of type F that is computed by demand,
the programmer has to provide a facet construction
function f : IR -> F. Call the function mem
to register the new facet. For example, let val g = memo f.
Then the function g can be used to as a new facet extraction
function for facet F.
- To register a graph viewing function, call
the function addLayout and provide an appropriate
graph layout function. For example, we can say
addLayout "F" layoutF to register a graph layout function
for a facet called ``F''.
To view an IR, the functions view, views or
viewSubgraph can be used. They have the following interpretation:
- view computes a layout for one facet of the IR and displays
it. The predefined facets are called
``dom'', ``pdom'', ``cdg'', ``loop.'' The IR can be
viewed as the facet ``cfg.'' In addition, there is a layout
named ``doms'' which displays the dominator tree and the post-dominator
tree together, with the post-dominator inverted.
- views computes a set of facets and displays it together
in one single picture.
- viewSubgraph layouts a subgraph of the IR.
This creates a picture with the subgraph highlighted and embedded
in the whole IR.
Building a CFG
There are two basic methods of building a CFG:
- convert a sequence of machine instructions
into a CFG through the emitter interface, described below, and
- convert it from a cluster, which is
the basic linearized representation used in the MLRISC system.
The first method requires you to perform instruction selection
from a compiler front-end, but allows you to bypass all other
MLRISC phases if desired. The second method allows you
to take advantage of various MLRISC's instruction selection modules
currently available. We describe these methods in this section.
Directly from Instructions
Signature CODE_EMITTER below describes an abstract emitter interface
for accepting a linear stream of instructions from a source
and perform a sequence of actions based on this
stream\footnote{Unlike the signature { EMITTER_NEW} or
{ FLOWGRAPH_GEN}, it has the advantage that it is not
tied into any form of specific flowgraph representation.}.
signature CODE_EMITTER = sig
structure I : INSTRUCTIONS
structure C : CELLS
structure P : PSEUDO_OPS
sharing I.C = C
type emitter =
{ defineLabel : Label.label -> unit,
entryLabel : Label.label -> unit,
exitBlock : C.cellset -> unit,
pseudoOp : P.pseudo_op -> unit,
emitInstr : I.instruction -> unit,
comment : string -> unit,
init : int -> unit,
finish : unit -> unit
}
end
The code emitter interface has the following informal protocol.
-
init()
- Initializes the emitter and signals that
the back-end should
allocate space for bytes of machine code.
The number is ignored for non-machine code back-ends.
-
defineLabel()
- Defines a new label at the current position.
-
entryLabel()
- Defines a new entry label at the current position.
An entry label defines an entry point into the current flow graph.
Note that multiple entry points are allowed
-
exitBlock()
- Defines an exit at the current position.
The cellset represents the live-out information
-
pseudOp()
- Emits an pseudo op at the current position
-
emitInstr()
- Emits an instruction at the current position
-
blockName()
- Changes the block name to
-
comment()
- Emits a comment at the current position
-
finish
- Signals that the use of the emitter is finished.
The emitter is free to perform its post-processing functions.
When this is finished the CFG is built.
The functor ControlFlowGraphGen below can be
used to create a CFG builder that uses the CODE_EMITTER interface.
signature CONTROL_FLOW_GRAPH_GEN = sig
structure CFG : CONTROL_FLOW_GRAPH
structure Emitter : CODE_EMITTER
sharing Emitter.I = CFG.I
sharing Emitter.P = CFG.P
val emitter : CFG.cfg -> Emitter.emitter
end
functor ControlFlowGraphGen
(structure CFG : CONTROL_FLOW_GRAPH
structure Emitter : CODE_EMITTER
structure P : INSN_PROPERTIES
sharing CFG.I = Emitter.I = P.I
sharing CFG.P = Emitter.P
sharing CFG.B = Emitter.B
) : CONTROL_FLOW_GRAPH_GEN
Cluster to CFG
The core MLRISC system implements many instruction selection
front-ends. The result of an instruction selection module is a linear
code layout block called a cluster. The functor Cluster2CFG below
generates a translator that translates a cluster into a CFG:
signature CLUSTER2CFG = sig
structure CFG : CONTROL_FLOW_GRAPH
structure F : FLOWGRAPH
sharing CFG.I = F.I
sharing CFG.P = F.P
sharing CFG.B = F.B
val cluster2cfg : F.cluster -> CFG.cfg
end
functor Cluster2CFG
(structure CFG : CONTROL_FLOW_GRAPH
structure F : FLOWGRAPH
structure P : INSN_PROPERTIES
sharing CFG.I = F.I = P.I
sharing CFG.P = F.P
sharing CFG.B = F.B
) : CLUSTER2CFG
CFG to Cluster
The basic MLRISC system also implements many back-end functions
such as register allocation, assembly output and machine code output.
These modules all utilize the cluster representation. The
functor CFG2Cluster
below generates a translator
that converts a CFG into a cluster. With the previous functor,
the CFG and the cluster presentation can be freely inter-converted.
signature CFG2CLUSTER = sig
structure CFG : CONTROL_FLOW_GRAPH
structure F : FLOWGRAPH
sharing CFG.I = F.I
sharing CFG.P = F.P
sharing CFG.B = F.B
val cfg2cluster : { cfg : CFG.cfg, relayout : bool } -> F.cluster
end
functor CFG2Cluster
(structure CFG : CONTROL_FLOW_GRAPH
structure F : FLOWGRAPH
sharing CFG.I = F.I
sharing CFG.P = F.P
sharing CFG.B = F.B
val patchBranch : {instr:CFG.I.instruction, backwards:bool} ->
CFG.I.instruction list
) : CFG2CLUSTER
When a CFG originates from a cluster, we try to preserve
the same code layout through out all optimizations when possible.
The function cfg2cluster takes an optional flag
that specifies we should force the recomputation of
the code layout of a control flow graph when translating a CFG
back into a cluster.
Basic CFG Transformations
Basic CFG transformations are implemented in the functor
CFGUtil. These transformations include splitting edges, merging
edges, removing unreachable code and tail duplication.
functor CFGUtil
(structure CFG : CONTROL_FLOW_GRAPH
structure P : INSN_PROPERTIES
sharing P.I = CFG.I
) : CFG_UTIL
The signature of CFGUtil is defined below:
signature CFG_UTIL = sig
structure CFG : CONTROL_FLOW_GRAPH
val updateJumpLabel : CFG.cfg -> node_id -> unit
val mergeEdge : CFG.cfg -> CFG.edge -> bool
val eliminateJump : CFG.cfg -> node_id -> bool
val insertJump : CFG.cfg -> node_id -> bool
val splitEdge : CFG.cfg -> { edge : CFG.edge, jump : bool }
-> { edge : CFG.edge, node : CFG.node }
val isMerge : CFG.cfg -> node_id -> bool
val isSplit : CFG.cfg -> node_id -> bool
val hasSideExits : CFG.cfg -> node_id -> bool
val isCriticalEdge : CFG.cfg -> CFG.edge -> bool
val splitAllCriticalEdges : CFG.cfg -> unit
val ceed : CFG.cfg -> node_id * node_id -> bool
val tailDuplicate : CFG.cfg -> { subgraph : CFG.cfg, root : node_id }
-> { nodes : CFG.node list,
edges : CFG.edge list }
val removeUnreachableCode : CFG.cfg -> unit
val mergeAllEdges : CFG.cfg -> unit
end
These functions have the following meanings:
- updateJumpLabel . This function
updates the label of the branch instruction in a block
to be consistent with the control flow edges with source .
This is an nop if the CFG is already consistent.
- mergeEdge . This function merges edge
in the graph if possible. This is successful only if
there are no other edges flowing into and no other edges
flowing out from . It returns true if the merge
operation is successful. If successful, the nodes and
will be coalesced into the block . The jump instruction (if any)
in the node will also be elided.
- eliminateJump . This function eliminate the
jump instruction at the end of block if it is feasible.
- insertJump . This function inserts a jump
instruction in block if it is feasible.
- splitEdge . This function
split the control flow edge , and return a new edge and the
new block as return values. It addition, it takes as
argument a flag jump. If this flag is true,
then a jump instruction is always placed in the
split; otherwise, we try to eliminate the jump when feasible.
- isMerge . This function tests whether block
is a merge node. A merge node is a node that
has two or more incoming flow edges.
- isSplit . This function tests whether block
is a split node. A split node is a node that
has two or more outgoing flow edges.
- hasSideExits . This function tests whether
a block has side exits . This assumes that
is a hyperblock.
- isCriticalEdge . This function tests whether
the edge is a critical edge. The
edge is critical iff
there are is merge node and is a split node.
- splitAllCriticalEdges . This function goes
through the CFG and splits
all critical edges in the CFG.
This can introduce extra jumps and basic blocks in the program.
- mustPreceed . This function
checks whether two blocks and are necessarily adjacent.
Blocks and must be adjacent iff must preceed
in any feasible code layout.
- tailDuplicate.
val tailDuplicate : CFG.cfg -> { subgraph : CFG.cfg, root : node_id }
-> { nodes : CFG.node list,
edges : CFG.edge list }
This function tail-duplicates the region subgraph
until it only has a single entry root.
Return the set of new nodes and new edges.
The region is represented as a subgraph view of the CFG.
Figure Basic CFG Transformations illustrates
this transformation.
- removeUnreachableCode . This function
removes all unreachable code from the graph.
- mergeAllEdges . This function tries to merge all
the edges in the flowgraph . Merging is performed in the
non-increasing order of edge frequencies.
Dataflow Analysis
MLRISC provides a simple customizable module for performing
iterative dataflow analysis. A dataflow analyzer
has the following signature:
signature DATAFLOW_ANALYZER = sig
structure CFG : CONTROL_FLOW_GRAPH
type dataflow_info
val analyze : CFG.cfg * dataflow_info -> dataflow_info
end
A dataflow problem is described by the signature DATAFLOW_PROBLEM,
described below:
signature DATAFLOW_PROBLEM = sig
structure CFG : CONTROL_FLOW_GRAPH
type domain
type dataflow_info
val forward : bool
val bot : domain
val == : domain * domain -> bool
val join : domain list -> domain
val prologue : CFG.cfg * dataflow_info ->
CFG.block node ->
{ input : domain,
output : domain,
transfer : domain -> domain
}
val epilogue : CFG.cfg * dataflow_info ->
{ node : CFG.block node,
input : domain,
output : domain
} -> unit
end
This description contains the following items
- type domain is the abstract lattice domain .
- type dataflow_info is where the dataflow information
is stored.
- forward is true iff the dataflow problem is in the
forward direction
- bot is the bottom element of .
- == is the equality function on .
- join is the least-upper-bound function on .
- prologue is a user-supplied function that performs
pre-processing and setup. For each CFG node , this function
computes
- input -- which is the initial input value of
- output -- which is the initial output value of
- transfer -- which is the transfer function on .
- epilogue is a function that performs post-processing.
It visits each node in the flowgraph and return the resulting
input and output value for .
To generate a new dataflow analyzer from a dataflow problem,
the functor Dataflow can be used:
functor Dataflow(P : DATAFLOW_PROBLEM) : DATAFLOW_ANALYZER =
Static Branch Prediction
Branch Optimizations
|
|
Generated by
mltex2html
|
Last modified: Mon Jun 8 14:18:05 UTC 2009 by buildd@vernadsky
|
|