MiniIR Design Draft
Two sources:
- LLVM etc.'s production-level IR
- Use it to do PPA book exercises in an iterative style
Considerations
- Don't use SSA (but you can compile it to SSA, certainly)
- Use
jump
instead of imperative control flow statement => While keeps instruction-centered language design - Comments
- Better share a similar syntax to LLVM IR (highlight support, familiarity for expert etc.)
- Easy to hand code
What can I borrow from LLVM?
- Global variable and function
- Module system and presence of linkage implication (use a keyword like
extern
) - Supporting high-level types like structure and array
-
Put pointer in a more reasonable position
Notes -- LLVM IR doc
identifier: %
is used for local var, @
is used for global var; three kinds: named, unnamed(temp var), constant
LLVM programs are composed of Module
s. Each module consists of functions, global bars, and symtable entries.
LLVM linker => name resolving & merging
linkage types:
private
: module-locally-visible, but as a global varinternal
: local symbol, likestatic
external
- ...
calling conventions
ccc
: C calling conv, varargs, mistypingfastcc
: fast calling conv (reg based), tail call opt, fixed args, exact-typingcoldcc
: cold calling conv. for not called much functionscc 10
: GHC conv. reg-based, tail-call, reg-pinning ...cc 11
: high performance erlang conv, similar to GHCwebkit_jscc
- ...
Visibility style:
default
: visible to other module, "extern linkage", can be overridenhidden
: not place in dynamic symbol tableprotected
: in dyntable, but reference within the define module will bind locally.
thread local storage model:
localdynamic
: only used within current shared libinitialexec
: will not be loaded dynamicallylocalexec
: used within defined
structure types: "identified" and "literal"
global var always defines a pointer to their "content"; unnamed_addr
means that the address is not important, only the content. syntax:
[@G<Name> =] [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal]
[unnamed_addr] [AddrSpace] [ExternallyInitialized]
<global | constant> <Type> [<InitializerConstant>]
[, section "name"] [, comdat [($name)]]
[, aligh <Alignment> ]
Functions:
define
param attr, func attr, GC name, predix, prologue, personality
a func def contains a list of basic blocks, forming the CFG for the function
each BB starts with a label, contains a list of insts, ends with a terminator inst (like return or branch).
alias: don't create new data, aliasing either a global value or a constant expr.
Comdat IR provides access to COFF or ELF object COMDAT functionality.
named metadata: a string of characters with metadata prefix
LVM IR allows metadata to be attached to instructions in the program that can convey extra information about the code to the optimizers and code generator. One example application of metadata is source-level debug information. There are two metadata primitives: strings and nodes.
param attr:
zeroext
: zero-extentded to the extend required by ABI of caller/calleesignext
inreg
: emit to registerbyval
: passed by value, a copy is madeinalloca
: a pointer to stack memory pointer.sret
: as a valid structure returned by function *align
noalias
:objects [ accessed via pointer values based on the argument or return value ] are not also accessed [ during the execution of the function][ via pointer values not based on the argument or return value ]nocapture
: callee does not make any copies of the pointer that outlive the callee itselfnest
: can be excised using trampoline instrinsicsreturned
: argument that will be returned => tail callnonnull
indicated that the parameter or return value is not NULLdereferenceable
: no trapping
prologue data: arbitrary code to be inserted before the function body; can be used for hot-patching and instrumentation
func attr: ...
data layout
traget triple
any mem access must be done through a pointer specified with a addr range.
Pointer aliasing rules:
- associated with based-on value
- global var addr is associated with addr range of var's storage
- res value of an allocation inst is associated with the addr range of allocated storage
- null pointer is assoced with no addr
a pointer value is based on another pointer value according to:
- first operand of
getelementptr
- operand of
bitcast
inttoptr
is based on all values contributed to the computation of pointer value- based on is transitive
volatile mem access: load
s, store
s etc. marked with volatile
, its number or order of execution will not changed by optimizer
mem model for conc op
atomic mem ordering constraints
fast-math flags
use-list order directives
type system => typing means optimization
void
function type: pointer to function, vararg, structure returing
- first class types
- single value
- integer
- float
- MMX
- pointer type
- vector type
- label type
- token type
- metadata type
- aggregate type
- array
- structure
- opaque
- single value
- Constants
- Simple
- boolean
- integer
- float
- null pointer
- Complex
- structure
- array
- vector (NOTE: vector is a simple type, while array is an aggregate type)
- zero initializer
- metadata node
- Simple
The address of global var and functions are always implicitly valid (link-time) constants. Can be referenced with identifier for constant.
undef
can be used anywhere a constant is expected. It indicated that the program is well-defined for whatever value is used. For linear algebra and xor
, it can be used; but the logical operator like and
and or
could not be optimized freely.
Poison values: it seems meaning that the undefined value can propagate in a certain style. dependency could create more poison value.