diff --git a/Ghidra/Features/Decompiler/src/main/doc/sleigh.xml b/Ghidra/Features/Decompiler/src/main/doc/sleigh.xml
index c6b1b624c2..c60683a52f 100644
--- a/Ghidra/Features/Decompiler/src/main/doc/sleigh.xml
+++ b/Ghidra/Features/Decompiler/src/main/doc/sleigh.xml
@@ -4,7 +4,7 @@
-Last updated September 5, 2019
+Last updated October 28, 2020
Originally published December 16, 2005
Table of Contents
This document describes the syntax for the SLEIGH processor specification language, which was developed for the GHIDRA
@@ -129,7 +129,7 @@
SLEIGH is a language for describing the instruction sets of general purpose microprocessors, in order to facilitate the reverse
@@ -162,7 +162,7 @@ Italics are used when defining terms and for named entities. Bold is used for SL
Although p-code is a distinct language from SLEIGH, because a major purpose of SLEIGH is to specify the translation from machine code to
@@ -221,7 +221,7 @@ respectively.
An address space for p-code is a generalization of the indexed memory (RAM) that a typical processor has access to, and
@@ -322,7 +322,7 @@ must be provided and enforced by the specification designer.
P-code is intended to emulate a target processor by substituting a
sequence of p-code operations for each machine instruction. Thus every
diff --git a/GhidraDocs/languages/html/sleigh_constructors.html b/GhidraDocs/languages/html/sleigh_constructors.html
index 0a0b7f8e7d..21c175f0fd 100644
--- a/GhidraDocs/languages/html/sleigh_constructors.html
+++ b/GhidraDocs/languages/html/sleigh_constructors.html
@@ -60,7 +60,7 @@ multiple constructors into a single table are addressed in
A single complex statement in the specification file describes a
constructor. This statement is always made up of five distinct
@@ -92,7 +92,7 @@ in turn.
Every constructor must be part of a table, which is the element with
an actual family symbol identifier associated with it. So each
@@ -230,7 +230,7 @@ no such requirement.
The ‘^’ character in the display section is used to separate
identifiers from other characters where there shouldn’t be white space
@@ -278,7 +278,7 @@ to match the constructor being defined.
The patterns required for processor specifications can almost always
be described as a mask and value pair. Given a specific instruction
@@ -337,7 +337,7 @@ requires two or more mask/value style checks to correctly implement.
The principle way of defining a constructor operand, left undefined
from the display section, is done in the bit pattern section. If an
@@ -396,7 +396,7 @@ statement of the grouping of old symbols into the new constructor.
There are some additional complexities to designing a specification
for a processor with variable length instructions. Some initial
@@ -419,7 +419,7 @@ designer control over how tokens fit together.
The most important operator for patterns defining variable length
instructions is the concatenation operator ‘;’. When building a
@@ -481,7 +481,7 @@ operator, so parentheses may be necessary to get the intended meaning.
The ellipsis operator ‘...’ is used to satisfy the token matching
requirements of the ‘&’ and ‘|’ operators (described in the previous
@@ -557,7 +557,7 @@ don’t quite match the assembly.
Occasionally there is a need for an empty pattern when building
tables. An empty pattern matches everything. There is a predefined
@@ -567,7 +567,7 @@ to indicate an empty pattern.
A constraint does not have to be of the form “field = constant”,
although this is almost always what is needed. In certain situations,
@@ -821,7 +821,7 @@ assignment to such a variable changes the context in which the current
instruction is being disassembled and can potentially have a drastic
effect on how the rest of the instruction is disassembled. An
assignment of this form is considered local to the instruction and
-will not effect how other instructions are parsed. The context
+will not affect how other instructions are parsed. The context
variable is reset to its original value before parsing other
instructions. The disassembly action may also contain one or
more globalset directives, which
@@ -939,7 +939,7 @@ varnode is r1.
Expressions are built out of symbols and the binary and unary
operators listed in Table 5, “Semantic Expression Operators and Syntax” in the
@@ -954,7 +954,7 @@ within expressions to affect this order.
For the most part these operators should be familiar to software
developers. The only real differences arise from the fact that
@@ -1017,7 +1017,7 @@ set to something other than one.
Most processors have instructions that extend small values into big
values, and many instructions do these minor data manipulations
@@ -1039,7 +1039,7 @@ the sext operator.
There are two forms of syntax indicating a truncation of the input
varnode. In one the varnode is followed by a colon ‘:’ and an integer
@@ -1169,7 +1169,7 @@ the offset portion of the address, and to copy the desired value, the
SLEIGH provides basic support for instructions where encoding and context
don't provide a complete description of the semantics. This is the case
@@ -1231,7 +1231,7 @@ define pcodeop arctan;
We describe the types of semantic statements that are allowed in SLEIGH.
SLEIGH supports fairly standard storage statement
syntax to complement the load operator. The left-hand side of an
@@ -1336,7 +1336,7 @@ attribute is set to something other than one.
The semantic section doesn’t just specify how to generate p-code for a
constructor. Except for those constructors in the root table, this
@@ -1366,7 +1366,7 @@ the table symbol mode. When this construc
matched, as part of a more complicated instruction, the
symbol mode will represent the original semantic
value of reg but with the standard post-increment
-side effect.
+side-effect.
The table symbol associated with the constructor becomes
@@ -1388,7 +1388,7 @@ varnode being modified to be exported as an integer constant.
The only other operator allowed as part of
an export statement, is the ‘*’
@@ -1447,7 +1447,7 @@ levels.
This section discusses statements that generate p-code branching
operations. These are listed in Table 7, “Branching Statements”, in the Appendix.
@@ -1802,7 +1802,7 @@ each followed by a variation which corrects the error.
The semantic section must be present for every constructor in the
specification. But the designer can leave the semantics explicitly
@@ -1962,7 +1962,7 @@ should generally be avoided.
When the SLEIGH parser analyzes an instruction, it starts with the
root symbol instruction, and decides which of the
@@ -2045,7 +2045,7 @@ and p-code for these encodings by walking the trees.
If the nodes of each tree are replaced with the display information of
the corresponding specific symbol, we see how the disassembly
@@ -2068,7 +2068,7 @@ statements corresponding to the original instruction encodings.
A similar procedure produces the resulting p-code translation of the
instruction. If each node in the specific symbol tree is replaced with
@@ -2147,7 +2147,7 @@ directive however should not be used in a macro.
Because the nodes of a specific symbol tree are traversed in a
depth-first order, the p-code for a child node in general comes before
@@ -2202,7 +2202,7 @@ normal action of the instruction.
For processors with a pipe-lined architecture, multiple instructions
are typically executing simultaneously. This can lead to processor
diff --git a/GhidraDocs/languages/html/sleigh_context.html b/GhidraDocs/languages/html/sleigh_context.html
index fa762bb070..f8c72d7a6e 100644
--- a/GhidraDocs/languages/html/sleigh_context.html
+++ b/GhidraDocs/languages/html/sleigh_context.html
@@ -85,7 +85,7 @@ whose encodings are otherwise the same.
Suppose a processor supports the use of two different sets of
registers in its main addressing mode, based on the setting of a
@@ -317,7 +317,7 @@ blr is opcode=35 & reg=15 & LRset=1 { return [lr]; }
An alternative to the noflow attribute is to simply issue
multiple directives within a single constructor, so an explicit end to a context change
can be given. The value of the variable exported to the global state
-is the one in affect at the point where the directive is issued. Thus,
+is the one in effect at the point where the directive is issued. Thus,
after one globalset, the same context
variable can be assigned a different value, followed by
another globalset for a different
@@ -328,7 +328,7 @@ Because context in SLEIGH is controlled by a disassembly process,
there are some basic caveats to the use of
the globalset directive. With
flowing context changes,
-there is no guarantee of what global state will be in affect at a
+there is no guarantee of what global state will be in effect at a
particular address. During disassembly, at any given
point, the process may not have uncovered all the relevant directives,
and the known directives may not necessarily be consistent. In
diff --git a/GhidraDocs/languages/html/sleigh_definitions.html b/GhidraDocs/languages/html/sleigh_definitions.html
index 49f23dd0a8..96265a83e0 100644
--- a/GhidraDocs/languages/html/sleigh_definitions.html
+++ b/GhidraDocs/languages/html/sleigh_definitions.html
@@ -44,18 +44,19 @@ define endian=little;
This defines how the processor interprets contiguous sequences of
-bytes as integers. It effects how integer fields within an instruction
-are interpreted (see Section 6.1, “Defining Tokens and Fields”), and
-it also effects the details of how the processor is supposed to
-implement atomic operations like integer addition and integer
-compare. The specification designer should only need to worry about
-these details when labeling instruction fields, otherwise the
-specification language will hide endianess issues.
+bytes as integers or other values and globally affects values across
+all address spaces. It also affects how integer fields
+within an instruction are interpreted, (see Section 6.1, “Defining Tokens and Fields”),
+although it is possible to override this setting in the rare case that endianess is
+different for data versus instruction encoding.
+The specification designer generally only needs to worry about
+endianess when labeling instruction fields and when defining overlapping registers,
+otherwise the specification language hides endianess issues.
An alignment definition looks like
The definition of an address space looks like
Many processors define registers that either consist of a single bit
or otherwise don't use an integral number of bytes. A recurring
@@ -298,7 +299,7 @@ used as an alternate syntax for defining overlapping registers.
The specification designer can define new p-code operations using
a define pcodeop statement. This
diff --git a/GhidraDocs/languages/html/sleigh_layout.html b/GhidraDocs/languages/html/sleigh_layout.html
index 1f312277df..8b641bacdf 100644
--- a/GhidraDocs/languages/html/sleigh_layout.html
+++ b/GhidraDocs/languages/html/sleigh_layout.html
@@ -36,7 +36,7 @@ by the compiler.
Comments start with the ‘#’ character and continue to the end of the
line. Comments can appear anywhere except the display section of a
@@ -46,7 +46,7 @@ interpreted as something that should be printed in disassembly.
Identifiers are made up of letters a-z, capitals A-Z, digits 0-9 and
the characters ‘.’ and ‘_’. An identifier can use these characters in
@@ -55,7 +55,7 @@ any order and for any length, but it must not start with a digit.
String literals can be used, when specifying names and when specifying
how disassembly should be printed, so that special characters are
@@ -66,7 +66,7 @@ meaning.
Integers are specified either in a decimal format or in a standard
C-style hexadecimal format by prepending the
@@ -92,7 +92,7 @@ integers internally with 64 bits of precision.
White space characters include space, tab, line-feed, vertical
line-feed, and carriage-return (‘ ‘, ‘\t’, ‘\r’, ‘\v’,
diff --git a/GhidraDocs/languages/html/sleigh_preprocessing.html b/GhidraDocs/languages/html/sleigh_preprocessing.html
index 1eb1e45f3a..5f47bc64bc 100644
--- a/GhidraDocs/languages/html/sleigh_preprocessing.html
+++ b/GhidraDocs/languages/html/sleigh_preprocessing.html
@@ -54,7 +54,7 @@ own @include directives.
SLEIGH allows simple (unparameterized) macro definitions and
expansions. A macro definition occurs on one line and starts with
@@ -85,7 +85,7 @@ definition of a macro from that point on in the file.
SLEIGH supports several directives that allow conditional inclusion of
parts of a specification, based on the existence of a macro, or its
@@ -103,7 +103,7 @@ and @endif.
The @ifdef directive is followed by a
macro identifier and evaluates to true if the macro is defined.
@@ -129,7 +129,7 @@ or @elif directive (See below).
The @if directive is followed by a
boolean expression with macros as the variables and strings as the
@@ -158,7 +158,7 @@ is defined.
An @else directive splits the lines
bounded by an @if directive and
diff --git a/GhidraDocs/languages/html/sleigh_symbols.html b/GhidraDocs/languages/html/sleigh_symbols.html
index 6eb2b83374..a3ab15203c 100644
--- a/GhidraDocs/languages/html/sleigh_symbols.html
+++ b/GhidraDocs/languages/html/sleigh_symbols.html
@@ -105,7 +105,7 @@ the predefined identifier instruction.
Almost all identifiers live in the same global "scope". The global scope includes
+ define token instr ( 32 ) endian=little op0=(0,15) ...
+
+The token instr is overridden to be little endian.
+This override applies to all fields defined for the token but affects no other tokens.
+
+
+After each field declaration, there can be zero or more of the following attribute keywords:
@@ -74,7 +88,7 @@ different names.
Fields are the most basic form of family symbol; they define a natural map from instruction bits to a specific symbol as follows. We take the
@@ -99,7 +113,7 @@ the dec attribute is not supported]
The default interpretation of a field is probably the most natural but of course processors interpret fields within an instruction in a wide
@@ -110,7 +124,7 @@ interpretations must be built up out of tables.
Probably the most common processor interpretation of a field is as an encoding of a particular register. In SLEIGH this
@@ -149,7 +163,7 @@ of the instruction.
Sometimes a processor interprets a field as an integer but not the integer given by the default interpretation. A different integer
@@ -171,7 +185,7 @@ unspecified positions in the list using a ‘_’]
It is possible to just modify the display characteristics of a field without changing the semantic meaning. The need for this is rare, but