diff --git a/Ghidra/Features/Decompiler/src/main/doc/sleigh.xml b/Ghidra/Features/Decompiler/src/main/doc/sleigh.xml
index c6b1b624c2..c60683a52f 100644
--- a/Ghidra/Features/Decompiler/src/main/doc/sleigh.xml
+++ b/Ghidra/Features/Decompiler/src/main/doc/sleigh.xml
@@ -4,7 +4,7 @@ SLEIGH A Language for Rapid Processor Specification Originally published December 16, 2005
- Last updated September 5, 2019
+ Last updated October 28, 2020
@@ -573,13 +573,14 @@ define endian=little; This defines how the processor interprets contiguous sequences of
-bytes as integers. It effects how integer fields within an instruction
-are interpreted (see ), and
-it also effects the details of how the processor is supposed to
-implement atomic operations like integer addition and integer
-compare. The specification designer should only need to worry about
-these details when labeling instruction fields, otherwise the
-specification language will hide endianess issues.
+bytes as integers or other values and globally affects values across
+all address spaces. It also affects how integer fields
+within an instruction are interpreted, (see ),
+although it is possible to override this setting in the rare case that endianess is
+different for data versus instruction encoding.
+The specification designer generally only needs to worry about
+endianess when labeling instruction fields and when defining overlapping registers,
+otherwise the specification language hides endianess issues.
@@ -966,7 +967,7 @@ individual constructor (defined in
@@ -1057,8 +1058,22 @@ there are one or more field declarations specifying the name of the field and the range of bits within the token making up the field. The size of a field does not need to be a multiple of 8. The range is inclusive where the least significant bit in the token
-is labeled 0. The endianess of the processor will effect this labeling
-when defining tokens that are bigger than 1 byte. After each field
+is labeled 0. When defining tokens that are bigger than 1 byte, the
+global endianess setting (See )
+will affect this labeling. Although it is rarely required, it is possible to override
+the global endianess setting for a specific token by appending either the qualifier
+endian=little or endian=big
+immediately after the token name and size. For instance:
+
+
+ define token instr ( 32 ) endian=little op0=(0,15) ...
+
+
+The token instr is overridden to be little endian.
+This override applies to all fields defined for the token but affects no other tokens.
+
+
+After each field
declaration, there can be zero or more of the following attribute keywords:
@@ -2023,7 +2038,7 @@ assignment to such a variable changes the context in which the current instruction is being disassembled and can potentially have a drastic effect on how the rest of the instruction is disassembled. An assignment of this form is considered local to the instruction and
-will not effect how other instructions are parsed. The context
+will not affect how other instructions are parsed. The context
variable is reset to its original value before parsing other instructions. The disassembly action may also contain one or more globalset directives, which
@@ -2547,7 +2562,7 @@ the table symbol mode. When this constructor is matched, as part of a more complicated instruction, the symbol mode will represent the original semantic value of reg but with the standard post-increment
-side effect.
+side-effect.
The table symbol associated with the constructor becomes
@@ -3724,7 +3739,7 @@ blr is opcode=35 & reg=15 & LRset=1 { return [lr]; } An alternative to the noflow attribute is to simply issue multiple directives within a single constructor, so an explicit end to a context change can be given. The value of the variable exported to the global state
-is the one in affect at the point where the directive is issued. Thus,
+is the one in effect at the point where the directive is issued. Thus,
after one globalset, the same context variable can be assigned a different value, followed by another globalset for a different
@@ -3735,7 +3750,7 @@ Because context in SLEIGH is controlled by a disassembly process, there are some basic caveats to the use of the globalset directive. With flowing context changes,
-there is no guarantee of what global state will be in affect at a
+there is no guarantee of what global state will be in effect at a
particular address. During disassembly, at any given point, the process may not have uncovered all the relevant directives, and the known directives may not necessarily be consistent. In
diff --git a/GhidraDocs/languages/html/sleigh.html b/GhidraDocs/languages/html/sleigh.html
index cc532db93e..a02b507fe4 100644
--- a/GhidraDocs/languages/html/sleigh.html
+++ b/GhidraDocs/languages/html/sleigh.html
@@ -25,9 +25,9 @@

-SLEIGH

+SLEIGH

A Language for Rapid Processor Specification

-

Last updated September 5, 2019

+

Last updated October 28, 2020

Originally published December 16, 2005


@@ -35,51 +35,51 @@

Table of Contents

-
1. Introduction to P-Code
+
1. Introduction to P-Code
-
1.1. Address Spaces
+
1.1. Address Spaces
1.2. Varnodes
-
1.3. Operations
+
1.3. Operations
2. Basic Specification Layout
-
2.1. Comments
-
2.2. Identifiers
-
2.3. Strings
-
2.4. Integers
-
2.5. White Space
+
2.1. Comments
+
2.2. Identifiers
+
2.3. Strings
+
2.4. Integers
+
2.5. White Space
3. Preprocessing
3.1. Including Files
-
3.2. Preprocessor Macros
-
3.3. Conditional Compilation
+
3.2. Preprocessor Macros
+
3.3. Conditional Compilation
4. Basic Definitions
4.1. Endianess Definition
-
4.2. Alignment Definition
-
4.3. Space Definitions
+
4.2. Alignment Definition
+
4.3. Space Definitions
4.4. Naming Registers
-
4.5. Bit Range Registers
-
4.6. User-Defined Operations
+
4.5. Bit Range Registers
+
4.6. User-Defined Operations
5. Introduction to Symbols
-
5.1. Notes on Namespaces
+
5.1. Notes on Namespaces
5.2. Predefined Symbols
6. Tokens and Fields
6.1. Defining Tokens and Fields
-
6.2. Fields as Family Symbols
-
6.3. Attaching Alternate Meanings to Fields
+
6.2. Fields as Family Symbols
+
6.3. Attaching Alternate Meanings to Fields
6.4. Context Variables
7. Constructors
-
7.1. The Five Sections of a Constructor
-
7.2. The Table Header
+
7.1. The Five Sections of a Constructor
+
7.2. The Table Header
7.3. The Display Section
7.4. The Bit Pattern Section
7.5. Disassembly Actions Section
@@ -87,12 +87,12 @@
7.7. The Semantic Section
7.8. Tables
7.9. P-code Macros
-
7.10. Build Directives
-
7.11. Delay Slot Directives
+
7.10. Build Directives
+
7.11. Delay Slot Directives
8. Using Context
-
8.1. Basic Use of Context Variables
+
8.1. Basic Use of Context Variables
8.2. Local Context Change
8.3. Global Context Change
@@ -101,7 +101,7 @@

-History

+History

This document describes the syntax for the SLEIGH processor specification language, which was developed for the GHIDRA @@ -129,7 +129,7 @@

-Overview

+Overview

SLEIGH is a language for describing the instruction sets of general purpose microprocessors, in order to facilitate the reverse @@ -162,7 +162,7 @@ Italics are used when defining terms and for named entities. Bold is used for SL

-1. Introduction to P-Code

+1. Introduction to P-Code

Although p-code is a distinct language from SLEIGH, because a major purpose of SLEIGH is to specify the translation from machine code to @@ -221,7 +221,7 @@ respectively.

-1.1. Address Spaces

+1.1. Address Spaces

An address space for p-code is a generalization of the indexed memory (RAM) that a typical processor has access to, and @@ -322,7 +322,7 @@ must be provided and enforced by the specification designer.

-1.3. Operations

+1.3. Operations

P-code is intended to emulate a target processor by substituting a sequence of p-code operations for each machine instruction. Thus every
diff --git a/GhidraDocs/languages/html/sleigh_constructors.html b/GhidraDocs/languages/html/sleigh_constructors.html
index 0a0b7f8e7d..21c175f0fd 100644
--- a/GhidraDocs/languages/html/sleigh_constructors.html
+++ b/GhidraDocs/languages/html/sleigh_constructors.html
@@ -60,7 +60,7 @@ multiple constructors into a single table are addressed in

-7.1. The Five Sections of a Constructor

+7.1. The Five Sections of a Constructor

A single complex statement in the specification file describes a constructor. This statement is always made up of five distinct @@ -92,7 +92,7 @@ in turn.

-7.2. The Table Header

+7.2. The Table Header

Every constructor must be part of a table, which is the element with an actual family symbol identifier associated with it. So each @@ -230,7 +230,7 @@ no such requirement.

-7.3.2. The '^' character

+7.3.2. The '^' character

The ‘^’ character in the display section is used to separate identifiers from other characters where there shouldn’t be white space @@ -278,7 +278,7 @@ to match the constructor being defined.

-7.4.1. Constraints

+7.4.1. Constraints

The patterns required for processor specifications can almost always be described as a mask and value pair. Given a specific instruction @@ -337,7 +337,7 @@ requires two or more mask/value style checks to correctly implement.

-7.4.3. Defining Operands and Invoking Subtables

+7.4.3. Defining Operands and Invoking Subtables

The principle way of defining a constructor operand, left undefined from the display section, is done in the bit pattern section. If an @@ -396,7 +396,7 @@ statement of the grouping of old symbols into the new constructor.

-7.4.4. Variable Length Instructions

+7.4.4. Variable Length Instructions

There are some additional complexities to designing a specification for a processor with variable length instructions. Some initial @@ -419,7 +419,7 @@ designer control over how tokens fit together.

-7.4.4.1. The ';' Operator
+7.4.4.1. The ';' Operator

The most important operator for patterns defining variable length instructions is the concatenation operator ‘;’. When building a @@ -481,7 +481,7 @@ operator, so parentheses may be necessary to get the intended meaning.

-7.4.4.2. The '...' Operator
+7.4.4.2. The '...' Operator

The ellipsis operator ‘...’ is used to satisfy the token matching requirements of the ‘&’ and ‘|’ operators (described in the previous @@ -557,7 +557,7 @@ don’t quite match the assembly.

-7.4.6. Empty Patterns

+7.4.6. Empty Patterns

Occasionally there is a need for an empty pattern when building tables. An empty pattern matches everything. There is a predefined @@ -567,7 +567,7 @@ to indicate an empty pattern.

-7.4.7. Advanced Constraints

+7.4.7. Advanced Constraints

A constraint does not have to be of the form “field = constant”, although this is almost always what is needed. In certain situations,
@@ -821,7 +821,7 @@ assignment to such a variable changes the context in which the current instruction is being disassembled and can potentially have a drastic effect on how the rest of the instruction is disassembled. An assignment of this form is considered local to the instruction and
-will not effect how other instructions are parsed. The context
+will not affect how other instructions are parsed. The context
variable is reset to its original value before parsing other instructions. The disassembly action may also contain one or more globalset directives, which
@@ -939,7 +939,7 @@ varnode is r1.

-7.7.1. Expressions

+7.7.1. Expressions

Expressions are built out of symbols and the binary and unary operators listed in Table 5, “Semantic Expression Operators and Syntax” in the @@ -954,7 +954,7 @@ within expressions to affect this order.

-7.7.1.1. Arithmetic, Logical and Boolean Operators
+7.7.1.1. Arithmetic, Logical and Boolean Operators

For the most part these operators should be familiar to software developers. The only real differences arise from the fact that @@ -1017,7 +1017,7 @@ set to something other than one.

-7.7.1.3. Extension
+7.7.1.3. Extension

Most processors have instructions that extend small values into big values, and many instructions do these minor data manipulations @@ -1039,7 +1039,7 @@ the sext operator.

-7.7.1.4. Truncation
+7.7.1.4. Truncation

There are two forms of syntax indicating a truncation of the input varnode. In one the varnode is followed by a colon ‘:’ and an integer @@ -1169,7 +1169,7 @@ the offset portion of the address, and to copy the desired value, the

-7.7.1.7. Managed Code Operations
+7.7.1.7. Managed Code Operations

SLEIGH provides basic support for instructions where encoding and context don't provide a complete description of the semantics. This is the case @@ -1231,7 +1231,7 @@ define pcodeop arctan;

-7.7.2. Statements

+7.7.2. Statements

We describe the types of semantic statements that are allowed in SLEIGH.

@@ -1305,7 +1305,7 @@ and may be enforced in future compiler versions.
-7.7.2.2. Storage Statements
+7.7.2.2. Storage Statements

SLEIGH supports fairly standard storage statement syntax to complement the load operator. The left-hand side of an @@ -1336,7 +1336,7 @@ attribute is set to something other than one.

-7.7.2.3. Exports
+7.7.2.3. Exports

The semantic section doesn’t just specify how to generate p-code for a constructor. Except for those constructors in the root table, this
@@ -1366,7 +1366,7 @@ the table symbol mode. When this construc
matched, as part of a more complicated instruction, the symbol mode will represent the original semantic value of reg but with the standard post-increment
-side effect.
+side-effect.

The table symbol associated with the constructor becomes @@ -1388,7 +1388,7 @@ varnode being modified to be exported as an integer constant.

-7.7.2.4. Dynamic References
+7.7.2.4. Dynamic References

The only other operator allowed as part of an export statement, is the ‘*’ @@ -1447,7 +1447,7 @@ levels.

-7.7.2.5. Branching Statements
+7.7.2.5. Branching Statements

This section discusses statements that generate p-code branching operations. These are listed in Table 7, “Branching Statements”, in the Appendix. @@ -1802,7 +1802,7 @@ each followed by a variation which corrects the error.

-7.7.4. Unimplemented Semantics

+7.7.4. Unimplemented Semantics

The semantic section must be present for every constructor in the specification. But the designer can leave the semantics explicitly @@ -1962,7 +1962,7 @@ should generally be avoided.

-7.8.2. Specific Symbol Trees

+7.8.2. Specific Symbol Trees

When the SLEIGH parser analyzes an instruction, it starts with the root symbol instruction, and decides which of the @@ -2045,7 +2045,7 @@ and p-code for these encodings by walking the trees.

-7.8.2.1. Disassembly Trees
+7.8.2.1. Disassembly Trees

If the nodes of each tree are replaced with the display information of the corresponding specific symbol, we see how the disassembly @@ -2068,7 +2068,7 @@ statements corresponding to the original instruction encodings.

-7.8.2.2. P-code Trees
+7.8.2.2. P-code Trees

A similar procedure produces the resulting p-code translation of the instruction. If each node in the specific symbol tree is replaced with @@ -2147,7 +2147,7 @@ directive however should not be used in a macro.

-7.10. Build Directives

+7.10. Build Directives

Because the nodes of a specific symbol tree are traversed in a depth-first order, the p-code for a child node in general comes before @@ -2202,7 +2202,7 @@ normal action of the instruction.

-7.11. Delay Slot Directives

+7.11. Delay Slot Directives

For processors with a pipe-lined architecture, multiple instructions are typically executing simultaneously. This can lead to processor diff --git a/GhidraDocs/languages/html/sleigh_context.html b/GhidraDocs/languages/html/sleigh_context.html index fa762bb070..f8c72d7a6e 100644 --- a/GhidraDocs/languages/html/sleigh_context.html +++ b/GhidraDocs/languages/html/sleigh_context.html @@ -85,7 +85,7 @@ whose encodings are otherwise the same.

-8.1. Basic Use of Context Variables

+8.1. Basic Use of Context Variables

Suppose a processor supports the use of two different sets of registers in its main addressing mode, based on the setting of a
@@ -317,7 +317,7 @@ blr is opcode=35 & reg=15 & LRset=1 { return [lr]; } An alternative to the noflow attribute is to simply issue multiple directives within a single constructor, so an explicit end to a context change can be given. The value of the variable exported to the global state
-is the one in affect at the point where the directive is issued. Thus,
+is the one in effect at the point where the directive is issued. Thus,
after one globalset, the same context variable can be assigned a different value, followed by another globalset for a different
@@ -328,7 +328,7 @@ Because context in SLEIGH is controlled by a disassembly process, there are some basic caveats to the use of the globalset directive. With flowing context changes,
-there is no guarantee of what global state will be in affect at a
+there is no guarantee of what global state will be in effect at a
particular address. During disassembly, at any given point, the process may not have uncovered all the relevant directives, and the known directives may not necessarily be consistent. In
diff --git a/GhidraDocs/languages/html/sleigh_definitions.html b/GhidraDocs/languages/html/sleigh_definitions.html
index 49f23dd0a8..96265a83e0 100644
--- a/GhidraDocs/languages/html/sleigh_definitions.html
+++ b/GhidraDocs/languages/html/sleigh_definitions.html
@@ -44,18 +44,19 @@ define endian=little;

This defines how the processor interprets contiguous sequences of
-bytes as integers. It effects how integer fields within an instruction
-are interpreted (see Section 6.1, “Defining Tokens and Fields”), and
-it also effects the details of how the processor is supposed to
-implement atomic operations like integer addition and integer
-compare. The specification designer should only need to worry about
-these details when labeling instruction fields, otherwise the
-specification language will hide endianess issues.
+bytes as integers or other values and globally affects values across
+all address spaces. It also affects how integer fields
+within an instruction are interpreted, (see Section 6.1, “Defining Tokens and Fields”),
+although it is possible to override this setting in the rare case that endianess is
+different for data versus instruction encoding.
+The specification designer generally only needs to worry about
+endianess when labeling instruction fields and when defining overlapping registers,
+otherwise the specification language hides endianess issues.

-4.2. Alignment Definition

+4.2. Alignment Definition

An alignment definition looks like

@@ -72,7 +73,7 @@ instruction as an error.

-4.3. Space Definitions

+4.3. Space Definitions

The definition of an address space looks like

@@ -227,7 +228,7 @@ define register offset=0 size=1

-4.5. Bit Range Registers

+4.5. Bit Range Registers

Many processors define registers that either consist of a single bit or otherwise don't use an integral number of bytes. A recurring @@ -298,7 +299,7 @@ used as an alternate syntax for defining overlapping registers.

-4.6. User-Defined Operations

+4.6. User-Defined Operations

The specification designer can define new p-code operations using a define pcodeop statement. This
diff --git a/GhidraDocs/languages/html/sleigh_layout.html b/GhidraDocs/languages/html/sleigh_layout.html
index 1f312277df..8b641bacdf 100644
--- a/GhidraDocs/languages/html/sleigh_layout.html
+++ b/GhidraDocs/languages/html/sleigh_layout.html
@@ -36,7 +36,7 @@ by the compiler.

-2.1. Comments

+2.1. Comments

Comments start with the ‘#’ character and continue to the end of the line. Comments can appear anywhere except the display section of a @@ -46,7 +46,7 @@ interpreted as something that should be printed in disassembly.

-2.2. Identifiers

+2.2. Identifiers

Identifiers are made up of letters a-z, capitals A-Z, digits 0-9 and the characters ‘.’ and ‘_’. An identifier can use these characters in @@ -55,7 +55,7 @@ any order and for any length, but it must not start with a digit.

-2.3. Strings

+2.3. Strings

String literals can be used, when specifying names and when specifying how disassembly should be printed, so that special characters are @@ -66,7 +66,7 @@ meaning.

-2.4. Integers

+2.4. Integers

Integers are specified either in a decimal format or in a standard C-style hexadecimal format by prepending the @@ -92,7 +92,7 @@ integers internally with 64 bits of precision.

-2.5. White Space

+2.5. White Space

White space characters include space, tab, line-feed, vertical line-feed, and carriage-return (‘ ‘, ‘\t’, ‘\r’, ‘\v’,
diff --git a/GhidraDocs/languages/html/sleigh_preprocessing.html b/GhidraDocs/languages/html/sleigh_preprocessing.html
index 1eb1e45f3a..5f47bc64bc 100644
--- a/GhidraDocs/languages/html/sleigh_preprocessing.html
+++ b/GhidraDocs/languages/html/sleigh_preprocessing.html
@@ -54,7 +54,7 @@ own @include directives.

-3.2. Preprocessor Macros

+3.2. Preprocessor Macros

SLEIGH allows simple (unparameterized) macro definitions and expansions. A macro definition occurs on one line and starts with @@ -85,7 +85,7 @@ definition of a macro from that point on in the file.

-3.3. Conditional Compilation

+3.3. Conditional Compilation

SLEIGH supports several directives that allow conditional inclusion of parts of a specification, based on the existence of a macro, or its @@ -103,7 +103,7 @@ and @endif.

-3.3.1. @ifdef and @ifndef

+3.3.1. @ifdef and @ifndef

The @ifdef directive is followed by a macro identifier and evaluates to true if the macro is defined. @@ -129,7 +129,7 @@ or @elif directive (See below).

-3.3.2. @if

+3.3.2. @if

The @if directive is followed by a boolean expression with macros as the variables and strings as the @@ -158,7 +158,7 @@ is defined.

-3.3.3. @else and @elif

+3.3.3. @else and @elif

An @else directive splits the lines bounded by an @if directive and
diff --git a/GhidraDocs/languages/html/sleigh_symbols.html b/GhidraDocs/languages/html/sleigh_symbols.html
index 6eb2b83374..a3ab15203c 100644
--- a/GhidraDocs/languages/html/sleigh_symbols.html
+++ b/GhidraDocs/languages/html/sleigh_symbols.html
@@ -105,7 +105,7 @@ the predefined identifier instruction.

-5.1. Notes on Namespaces

+5.1. Notes on Namespaces

Almost all identifiers live in the same global "scope". The global scope includes

@@ -138,7 +138,7 @@ individual constructor (defined in
hides the global symbol while that scope
-is in affect.
+is in effect.

diff --git a/GhidraDocs/languages/html/sleigh_tokens.html b/GhidraDocs/languages/html/sleigh_tokens.html
index 79521cfd7b..dc572dfdb4 100644
--- a/GhidraDocs/languages/html/sleigh_tokens.html
+++ b/GhidraDocs/languages/html/sleigh_tokens.html
@@ -56,8 +56,22 @@ there are one or more field declarations specifying the name of the field and the range of bits within the token making up the field. The size of a field does not need to be a multiple of 8. The range is inclusive where the least significant bit in the token
-is labeled 0. The endianess of the processor will effect this labeling
-when defining tokens that are bigger than 1 byte. After each field
+is labeled 0. When defining tokens that are bigger than 1 byte, the
+global endianess setting (See Section 4.1, “Endianess Definition”)
+will affect this labeling. Although it is rarely required, it is possible to override
+the global endianess setting for a specific token by appending either the qualifier
+endian=little or endian=big
+immediately after the token name and size. For instance:
+

+
+  define token instr ( 32 ) endian=little op0=(0,15) ...
+
+

+The token instr is overridden to be little endian.
+This override applies to all fields defined for the token but affects no other tokens.
+

+

+After each field
declaration, there can be zero or more of the following attribute keywords:

@@ -74,7 +88,7 @@ different names.

-6.2. Fields as Family Symbols

+6.2. Fields as Family Symbols

Fields are the most basic form of family symbol; they define a natural map from instruction bits to a specific symbol as follows. We take the @@ -99,7 +113,7 @@ the dec attribute is not supported]

-6.3. Attaching Alternate Meanings to Fields

+6.3. Attaching Alternate Meanings to Fields

The default interpretation of a field is probably the most natural but of course processors interpret fields within an instruction in a wide @@ -110,7 +124,7 @@ interpretations must be built up out of tables.

-6.3.1. Attaching Registers

+6.3.1. Attaching Registers

Probably the most common processor interpretation of a field is as an encoding of a particular register. In SLEIGH this @@ -149,7 +163,7 @@ of the instruction.

-6.3.2. Attaching Other Integers

+6.3.2. Attaching Other Integers

Sometimes a processor interprets a field as an integer but not the integer given by the default interpretation. A different integer @@ -171,7 +185,7 @@ unspecified positions in the list using a ‘_’]

-6.3.3. Attaching Names

+6.3.3. Attaching Names

It is possible to just modify the display characteristics of a field without changing the semantic meaning. The need for this is rare, but