Updated README

Added "Getting Started", Versioning Information, and more details about how to add processor support.
Maddie Stone
2017-10-03 14:27:01 -04:00
committed by GitHub
parent bef03ae0fc
commit f034874d8d

119
README.md

@@ -1,16 +1,109 @@
# IDAPython Embedded Toolkit
The IDAPython Embedded Toolkit is a set of scripts to automate many of the steps associated
with statically analyzing, or reverse engineering, the firmware of embedded devices in IDA Pro.
## Description
IDAPython is a way to script different actions in the IDA Pro disassembler with Python. This
repository of scripts automates many different processes necessary when analyzing the
firmware running on microcontroller and microprocessor CPUs. The scripts are written to be
easily modified to run on a variety of architectures. Read the instructions in the header of each
script to determine what ought to be modified for each architecture.
## Presentations
The IDAPython Embedded Toolkit has been presented at the following venues:
* DerbyCon "IDAPython: The Wonder Woman of Embedded Device Reversing" -- September 2017<br/>
Recording of Talk: http://www.irongeek.com/i.php?page=videos/derbycon7/t215-idapython-the-wonder-woman-of-embedded-device-reversing-maddie-stone <br/>
Slides and Demo Videos from Presentation are available in the [presentations](presentations/) folder
* RECON Montreal "The Life-Changing Magic of IDAPython: Embedded Device Edition" -- June 2017 <br/>
Slides and Demo Videos from Presentation are available in the [presentations](presentations/) folder
## How to Run
Install IDAPython per: https://github.com/idapython/src
# Getting Started
To understand how and why the IDAPython Embedded Toolkit was created, check out the slides and recording from
the DerbyCon or RECON Presentations.
Once your IDA database is open, go to File > Script file... and select the script to run.
The IDAPython Embedded Toolkit is a set of IDAPython scripts written to be processor/architecture-agnostic
and automate the triage, analysis, and annotation processes associated with reversing the firmware
image of an embedded device. The currently available scripts:
* TRIAGE<a name="triage"></a>
* [Define Code & Functions](https://github.com/maddiestone/IDAPythonEmbeddedToolkit/blob/master/define_code_functions.py)
* [Define Data](https://github.com/maddiestone/IDAPythonEmbeddedToolkit/blob/master/define_data_as_types.py)
* [Define Strings](https://github.com/maddiestone/IDAPythonEmbeddedToolkit/blob/master/make_strings.py)
* ANALYSIS<a name="analysis"></a>
* [Calculate Indirect Offset Memory Accesses](https://github.com/maddiestone/IDAPythonEmbeddedToolkit/blob/master/data_offset_calc.py)
* [Find Memory Accesses](https://github.com/maddiestone/IDAPythonEmbeddedToolkit/blob/master/find_mem_accesses.py)
* ANNOTATE<a name="annotate"></a>
* [Identify GPIO Usage](https://github.com/maddiestone/IDAPythonEmbeddedToolkit/blob/master/identify_port_use_locations.py)
* [Identify "Dead" Code](https://github.com/maddiestone/IDAPythonEmbeddedToolkit/blob/master/label_funcs_with_no_xrefs.py)
* [Trace Operand Use](https://github.com/maddiestone/IDAPythonEmbeddedToolkit/blob/master/identify_operand_locations.py)
Each script is written to be processor/architecture-agnostic, but in some scripts this requires a regular expression
to handle each architecture's specific syntax. Before running the scripts, verify that the architecture of the firmware
image to be analyzed is supported in the script. Please see [Architecture Agnostic Structure of Scripts](#archagnostic) for more details.
The IDAPython Embedded Toolkit becomes more powerful as more processors are supported, so please submit a pull request
as you add new processors.
To run a script, you must have IDA Pro 6.95 installed. Open the IDA database on which you'd like to run a script and then
select File > Script File... and select the script to run.
## Versioning
The IDAPython Embedded Toolkit has only been tested on IDA Pro 6.95 so far. Testing on IDA
Pro 7.0 is in progress.
## Installation/ Usage
If you completed the default installation for IDA Pro, then IDAPython should be installed.
You can verify by checking your IDA directory for a Python/ folder. If that is there, IDAPython
is installed.
Otherwise, install IDAPython per: https://github.com/idapython/src
Once IDAPython is installed, the IDAPython Embedded Toolkit scripts may be run by opening an
IDA database and selecting File > Script file... from the upper menu. Then, select the script to run.
Each script is run individually by selecting it through this process.
## Architecture Agnostic Structure of Scripts<a name="archagnostic"></a>
The scripts in the IDAPython Embedded Toolkit are written to be architecture- and processor-agnostic.
This is done by finding the common structure and processes that are not dependent on architecture-specific syntax.
For the scripts that require processor-specific syntax (for example: Special Function Register Names or Instruction Syntax),
regular expressions are used for each architecture. For more information on how to write regular expressions in Python:
https://docs.python.org/2/library/re.html
Thanks to a contribution by @tmr232, each script auto-identifies the architecture in use and selects the correct set of
regular expressions using the IDAPython function:
`processor_name = idaapi.get_inf_structure().procName`
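
As a rough illustration of this selection step, the sketch below maps processor-name strings to regular expressions with a dictionary. This is not how the toolkit's scripts are written (they use an if/elif chain); `PATTERNS` and `select_patterns` are hypothetical names, and the processor name is passed in as a plain string so that the snippet runs outside IDA. Inside IDA the string would come from `idaapi.get_inf_structure().procName`.

```python
import re

# Hypothetical lookup table. The patterns mirror the 8051 and AVR
# entries used in define_code_functions.py.
PATTERNS = {
    "8051": {"prolog": re.compile(r".*"), "epilog": re.compile(r"reti{0,1}")},
    "AVR": {"prolog": re.compile(r"push +r"), "epilog": re.compile(r"reti{0,1}")},
}


def select_patterns(processor_name):
    """Return the (prolog, epilog) regexes for a processor-name string."""
    try:
        entry = PATTERNS[processor_name]
    except KeyError:
        # Same failure mode as the scripts: an unknown processor is an error.
        raise NotImplementedError("Unsupported Processor Type: %s" % processor_name)
    return entry["prolog"], entry["epilog"]


# Usage with a hard-coded name instead of the value IDA would report.
prolog, epilog = select_patterns("AVR")
print(bool(prolog.match("push    r28")))
print(bool(epilog.match("ret")))
```

A dictionary keeps each processor's patterns in one place, but the if/elif form used by the scripts works equally well for a handful of processors.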
### Add a Processor to a Script
If the processor in use does not have regular expressions defined within the script, then the script will exit with an
"Unsupported Processor Type" error. To make the script work, add the required regular expressions. To do this:
1. Determine IDA's string representation of the processor. In the bottom console bar, type the following
command as shown in the image below: `idaapi.get_inf_structure().procName` The command will output a string. That string is the processor name. <br/><br/>
![Image of Command to Get Processor](images/getProcessorScreenShot.png)
<br/><br/>
2. Add an elif statement to the script with the processor name output in Step 1.
3. Copy the regular expression assignments from one of the existing processors and customize them for the new processor being added.
The Python documentation for regular expressions is [here.](https://docs.python.org/2/library/re.html) The header of each script that uses
processor-specific regular expressions describes what each regular expression should match.
Example of the regular expressions for processor-specific syntax in define_code_functions.py:
```
################### USER DEFINED VALUES ###################
# Enter a regular expression for how this architecture usually
# begins and ends functions. If the architecture does not
# dictate how to start or end a function use r".*" to allow
# for any instruction.
#
processor_name = idaapi.get_inf_structure().procName
if processor_name == '8051':  # 8051 Architecture Prologue and Epilogue
    smart_prolog = re.compile(r".*")
    smart_epilog = re.compile(r"reti{0,1}")
elif processor_name == 'PIC18Cxx':  # PIC18 Architecture Prologue and Epilogue
    smart_prolog = re.compile(r".*")
    smart_epilog = re.compile(r"return 0")
elif processor_name == 'm32r':  # Mitsubishi M32R Architecture Prologue and Epilogue
    smart_prolog = re.compile(r"push +lr")
    smart_epilog = re.compile(r"jmp +lr.*")
elif processor_name == 'TMS32028':  # Texas Instruments TMS320C28x
    smart_prolog = re.compile(r".*")
    smart_epilog = re.compile(r"lretr")
elif processor_name == 'AVR':  # AVR
    smart_prolog = re.compile(r"push +r")
    smart_epilog = re.compile(r"reti{0,1}")
else:
    print "[define_code_functions.py] UNSUPPORTED PROCESSOR. Processor = %s is unsupported. Exiting." % processor_name
    raise NotImplementedError('Unsupported Processor Type.')
```
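
Before adding a new `elif` branch, it can help to try the candidate regular expressions against a few disassembly lines outside IDA. The sketch below does this with the m32r patterns from the example above. The sample lines are written by hand for illustration, `classify` is a hypothetical helper, and the use of `re.match` (anchored at the start of the line) is an assumption about how a script would apply the patterns.

```python
import re

# Candidate patterns under test; these are the m32r entries from the
# example above, standing in for a new processor's patterns.
candidate_prolog = re.compile(r"push +lr")
candidate_epilog = re.compile(r"jmp +lr.*")


def classify(line):
    """Label one disassembly line as 'prolog', 'epilog', or 'other'."""
    if candidate_prolog.match(line):
        return "prolog"
    if candidate_epilog.match(line):
        return "epilog"
    return "other"


# Hand-written sample lines, not taken from a real IDA database.
samples = ["push    lr", "ld R1, @(0x4114, fp)", "jmp     lr"]
for line in samples:
    print("%-24s %s" % (line, classify(line)))
```

If a line that should start or end a function comes back as `other`, adjust the pattern before copying it into the script.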
## Scripts in the IDAPython Embedded Toolkit
* **data_offset_calc.py -- Resolve Indirect Offset Memory Accesses**
@@ -26,6 +119,14 @@ creates a data cross references (add_dref), and creates a comment of the resolve
new_opnd_display: A string representation of how the calculated and resolved
value should be displayed as the operand in the instruction
For example, let's say we have firmware where fp = 0x808000 and the majority of memory accesses are as
offsets from fp. This script will calculate that the instruction is reading 0x80C114, create a cross-reference
to that location, and replace the operand in the instruction with this calculated value as shown below.
```
ld R1, @(0x4114, fp) --> ld R1, @[0x80C114]
add3 R10, fp, 0x4147 --> add3 R10, fp, 0x4147; @[0x80C147]
```
* **define_code_functions.py -- Define Code and Functions**
This script scans an area of the database from the user input "start address" to "end address"
defining the bytes as code and attempting to define functions from that code. The script