GP-4489: Add psutil and protobuf to downloads, dist. Build py packages for dist.

Dan
2024-04-26 23:34:30 -04:00
committed by ghidra1
parent 75d5737cce
commit fc17ca970c
24 changed files with 705 additions and 163 deletions


@@ -62,6 +62,11 @@ Build Javadoc:
gradle createJavadocs
```
Build Python3 packages for the Debugger:
```
gradle buildPyPackage
```
Build Ghidra to `build/dist` in an uncompressed form. This will be a distribution intended only to
run on the platform on which it was built.
```
@@ -182,13 +187,18 @@ If you'd like some details of our fine tuning, take a look at [building_fid.txt]
## Debugger Development
We have recently changed the Debugger's back-end architecture.
We no longer use JNA to access native Debugger APIs.
We only use it for pseudo-terminal access.
Instead, we use Python3 and a protobuf-based TCP connection for back-end integration.
### Additional Dependencies
In addition to Ghidra's normal dependencies, you may want the following:
* WinDbg for Windows x64
* GDB 8.0 or later for Linux amd64/x86_64
* LLDB 13.0 for macOS
* GDB 13 or later for Linux
* LLDB 10 or later for macOS
The others (e.g., JNA) are handled by Gradle via Maven Central.
@@ -199,121 +209,137 @@ These all currently reside in the `Ghidra/Debug` directory, but will likely be r
`Framework` and `Feature` directories later. Each project is listed "bottom up" with a brief
description and status.
* ProposedUtils - a collection of utilities proposed to be moved to other respective projects
* AnnotationValidator - an experimental annotation processor for database access objects
* ProposedUtils - a collection of utilities proposed to be moved to other respective projects.
* AnnotationValidator - an experimental annotation processor for database access objects.
* Framework-TraceModeling - a database schema and set of interfaces for storing machine state over
time
time.
* Framework-AsyncComm - a collection of utilities for asynchronous communication (packet formats
and completable-future conveniences).
* Framework-Debugging - specifies interfaces for debugger models and provides implementation
conveniences.
conveniences. This is mostly deprecated.
* Debugger - the collection of Ghidra plugins and services comprising the Debugger UI.
* Debugger-rmi-trace - the wire protocol, client, services, and UI components for Trace RMI, the new back-end architecture.
* Debugger-agent-dbgeng - the connector for WinDbg (via dbgeng.dll) on Windows x64.
* Debugger-agent-dbgmodel - an experimental connector for WinDbg Preview (with TTD, via
dbgmodel.dll) on Windows x64.
* Debugger-agent-dbgmodel-traceloader - an experimental "importer" for WinDbg trace files.
* Debugger-agent-gdb - the connector for GDB (8.0 or later recommended) on UNIX.
* Debugger-swig-lldb - the Java language bindings for LLDB's SBDebugger, also proposed upstream.
* Debugger-agent-lldb - the connector for LLDB (13.0 required) on macOS, UNIX, and Windows.
dbgmodel.dll) on Windows x64. This is deprecated, as most of these features are implemented in Debugger-agent-dbgeng for the new architecture.
* Debugger-agent-dbgmodel-traceloader - an experimental "importer" for WinDbg trace files. This is deprecated.
* Debugger-agent-gdb - the connector for GDB (13 or later recommended) on UNIX.
* Debugger-swig-lldb - the Java language bindings for LLDB's SBDebugger, also proposed upstream. This is deprecated. We now use the Python3 language bindings for LLDB.
* Debugger-agent-lldb - the connector for LLDB (10 or later recommended) on macOS, UNIX, and Windows.
* Debugger-gadp - the connector for our custom wire protocol, the Ghidra Asynchronous Debugging
Protocol.
* Debugger-jpda - an in-development connector for Java and Dalvik debugging via JDI (i.e., JDWP).
Protocol. This is deprecated. It's replaced by Debugger-rmi-trace.
* Debugger-jpda - an in-development connector for Java and Dalvik debugging via JDI (i.e., JDWP). This is deprecated and not yet replaced.
The Trace Modeling schema records machine state and markup over time.
It rests on the same database framework as Programs, allowing trace recordings to be stored in a
Ghidra project and shared via a server, if desired. Trace "recording" is a de facto requirement for
displaying information in Ghidra's UI. However, only the machine state actually observed by the user
(or perhaps a script) is recorded. For most use cases, the Trace is small and ephemeral, serving
only to mediate between the UI components and the target's model. It supports many of the same
markup (e.g., disassembly, data types) as Programs, in addition to tracking active threads, loaded
modules, breakpoints, etc.
It rests on the same database framework as Programs, allowing trace recordings to be stored in a Ghidra project and shared via a server, if desired.
Trace "recording" is a de facto requirement for displaying information in Ghidra's UI.
The back-end connector has full discretion over what is recorded by using Trace RMI.
Typically, only the machine state actually observed by the user (or perhaps a script) is recorded.
For most use cases, the Trace is small and ephemeral, serving only to mediate between the UI components and the target's model.
It supports much of the same markup (e.g., disassembly, data types) as Programs, in addition to tracking active threads, loaded modules, breakpoints, etc.
Every model (or "adapter" or "connector" or "agent") implements the API specified in
Framework-Debugging. As a general rule in Ghidra, no component is allowed to access a native API and
reside in the same JVM as the Ghidra UI. This allows us to contain crashes, preventing data loss. To
accommodate this requirement -- given that debugging native applications is almost certainly going
to require access to native APIs -- we've developed the Ghidra Asynchronous Debugging Protocol. This
protocol is tightly coupled to Framework-Debugging, essentially exposing its methods via RMI. The
protocol is built using Google's Protobuf library, providing a potential path for agent
implementations in alternative languages. GADP provides both a server and a client implementation.
The server can accept any model which adheres to the specification and expose it via TCP; the client
does the converse. When a model is instantiated in this way, it is called an "agent," because it is
executing in its own JVM. The other connectors, which do not use native APIs, may reside in Ghidra's
JVM and typically implement alternative wire protocols, e.g., JDWP. In both cases, the
implementations inherit from the same interfaces.
Every back end (or "adapter" or "connector" or "agent") employs the Trace RMI client to populate a trace database.
As a general rule in Ghidra, no component is allowed to access a native API and reside in the same JVM as the Ghidra UI.
This allows us to contain crashes, preventing data loss.
To accommodate this requirement — given that debugging native applications is almost certainly going to require access to native APIs — we've developed the Trace RMI protocol.
This also allows us to better bridge the language gap between Java and Python, which is supported by most native debuggers.
This protocol is loosely coupled to Framework-TraceModeling, essentially exposing its methods via RMI, as well as some methods for controlling the UI.
The protocol is built using Google's Protobuf library, providing a potential path for back-end implementations in alternative languages.
We provide the Trace RMI server as a Ghidra component implemented in Java and the Trace RMI client as a Python3 package.
A back-end implementation may be a stand-alone executable or script that accesses the native debugger's API, or a script or plugin for the native debugger.
It then connects to Ghidra via Trace RMI to populate the trace database with information gleaned from that API.
It should provide a set of diagnostic commands to control and monitor that connection.
It should also use the native API to detect session and target changes so that Ghidra's UI consistently reflects the debugging session.
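As a rough illustration of what moving serialized messages over a TCP stream involves, here is a minimal sketch of length-prefixed framing. The function names (`send_framed`, `recv_exact`, `recv_framed`) and the 4-byte big-endian prefix are assumptions made for this example only. The actual wire format is defined by Debugger-rmi-trace, and the provided Python3 client package already handles it for you.

```python
import socket
import struct


def send_framed(sock: socket.socket, payload: bytes) -> None:
    """Send one message: a 4-byte big-endian length, then the payload.

    The prefix size and byte order are assumptions for illustration.
    """
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_framed(sock: socket.socket) -> bytes:
    """Receive one message previously sent with send_framed()."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

In a real back end, the payload would be a serialized Protobuf message rather than arbitrary bytes.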
The Debugger services maintain a collection of active connections and inspect each model for
potential targets. When a target is found, the service inspects the target environment and attempts
to find a suitable opinion. Such an opinion, if found, instructs Ghidra how to map the objects,
addresses, registers, etc. from the target namespace into Ghidra's. The target is then handed to a
Trace Recorder which begins collecting information needed to populate the UI, e.g., the program
counter, stack pointer, and the bytes of memory they refer to.
The old system relied on a "recorder" to discover targets and map them to traces in the proper Ghidra language.
That responsibility is now delegated to the back end.
Typically, it examines the target's architecture and immediately creates a trace upon connection.
### Developing a new connector
So Ghidra does not yet support your favorite debugger?
It is tempting, exciting, but also daunting to develop your own connector.
Please finish reading this guide, and look carefully at the ones we have so far, and perhaps ask to
see if we are already developing one. Of course, in time you might also search the internet to see
if others are developing one. There are quite a few caveats and gotchas, the most notable being that
this interface is still in quite a bit of flux. When things go wrong, it could be because of,
without limitation: 1) a bug on your part, 2) a bug on our part, 3) a design flaw in the interfaces,
or 4) a bug in the debugger/API you're adapting. We are still in the process of writing up this
documentation. In the meantime, we recommend using the GDB and dbgeng.dll agents as examples.
We believe the new system is much less daunting than the previous one.
Still, please finish reading this guide, look carefully at the connectors we have so far, and perhaps ask whether we are already developing one.
Of course, in time you might also search the internet to see if others are developing one.
There are quite a few caveats and gotchas, the most notable being that this interface is still in some flux.
When things go wrong, it could be because of, without limitation:
You'll also need to provide launcher(s) so that Ghidra knows how to configure and start your
connector. Please provide launchers for your model in both configurations: as a connector in
Ghidra's JVM, and as a GADP agent. If your model requires native API access, you should only permit
launching it as a GADP agent, unless you give ample warning in the launcher's description. Look at
the existing launchers for examples. There are many model implementation requirements that cannot be
expressed in Java interfaces. Failing to adhere to those requirements may cause different behaviors
with and without GADP. Testing with GADP tends to reveal those implementation errors, but also
obscures the source of client method calls behind network messages. We've also codified (or
attempted to codify) these requirements in a suite of abstract test cases. See the `ghidra.dbg.test`
package of Framework-Debugging, and again, look at existing implementations.
1. A bug on your part
2. A bug on our part
3. A design flaw in the interfaces
4. A bug in the debugger/API you're adapting
We are still (yes, still) in the process of writing up this documentation.
In the meantime, we recommend using the GDB and dbgeng agents as examples.
Be sure to look at the Python code in `src/main/py`!
The deprecated Java code in `src/main/java` is still included as we transition.
You'll also need to provide launcher(s) so that Ghidra knows how to configure and start your connector.
These are just shell scripts.
We use bash scripts on Linux and macOS, and we use batch files on Windows.
Try to include as many common use cases as makes sense for the debugger.
This provides the most flexibility to users and examples to power users who might create derivative launchers.
Look at the existing launchers for examples.
For testing, please follow the examples for GDB.
We no longer provide abstract classes that prescribe requirements.
Instead, we just provide GDB as an example.
Usually, we split our tests into three categories:
* Commands
* Methods
* Hooks
The Commands tests check that the user CLI commands, conventionally implemented in `commands.py`, work correctly.
In general, do the minimum connection setup, execute the command, and check that it produces the expected output and causes the expected effects.
The Methods tests check that the remote methods, conventionally implemented in `methods.py`, work correctly.
Many methods are just wrappers around CLI commands, some provided by the native debugger and some provided by `commands.py`.
These work similarly to the Commands tests, except that they invoke methods instead of executing commands.
Again, check the return value (rarely applicable) and that it causes the expected effects.
The Hooks tests check that the back end is able to listen for session and target changes, e.g., knowing when the target stops.
*The test should not "cheat" by executing commands or invoking methods that should instead be triggered by the listener.*
It should execute the minimal commands to set up the test, then trigger an event.
It should then check that the event in turn triggered the expected effects, e.g., updating PC upon the target stopping.
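The shape of a Hooks test can be sketched as follows. Everything here is a hypothetical stand-in: `FakeDebugger`, `FakeTrace`, and `install_hooks` are invented for illustration and are not part of Ghidra or any debugger's API. The real tests for GDB drive an actual debugger session. The point is only that the test triggers an event and then checks the effect, without writing the expected value itself.

```python
import unittest


class FakeTrace:
    """Hypothetical stand-in for a trace database: records register values."""

    def __init__(self):
        self.registers = {}


class FakeDebugger:
    """Hypothetical stand-in for a native debugger with stop listeners."""

    def __init__(self):
        self.pc = 0x1000
        self.stop_listeners = []

    def step(self):
        # Advance the target, then notify listeners that it has stopped.
        self.pc += 4
        for listener in self.stop_listeners:
            listener(self)


def install_hooks(debugger, trace):
    """Hypothetical hook installer: copy PC into the trace on each stop."""

    def on_stop(dbg):
        trace.registers["pc"] = dbg.pc

    debugger.stop_listeners.append(on_stop)


class HooksTest(unittest.TestCase):
    def test_stop_updates_pc(self):
        debugger, trace = FakeDebugger(), FakeTrace()
        install_hooks(debugger, trace)  # minimal setup
        debugger.step()  # trigger the event; do not write PC ourselves
        self.assertEqual(trace.registers["pc"], 0x1004)
```

If the test wrote `pc` into the trace directly, it would pass even with a broken listener, which is the "cheating" warned about above.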
Whenever you make a change to the Python code, you'll need to re-assemble the package's source.
```
gradle assemblePyPackage
```
This is required in case your package includes generated source, as is the case for Debugger-rmi-trace.
If you want to create a new Ghidra module for your connector (recommended), use an existing one's `build.gradle` as a template.
A key part is applying the `hasPythonPackage.gradle` script.
### Adding a new platform
If an existing connector exists for a suitable debugger on the desired platform, then adding it may
be very simple. For example, both the x86 and ARM platforms are supported by GDB, so even though
we're currently focused on x86 support, we've provided the opinions needed for Ghidra to debug ARM
platforms (and several others) via GDB. These opinions are kept in the "Debugger" project, not their
respective "agent" projects. We imagine there are a number of platforms that could be supported
almost out of the box, except that we haven't written the necessary opinions, yet. Take a look at
the existing ones for examples.
If a connector already exists for a suitable debugger on the desired platform, then adding it may be very simple.
For example, many platforms are supported by GDB, so even though we're currently focused on x86-64 (and to some extent arm64) support, we've provided the mappings for many.
These mappings are conventionally kept in each connector's `arch.py` file.
In general, to write a new opinion, you need to know: 1) What the platform is called (including
variant names) by the debugger, 2) What the processor language is called by Ghidra, 3) If
applicable, the mapping of target address spaces into Ghidra's address spaces, 4) If applicable, the
mapping of target register names to those in Ghidra's processor language. In most cases (3) and (4)
are already implemented by default mappers, so you can use those same mappers in your opinion. Once
you have the opinion written, you can try debugging and recording a target. If Ghidra finds your
opinion applicable to that target, it will attempt to record, and then you can work out the kinks
from there. Again, we have a bit of documentation to do regarding common pitfalls.
In general, to update `arch.py`, you need to know:
1. What the platform is called (including variant names) by the debugger
2. What the processor language is called by Ghidra
3. If applicable, the mapping of target address spaces into Ghidra's address spaces
4. If applicable, the mapping of target register names to those in Ghidra's processor language
In most cases (3) and (4) are already implemented by the included mappers.
Naturally, you'll want to test the special cases, preferably in automated tests.
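Items (1) and (2) amount to a lookup from the debugger's name for a platform to a Ghidra language ID. A minimal sketch, assuming GDB-style architecture names, might look like the following. The table, the alias map, and `guess_language` are hypothetical names for this example and do not reproduce any connector's actual `arch.py`. Consult the existing connectors for the real structure.

```python
from typing import Optional

# Hypothetical table: debugger architecture name -> Ghidra language ID.
LANGUAGE_BY_ARCH = {
    "i386:x86-64": "x86:LE:64:default",
    "i386": "x86:LE:32:default",
    "aarch64": "AARCH64:LE:64:v8A",
}

# Hypothetical variant names that refer to an architecture listed above.
ALIASES = {
    "x86_64": "i386:x86-64",
    "amd64": "i386:x86-64",
    "arm64": "aarch64",
}


def guess_language(arch_name: str) -> Optional[str]:
    """Return a Ghidra language ID for the debugger's name, or None."""
    name = arch_name.strip().lower()
    name = ALIASES.get(name, name)
    return LANGUAGE_BY_ARCH.get(name)
```

Returning `None` for an unknown name leaves the decision of how to fall back, or whether to refuse, to the caller.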
### Emulation
The most obvious integration path for 3rd-party emulators is to write a "connector." However, p-code
emulation is now an integral feature of the Ghidra UI, and it has a fairly accessible API. Namely,
for interpolation between machine states recorded in a trace, and extrapolation into future machine
states. Integration of such emulators may still be useful to you, but we recommend trying the p-code
emulator to see if it suits your needs for emulation in Ghidra before pursuing integration of
another emulator.
The most obvious integration path for 3rd-party emulators is to write a "connector."
However, p-code emulation is an integral feature of the Ghidra UI, and it has a fairly accessible API, namely for interpolation between machine states recorded in a trace and extrapolation into future machine states.
Integration of such emulators may still be useful to you, but we recommend trying the p-code emulator to see if it suits your needs for emulation in Ghidra before pursuing integration of another emulator.
We also provide out-of-the-box QEMU integration via GDB.
### Contributing
Whether submitting help tickets or pull requests, please tag those related to the debugger with
"Debugger" so that we can triage them more quickly.
To set up your environment, in addition to the usual Gradle tasks, process the Protobuf
specification for GADP:
```bash
gradle generateProto
```
If you already have an environment set up in Eclipse, please re-run `gradle prepDev eclipse` and
import the new projects.
When submitting help tickets and pull requests, please tag those related to the debugger with "Debugger" so that we can triage them more quickly.
[java]: https://dev.java