GP-4489: Add psutil and protobuf to downloads, dist. Build py packages for dist.

2026-01-08 05:34:00 -05:00 · 2024-04-26 23:34:30 -04:00
parent 75d5737cce
commit fc17ca970c
24 changed files with 705 additions and 163 deletions
--- a/DevGuide.md
+++ b/DevGuide.md
@@ -62,6 +62,11 @@ Build Javadoc:
 gradle createJavadocs
 ```

+Build Python3 packages for the Debugger:
+```
+gradle buildPyPackage
+```
+
 Build Ghidra to `build/dist` in an uncompressed form.  This will be a distribution intended only to 
 run on the platform on which it was built.
 ```
@@ -182,13 +187,18 @@ If you'd like some details of our fine tuning, take a look at [building_fid.txt]

 ## Debugger Development

+We have recently changed the Debugger's back-end architecture.
+We no longer use JNA to access native Debugger APIs.
+We only use it for pseudo-terminal access.
+Instead, we use Python3 and a protobuf-based TCP connection for back-end integration.
+
 ### Additional Dependencies

 In addition to Ghidra's normal dependencies, you may want the following:

 * WinDbg for Windows x64
- * GDB 8.0 or later for Linux amd64/x86_64
- * LLDB 13.0 for macOS
+ * GDB 13 or later for Linux
+ * LLDB 10 or later for macOS

 The others (e.g., JNA) are handled by Gradle via Maven Central.

@@ -199,121 +209,137 @@ These all currently reside in the `Ghidra/Debug` directory, but will likely be r
 `Framework` and `Feature` directories later. Each project is listed "bottom up" with a brief 
 description and status.

- * ProposedUtils - a collection of utilities proposed to be moved to other respective projects
- * AnnotationValidator - an experimental annotation processor for database access objects
+ * ProposedUtils - a collection of utilities proposed to be moved to other respective projects.
+ * AnnotationValidator - an experimental annotation processor for database access objects.
 * Framework-TraceModeling - a database schema and set of interfaces for storing machine state over
- time
+ time.
 * Framework-AsyncComm - a collection of utilities for asynchronous communication (packet formats
 and completable-future conveniences).
 * Framework-Debugging - specifies interfaces for debugger models and provides implementation
- conveniences.
+ conveniences. This is mostly deprecated.
 * Debugger - the collection of Ghidra plugins and services comprising the Debugger UI.
+ * Debugger-rmi-trace - the wire protocol, client, services, and UI components for Trace RMI, the new back-end architecture.
 * Debugger-agent-dbgeng - the connector for WinDbg (via dbgeng.dll) on Windows x64.
 * Debugger-agent-dbgmodel - an experimental connector for WinDbg Preview (with TTD, via 
- dbgmodel.dll) on Windows x64.
- * Debugger-agent-dbgmodel-traceloader - an experimental "importer" for WinDbg trace files.
- * Debugger-agent-gdb - the connector for GDB (8.0 or later recommended) on UNIX.
- * Debugger-swig-lldb - the Java language bindings for LLDB's SBDebugger, also proposed upstream.
- * Debugger-agent-lldb - the connector for LLDB (13.0 required) on macOS, UNIX, and Windows.
+ dbgmodel.dll) on Windows x64. This is deprecated, as most of these features are implemented in Debugger-agent-dbgeng for the new architecture.
+ * Debugger-agent-dbgmodel-traceloader - an experimental "importer" for WinDbg trace files. This is deprecated.
+ * Debugger-agent-gdb - the connector for GDB (13 or later recommended) on UNIX.
+ * Debugger-swig-lldb - the Java language bindings for LLDB's SBDebugger, also proposed upstream. This is deprecated. We now use the Python3 language bindings for LLDB.
+ * Debugger-agent-lldb - the connector for LLDB (10 or later recommended) on macOS, UNIX, and Windows.
 * Debugger-gadp - the connector for our custom wire protocol the Ghidra Asynchronous Debugging 
- Protocol.
- * Debugger-jpda - an in-development connector for Java and Dalvik debugging via JDI (i.e., JDWP).
+ Protocol. This is deprecated. It's replaced by Debugger-rmi-trace.
+ * Debugger-jpda - an in-development connector for Java and Dalvik debugging via JDI (i.e., JDWP). This is deprecated and not yet replaced.

 The Trace Modeling schema records machine state and markup over time.
-It rests on the same database framework as Programs, allowing trace recordings to be stored in a
-Ghidra project and shared via a server, if desired. Trace "recording" is a de facto requirement for
-displaying information in Ghidra's UI. However, only the machine state actually observed by the user
-(or perhaps a script) is recorded. For most use cases, the Trace is small and ephemeral, serving
-only to mediate between the UI components and the target's model. It supports many of the same 
-markup (e.g., disassembly, data types) as Programs, in addition to tracking active threads, loaded
-modues, breakpoints, etc.
+It rests on the same database framework as Programs, allowing trace recordings to be stored in a Ghidra project and shared via a server, if desired.
+Trace "recording" is a de facto requirement for displaying information in Ghidra's UI.
+The back-end connector has full discretion over what is recorded by using Trace RMI.
+Typically, only the machine state actually observed by the user (or perhaps a script) is recorded.
+For most use cases, the Trace is small and ephemeral, serving only to mediate between the UI components and the target's model.
+It supports much of the same markup (e.g., disassembly, data types) as Programs, in addition to tracking active threads, loaded modules, breakpoints, etc.

-Every model (or "adapter" or "connector" or "agent") implements the API specified in 
-Framework-Debugging. As a general rule in Ghidra, no component is allowed to access a native API and
-reside in the same JVM as the Ghidra UI. This allows us to contain crashes, preventing data loss. To
-accommodate this requirement -- given that debugging native applications is almost certainly going 
-to require access to native APIs -- we've developed the Ghidra Asynchronous Debugging Protocol. This
-protocol is tightly coupled to Framework-Debugging, essentially exposing its methods via RMI. The 
-protocol is built using Google's Protobuf library, providing a potential path for agent 
-implementations in alternative languages. GADP provides both a server and a client implementation. 
-The server can accept any model which adheres to the specification and expose it via TCP; the client
-does the converse. When a model is instantiated in this way, it is called an "agent," because it is
-executing in its own JVM. The other connectors, which do not use native APIs, may reside in Ghidra's
-JVM and typically implement alternative wire protocols, e.g., JDWP. In both cases, the 
-implementations inherit from the same interfaces.
+Every back end (or "adapter" or "connector" or "agent") employs the Trace RMI client to populate a trace database.
+As a general rule in Ghidra, no component is allowed to access a native API and reside in the same JVM as the Ghidra UI.
+This allows us to contain crashes, preventing data loss.
+To accommodate this requirement &mdash; given that debugging native applications is almost certainly going to require access to native APIs &mdash; we've developed the Trace RMI protocol.
+This also allows us to better bridge the language gap between Java and Python, which is supported by most native debuggers.
+This protocol is loosely coupled to Framework-TraceModeling, essentially exposing its methods via RMI, as well as some methods for controlling the UI.
+The protocol is built using Google's Protobuf library, providing a potential path for back-end implementations in alternative languages.
+We provide the Trace RMI server as a Ghidra component implemented in Java and the Trace RMI client as a Python3 package.
+A back-end implementation may be a stand-alone executable or script that accesses the native debugger's API, or a script or plugin for the native debugger.
+It then connects to Ghidra via Trace RMI to populate the trace database with information gleaned from that API.
+It should provide a set of diagnostic commands to control and monitor that connection.
+It should also use the native API to detect session and target changes so that Ghidra's UI consistently reflects the debugging session.

-The Debugger services maintain a collection of active connections and inspect each model for 
-potential targets. When a target is found, the service inspects the target environment and attempts
-to find a suitable opinion. Such an opinion, if found, instructs Ghidra how to map the objects, 
-addresses, registers, etc. from the target namespace into Ghidra's. The target is then handed to a 
-Trace Recorder which begins collecting information needed to populate the UI, e.g., the program 
-counter, stack pointer, and the bytes of memory they refer to.
+The old system relied on a "recorder" to discover targets and map them to traces in the proper Ghidra language.
+That responsibility is now delegated to the back end.
+Typically, it examines the target's architecture and immediately creates a trace upon connection.

 ### Developing a new connector

 So Ghidra does not yet support your favorite debugger?
-It is tempting, exciting, but also daunting to develop your own connector.
-Please finish reading this guide, and look carefully at the ones we have so far, and perhaps ask to
-see if we are already developing one. Of course, in time you might also search the internet to see 
-if others are developing one. There are quite a few caveats and gotchas, the most notable being that
-this interface is still in quite a bit of flux. When things go wrong, it could be because of, 
-without limitation: 1) a bug on your part, 2) a bug on our part, 3) a design flaw in the interfaces,
-or 4) a bug in the debugger/API you're adapting. We are still in the process of writing up this
-documentation. In the meantime, we recommend using the GDB and dbgeng.dll agents as examples.
+We believe the new system is much less daunting than the previous one.
+Still, please finish reading this guide, look carefully at the connectors we have so far, and perhaps ask whether we are already developing one.
+Of course, in time you might also search the internet to see if others are developing one.
+There are quite a few caveats and gotchas, the most notable being that this interface is still in some flux.
+When things go wrong, it could be because of, without limitation:

-You'll also need to provide launcher(s) so that Ghidra knows how to configure and start your 
-connector. Please provide launchers for your model in both configurations: as a connector in 
-Ghidra's JVM, and as a GADP agent. If your model requires native API access, you should only permit
-launching it as a GADP agent, unless you give ample warning in the launcher's description. Look at 
-the existing launchers for examples. There are many model implementation requirements that cannot be
-expressed in Java interfaces. Failing to adhere to those requirements may cause different behaviors 
-with and without GADP. Testing with GADP tends to reveal those implementation errors, but also 
-obscures the source of client method calls behind network messages. We've also codified (or 
-attempted to codify) these requirements in a suite of abstract test cases. See the `ghidra.dbg.test`
-package of Framework-Debugging, and again, look at existing implementations.
+1. A bug on your part
+2. A bug on our part
+3. A design flaw in the interfaces
+4. A bug in the debugger/API you're adapting
+
+We are still (yes, still) in the process of writing up this documentation.
+In the meantime, we recommend using the GDB and dbgeng agents as examples.
+Be sure to look at the Python code in `src/main/py`!
+The deprecated Java code in `src/main/java` is still included as we transition.
+
+You'll also need to provide launcher(s) so that Ghidra knows how to configure and start your connector.
+These are just shell scripts.
+We use bash scripts on Linux and macOS, and we use batch files on Windows.
+Try to include as many common use cases as makes sense for the debugger.
+This provides the most flexibility to users and examples to power users who might create derivative launchers.
+Look at the existing launchers for examples.
+
+For testing, please follow the examples for GDB.
+We no longer provide abstract classes that prescribe requirements.
+Instead, we just provide GDB as an example.
+Usually, we split our tests into three categories:
+
+ * Commands
+ * Methods
+ * Hooks
+
+The Commands tests check that the user CLI commands, conventionally implemented in `commands.py`, work correctly.
+In general, do the minimum connection setup, execute the command, and check that it produces the expected output and causes the expected effects.
+
+The Methods tests check that the remote methods, conventionally implemented in `methods.py`, work correctly.
+Many methods are just wrappers around CLI commands, some provided by the native debugger and some provided by `commands.py`.
+These work similarly to the Commands tests, except that they invoke methods instead of executing commands.
+Again, check the return value (rarely applicable) and that it causes the expected effects.
+
+The Hooks tests check that the back end is able to listen for session and target changes, e.g., knowing when the target stops.
+*The test should not "cheat" by executing commands or invoking methods that should instead be triggered by the listener.*
+It should execute the minimal commands to set up the test, then trigger an event.
+It should then check that the event in turn triggered the expected effects, e.g., updating PC upon the target stopping.
+
+Whenever you make a change to the Python code, you'll need to re-assemble the package's source.
+
+```
+gradle assemblePyPackage
+```
+
+This is required in case your package includes generated source, as is the case for Debugger-rmi-trace.
+If you want to create a new Ghidra module for your connector (recommended), use an existing one's `build.gradle` as a template.
+A key part is applying the `hasPythonPackage.gradle` script.

 ### Adding a new platform

-If an existing connector exists for a suitable debugger on the desired platform, then adding it may
-be very simple. For example, both the x86 and ARM platforms are supported by GDB, so even though 
-we're currently focused on x86 support, we've provided the opinions needed for Ghidra to debug ARM
-platforms (and several others) via GDB. These opinions are kept in the "Debugger" project, not their
-respective "agent" projects. We imagine there are a number of platforms that could be supported 
-almost out of the box, except that we haven't written the necessary opinions, yet. Take a look at 
-the existing ones for examples.
+If a connector already exists for a suitable debugger on the desired platform, then adding it may be very simple.
+For example, many platforms are supported by GDB, so even though we're currently focused on x86-64 (and to some extent arm64) support, we've provided the mappings for many of them.
+These mappings are conventionally kept in each connector's `arch.py` file.

-In general, to write a new opinion, you need to know: 1) What the platform is called (including 
-variant names) by the debugger, 2) What the processor language is called by Ghidra, 3) If 
-applicable, the mapping of target address spaces into Ghidra's address spaces, 4) If applicable, the
-mapping of target register names to those in Ghidra's processor language. In most cases (3) and (4) 
-are already implemented by default mappers, so you can use those same mappers in your opinion. Once 
-you have the opinion written, you can try debugging and recording a target. If Ghidra finds your 
-opinion applicable to that target, it will attempt to record, and then you can work out the kinds 
-from there. Again, we have a bit of documentation to do regarding common pitfalls.
+In general, to update `arch.py`, you need to know:
+
+1. What the platform is called (including variant names) by the debugger
+2. What the processor language is called by Ghidra
+3. If applicable, the mapping of target address spaces into Ghidra's address spaces
+4. If applicable, the mapping of target register names to those in Ghidra's processor language
+
+In most cases (3) and (4) are already implemented by the included mappers.
+Naturally, you'll want to test the special cases, preferably in automated tests.
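
The four items above can be pictured as a pair of lookup tables. The following is only a minimal sketch and is not taken from any connector's actual `arch.py`: the names `LANGUAGE_BY_ARCH`, `REGISTER_RENAMES`, `language_for`, and `map_register` are hypothetical, and the `eflags` to `rflags` rename is an illustrative assumption about one special case.

```python
from typing import Optional

# Hypothetical tables for illustration only; a real arch.py may be organized
# quite differently.

# Items (1) and (2): the debugger's platform name (including variant names)
# mapped to the language ID Ghidra uses for that processor.
LANGUAGE_BY_ARCH = {
    "i386:x86-64": "x86:LE:64:default",
    "x86_64": "x86:LE:64:default",  # assumed variant spelling
    "aarch64": "AARCH64:LE:64:v8A",
}

# Item (4): only registers whose names differ need an entry; everything
# else falls through unchanged. The entry below is an assumed example.
REGISTER_RENAMES = {
    "x86:LE:64:default": {"eflags": "rflags"},
}


def language_for(arch: str) -> Optional[str]:
    """Return the Ghidra language ID for a debugger platform name, if known."""
    return LANGUAGE_BY_ARCH.get(arch.strip().lower())


def map_register(language: str, name: str) -> str:
    """Translate a debugger register name into the Ghidra language's name."""
    return REGISTER_RENAMES.get(language, {}).get(name, name)


if __name__ == "__main__":
    lang = language_for("i386:x86-64")
    print(lang, map_register(lang, "eflags"), map_register(lang, "rax"))
```

Returning `None` for an unknown platform, rather than guessing, lets the caller decide whether to fall back to a default or to report that the platform is not yet mapped.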

 ### Emulation

-The most obvious integration path for 3rd-party emulators is to write a "connector." However, p-code
-emulation is now an integral feature of the Ghidra UI, and it has a fairly accessible API. Namely, 
-for interpolation between machines states recorded in a trace, and extrapolation into future machine
-states. Integration of such emulators may still be useful to you, but we recommend trying the p-code
-emulator to see if it suits your needs for emulation in Ghidra before pursuing integration of 
-another emulator.
+The most obvious integration path for 3rd-party emulators is to write a "connector."
+However, p-code emulation is an integral feature of the Ghidra UI, and it has a fairly accessible API.
+It is used for interpolation between machine states recorded in a trace and for extrapolation into future machine states.
+Integration of such emulators may still be useful to you, but we recommend trying the p-code emulator to see if it suits your needs for emulation in Ghidra before pursuing integration of another emulator.
+We also provide out-of-the-box QEMU integration via GDB.

 ### Contributing

-Whether submitting help tickets and pull requests, please tag those related to the debugger with 
-"Debugger" so that we can triage them more quickly.
-
-To set up your environment, in addition to the usual Gradle tasks, process the Protobuf 
-specification for GADP:
-
-```bash
-gradle generateProto
-```
-
-If you already have an environment set up in Eclipse, please re-run `gradle prepDev eclipse` and 
-import the new projects.
+When submitting help tickets and pull requests, please tag those related to the debugger with "Debugger" so that we can triage them more quickly.


 [java]: https://dev.java