20 Commits

Author SHA1 Message Date
Nicolas Iooss
4f3f1059dc Add eBPF instruction CALLX for indirect calls
When clang encounters indirect calls in eBPF programs, it emits a call
instruction with a register parameter (`BPF_X`) instead of an immediate
value (`BPF_K`). This encoding (`BPF_JMP | BPF_CALL | BPF_X = 0x8d`) is
decoded by llvm-objdump as `callx`.

For example, here is a simple C program with an indirect call:

    extern void (*ptr_to_some_function)(void);
    void call_ptr_to_some_function(void) {
        ptr_to_some_function();
    }

Compiling and disassembling it with clang 14.0 (and LLVM 14.0) gives:

    $ clang -O2 -target bpf -c indirect_call.c -o indirect_call.ebpf
    $ llvm-objdump -rd indirect_call.ebpf

    indirect_call.ebpf:  file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
           0:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0 ll
                    0000000000000000:  R_BPF_64_64  ptr_to_some_function
           2:  79 11 00 00 00 00 00 00  r1 = *(u64 *)(r1 + 0)
           3:  8d 00 00 00 01 00 00 00  callx r1
           4:  95 00 00 00 00 00 00 00  exit

Unlike other eBPF instructions, `callx` has its register operand
encoded in the immediate field. This encoding is actually specific to
LLVM (and clang). GCC instead encodes the target register in the
destination register field.

LLVM 19.1 was modified to use GCC's encoding:
https://github.com/llvm/llvm-project/pull/81546 ("BPF: Change callx insn
encoding"). For example, in an Alpine Linux 3.21 system:

    $ clang -target bpf --version
    Alpine clang version 19.1.4
    Target: bpf
    Thread model: posix
    InstalledDir: /usr/lib/llvm19/bin

    $ clang -O2 -target bpf -c indirect_call.c -o indirect_call.ebpf
    $ llvm-objdump -rd indirect_call.ebpf

    indirect_call.ebpf:  file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
           0:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0x0 ll
                    0000000000000000:  R_BPF_64_64  ptr_to_some_function
           2:  79 11 00 00 00 00 00 00  r1 = *(u64 *)(r1 + 0x0)
           3:  8d 01 00 00 00 00 00 00  callx r1
           4:  95 00 00 00 00 00 00 00  exit

The instruction is now encoded `8d 01 00...`.

For reference, here are similar commands using GCC showing it is using
the same encoding (here, compiler option `-mxbpf` is required to enable
several features including indirect calls, cf.
https://gcc.gnu.org/onlinedocs/gcc-12.4.0/gcc/eBPF-Options.html ).

    $ bpf-gcc --version
    bpf-gcc (12-20220319-1ubuntu1+2) 12.0.1 20220319 (experimental) [master r12-7719-g8ca61ad148f]

    $ bpf-gcc -O2 -c indirect_call.c -o indirect_call.ebpf -mxbpf
    $ bpf-objdump -mxbpf -rd indirect_call.ebpf

    indirect_call_gcc-12.ebpf:     file format elf64-bpfle

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
       0:  18 00 00 00 00 00 00 00   lddw %r0,0
       8:  00 00 00 00 00 00 00 00
          0: R_BPF_INSN_64  ptr_to_some_function
      10:  79 01 00 00 00 00 00 00   ldxdw %r1,[%r0+0]
      18:  8d 01 00 00 00 00 00 00   call %r1
      20:  95 00 00 00 00 00 00 00   exit

Add both `callx` instruction encodings to eBPF processor.
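
As a rough illustration of the two encodings described above, here is a
minimal C sketch that extracts the target register from a little-endian
`callx` instruction. The helper name `callx_target_register` and the rule
used to pick between the fields (use the immediate when it is non-zero,
otherwise the destination register) are assumptions of this sketch, not
necessarily what the eBPF processor specification does:

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode byte for BPF_JMP | BPF_CALL | BPF_X, as given above. */
#define BPF_CALLX_OPCODE 0x8d

/*
 * Return the register number targeted by a little-endian callx
 * instruction, or -1 if the 8 bytes do not look like a valid callx.
 * Hypothetical helper: the selection rule below is an assumption.
 */
static int callx_target_register(const uint8_t insn[8])
{
    if (insn == NULL || insn[0] != BPF_CALLX_OPCODE)
        return -1;

    /* Little-endian layout: low nibble of byte 1 is dst, bytes 4..7 are imm. */
    uint8_t dst = insn[1] & 0x0f;
    uint32_t imm = (uint32_t)insn[4] | ((uint32_t)insn[5] << 8) |
                   ((uint32_t)insn[6] << 16) | ((uint32_t)insn[7] << 24);

    if (imm != 0 && dst != 0)
        return -1;      /* neither encoding sets both fields */
    if (imm > 10 || dst > 10)
        return -1;      /* eBPF only has registers r0..r10 */

    /* LLVM before 19.1 uses imm; GCC and LLVM 19.1 use dst. */
    return imm != 0 ? (int)imm : (int)dst;
}
```

With the bytes from the listings above, both `8d 00 00 00 01 00 00 00` and
`8d 01 00 00 00 00 00 00` yield register 1. An all-zero operand means `r0`
in either encoding, so it is not ambiguous.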

By the way, the eBPF verifier used by the Linux kernel currently forbids
indirect calls (it fails when `BPF_SRC(insn->code) != BPF_K`, in
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?h=v6.14#n19141
). But other deployments of eBPF may already support this feature.
2025-07-30 16:26:42 +02:00
Nicolas Iooss
24d19f6e8c Add eBPF ISA v4 instructions
In 2023, the eBPF instruction set was modified to add several
instructions related to signed operations (load with sign-extension,
signed division, etc.), a 32-bit jump instruction and some byte-swap
instructions. This became version 4 of eBPF ISA.

Here are some references about this change:

- https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets.html
  (a blog post about eBPF instruction set extensions)
- https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
  (documentation sent to Linux Kernel mailing list)
- https://www.rfc-editor.org/rfc/rfc9669.html#name-sign-extension-load-operati
  (IETF's BPF Instruction Set Architecture standard defined the new
  instructions)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n1859
  (implementation of signed division and remainder in Linux kernel.
  This shows that 32-bit signed DIV and signed MOD are zero-extending
  the result in DST)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2135
  (implementation of signed memory load in Linux kernel)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f9a1ea821ff25353a0e80d971e7958cd55b47a3
  (commit which added signed memory load instructions in Linux kernel)
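
To make the zero-extension remark about 32-bit signed division concrete,
here is a small C sketch. The function name `sdiv32` is hypothetical, and
the handling of a zero divisor and of `INT32_MIN / -1` is an assumption of
this sketch that should be checked against RFC 9669 and the kernel code
linked above:

```c
#include <stdint.h>

/*
 * Sketch of a 32-bit signed division writing a 64-bit register.
 * The quotient is computed on the low 32 bits of both operands, then
 * zero-extended (not sign-extended) into the 64-bit result.
 * Assumption: a zero divisor yields 0 and INT32_MIN / -1 yields INT32_MIN.
 */
static uint64_t sdiv32(uint64_t dst, uint64_t src)
{
    /* Out-of-range conversions wrap on common compilers (GCC, clang). */
    int32_t a = (int32_t)(uint32_t)dst;
    int32_t b = (int32_t)(uint32_t)src;
    int32_t q;

    if (b == 0)
        q = 0;
    else if (a == INT32_MIN && b == -1)
        q = INT32_MIN;  /* avoid signed overflow, which is undefined in C */
    else
        q = a / b;

    return (uint64_t)(uint32_t)q;   /* upper 32 bits are always zero */
}
```

For instance, dividing -6 by 3 gives `0x00000000fffffffe` in the 64-bit
register, not `0xfffffffffffffffe`.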

This can be tested with a recent enough version of clang and LLVM (this
works with clang 19.1.4 on Alpine 3.21).
For example, for signed memory load instructions:

    signed int sext_8bit(signed char x) {
        return x;
    }

produces:

    $ clang -O0 -target bpf -mcpu=v4 -c test.c -o test.ebpf
    $ llvm-objdump -rd test.ebpf
    ...
    0000000000000000 <sext_8bit>:
           0:  73 1a ff ff 00 00 00 00  *(u8 *)(r10 - 0x1) = r1
           1:  91 a1 ff ff 00 00 00 00  r1 = *(s8 *)(r10 - 0x1)
           2:  bc 10 00 00 00 00 00 00  w0 = w1
           3:  95 00 00 00 00 00 00 00  exit

(The second instruction is a signed memory load)

The MOVS instruction (sign-extending register MOV) uses the offset field
to encode the conversion (whether the source register is to be considered
as a signed 8-bit, 16-bit or 32-bit integer). There is no consensus on
the mnemonic for these instructions:

- They are all named MOVS in the proposal
  https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
- LLVM and Linux disassemblers only display pseudo-code (`r0 = (s8)r1`)
- RFC 9669 (https://datatracker.ietf.org/doc/rfc9669/) uses MOVSX for
  all instructions.
- GCC uses MOVS for all instructions:
  https://github.com/gcc-mirror/gcc/blob/releases/gcc-14.1.0/gcc/config/bpf/bpf.md?plain=1#L326-L365

To make the disassembled code clearer, decode such instructions with a
size suffix: MOVSB, MOVSH, MOVSW.
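
The mapping from the offset field to the suffixed mnemonics can be
sketched in C as follows. The function name `movs64` and its return
convention (the suffix character, or 0 for an unsupported width) are
hypothetical and only cover the 64-bit form of the instruction:

```c
#include <stdint.h>

/*
 * Sketch of the 64-bit sign-extending register move. `off` is the
 * instruction's offset field, which selects the source width in bits.
 * Returns the mnemonic suffix ('B', 'H' or 'W') and writes *dst, or
 * returns 0 and leaves *dst untouched for any other offset.
 * The narrowing casts rely on the wrapping behaviour of GCC and clang.
 */
static char movs64(uint64_t *dst, uint64_t src, int16_t off)
{
    switch (off) {
    case 8:     /* MOVSB: r0 = (s8)r1 */
        *dst = (uint64_t)(int64_t)(int8_t)(uint8_t)src;
        return 'B';
    case 16:    /* MOVSH: r0 = (s16)r1 */
        *dst = (uint64_t)(int64_t)(int16_t)(uint16_t)src;
        return 'H';
    case 32:    /* MOVSW: r0 = (s32)r1 */
        *dst = (uint64_t)(int64_t)(int32_t)(uint32_t)src;
        return 'W';
    default:
        return 0;
    }
}
```

For example, a source value of `0x80` with offset 8 decodes as MOVSB and
produces `0xffffffffffffff80`.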

The decoding of instructions 32-bit JA, BSWAP16, BSWAP32 and BSWAP64 is
straightforward.
2025-07-29 12:45:06 +00:00
Ryan Kurtz
0d8a39a07a Merge remote-tracking branch
'origin/GP-5857_ghidorahrex_PR-7979_niooss-ledger_ebpf-fix-load-zext'
into patch (Closes #7979)
2025-07-29 08:24:03 -04:00
Ryan Kurtz
b4239911c9 Merge remote-tracking branch
'origin/GP-5858_ghidorahrex_PR-7929_niooss-ledger_fix-ebpf-call-operand'
into patch (Closes #7929)
2025-07-29 08:21:27 -04:00
Nicolas Iooss
e2de11d5b2 Fix eBPF zero-extend load instructions
When loading less than 8 bytes into a register, the value is supposed to
be zero-extended. This is what the eBPF execution engine in the Linux
kernel does, in
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2113
This is also what is specified in RFC 9669 which standardised BPF ISA:
https://www.rfc-editor.org/rfc/rfc9669.html#name-regular-load-and-store-oper

Add the missing `zext` calls in the semantic section of instructions
LDXW, LDXH and LDXB. While at it, add them to other load instructions.
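
The intended semantics can be sketched in C as follows, assuming a
little-endian program. The function name `ldx_zext` is hypothetical and
only illustrates the zero-extension, not the actual SLEIGH code:

```c
#include <stdint.h>

/*
 * Sketch of a little-endian LDXB/LDXH/LDXW/LDXDW: read `size` bytes
 * (1, 2, 4 or 8) from `mem` and zero-extend them to the 64-bit
 * destination register. Bytes beyond `size` are never read.
 */
static uint64_t ldx_zext(const uint8_t *mem, unsigned size)
{
    uint64_t value = 0;

    for (unsigned i = 0; i < size && i < 8; i++)
        value |= (uint64_t)mem[i] << (8 * i);
    return value;
}
```

With a 4-byte value 1000 followed by unrelated stack bytes, a size of 4
returns `0x3e8`, whereas a size of 8 would also pull the unrelated bytes
into the upper half of the register.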

For information, the issue can be seen when analyzing this C program:

    unsigned int div_by_1000(unsigned int value) {
        return value / 1000;
    }

Compiling it with clang gives:

    $ clang -O0 -target bpf -c division.c -o division.ebpf
    $ bpf-objdump -rd division.ebpf
    division.ebpf:     file format elf64-bpfle

    Disassembly of section .text:

    0000000000000000 <div_by_1000>:
       0:    63 1a fc ff 00 00 00 00     stxw [%fp+-4],%r1
       8:    61 a0 fc ff 00 00 00 00     ldxw %r0,[%fp+-4]
      10:    37 00 00 00 e8 03 00 00     div %r0,0x3e8
      18:    95 00 00 00 00 00 00 00     exit

Ghidra decompiles this program as:

    ulonglong div_by_1000(uint param_1)
    {
      undefined4 in_stack_00000000;
      return CONCAT44(in_stack_00000000,param_1) / 1000;
    }

This `in_stack_00000000` comes from the way the parameter is loaded from
the stack. The listing shows the following disassembly and p-code
operations:

    ram:00100008 61 a0 fc ff 00       LDXW       R0,[R10 + -0x4=>Stack[-0x4]]
                 00 00 00
                            $U3e00:8 = INT_ADD R10, -4:8
                            R0 = LOAD ram($U3e00:8)

This shows that 8 bytes are indeed loaded from the address in `$U3e00:8`
instead of 4.

After adding `zext` calls, Ghidra decodes the same instruction as:

    ram:00100008 61 a0 fc ff 00       LDXW       R0,[R10 + -0x4=>local_4]
                 00 00 00
                            $U4100:8 = INT_ADD R10, -4:8
                            $U4180:4 = LOAD ram($U4100:8)
                            R0 = INT_ZEXT $U4180:4

This only loads 4 bytes from the stack, as expected.
Moreover the decompilation view is now correct:

    ulonglong div_by_1000(uint param_1)
    {
      return (ulonglong)param_1 / 1000;
    }
2025-07-07 16:28:00 +02:00
Nicolas Iooss
c1d96a2140 Fix eBPF CALL operand decoding
The decoding of the CALL instruction operand did not multiply the
immediate value by 8. Without this, calls are not decoded correctly.

Such a CALL instruction can be emitted when compiling this simple
`single_call.c` program:

    static int one(void) {
        return 1;
    }

    int call_one(void) {
        return one();
    }

with:

    clang -O0 -target bpf -c single_call.c -o single_call.ebpf

Disassembling with LLVM shows:

    $ llvm-objdump -d single_call.ebpf
    single_call.ebpf:	file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_one>:
           0:	85 10 00 00 01 00 00 00	call 1
           1:	95 00 00 00 00 00 00 00	exit

    0000000000000010 <one>:
           2:	b7 00 00 00 01 00 00 00	r0 = 1
           3:	95 00 00 00 00 00 00 00	exit

The first instruction ("call 1") calls the function located at 0x10 (at
index `2:` in the listing). Ghidra considered the call to target
address 9 instead (as `inst_next = 8` and `imm = 1`). Fix this by
multiplying `imm` by 8 when encountering a `disp32` operand (which is
only used by instruction `CALL`).

Adjust ELF relocation R_BPF_64_32 to account for this multiplication
by 8. It is actually documented to compute (S + A) / 8 - 1, so the
division by 8 was missing.
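
The arithmetic above can be sketched in C as follows. The function names
`call_target` and `reloc_bpf_64_32` are hypothetical. The first follows
the `inst_next + imm * 8` rule described above and the second follows the
documented formula (S + A) / 8 - 1:

```c
#include <stdint.h>

#define BPF_INSN_SIZE 8 /* size in bytes of one basic eBPF instruction */

/* Target of a relative CALL: imm counts instructions after the CALL. */
static uint64_t call_target(uint64_t insn_addr, int32_t imm)
{
    uint64_t inst_next = insn_addr + BPF_INSN_SIZE;

    /* Unsigned wrap-around makes negative displacements work too. */
    return inst_next + (uint64_t)((int64_t)imm * BPF_INSN_SIZE);
}

/* Immediate stored by R_BPF_64_32, using (S + A) / 8 - 1. */
static int32_t reloc_bpf_64_32(uint64_t symbol, int64_t addend)
{
    return (int32_t)((int64_t)(symbol + (uint64_t)addend) / BPF_INSN_SIZE - 1);
}
```

For the `single_call.c` example, the CALL at address 0 with `imm = 1`
targets `0x10`, and a relocation against a symbol at `0x10` with a zero
addend produces the same immediate value 1.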
2025-07-07 16:26:31 +02:00
Nicolas Iooss
adb0eac98a Add support for big endian eBPF programs 2025-07-07 16:13:37 +02:00
Nicolas Iooss
52cb7a36e6 Fix the semantics of eBPF byte swap instructions
eBPF byte swap operations (BE16, BE32, BE64, LE16, LE32, LE64) have
semantics that depend on the endianness of the host processor executing
the eBPF program. For example, on a Little-Endian CPU, BE16 swaps the 2
least significant bytes of the given destination register.

The semantic section of LE16 contains:

    { dst=((dst) >> 8) | ((dst) << 8); }

This has several issues:

- It assumes the instruction always swaps the bytes. This should only
  happen on a Big-Endian host CPU.
- If `dst` does not contain a 16-bit value (meaning `dst >> 16 != 0`),
  the computed value is wrong. The value should be properly masked. For
  example the Linux kernel defines in
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/swab.h?h=v6.14#L14

    #define ___constant_swab16(x) ((__u16)(             \
            (((__u16)(x) & (__u16)0x00ffU) << 8) |      \
            (((__u16)(x) & (__u16)0xff00U) >> 8)))

As the endianness of the CPU has to be the same as the eBPF program
(defined in the ELF header), introduce a macro `ENDIAN` and use it to
implement the byte swap operations.
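
A minimal C sketch of the intended 16-bit semantics follows. The
`big_endian` parameter is a hypothetical stand-in for the `ENDIAN` macro,
and the function names are illustrative only. The truncation to 16 bits
in the non-swapping case is my reading of the kernel interpreter and
should be treated as an assumption:

```c
#include <stdint.h>

/* Swap the two low bytes of x; the upper 48 bits are discarded. */
static uint64_t swab16(uint64_t x)
{
    return ((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8);
}

/* LE16: only swaps when the program (and host) is Big-Endian. */
static uint64_t le16(uint64_t dst, int big_endian)
{
    return big_endian ? swab16(dst) : (dst & 0xffffu);
}

/* BE16: only swaps when the program (and host) is Little-Endian. */
static uint64_t be16(uint64_t dst, int big_endian)
{
    return big_endian ? (dst & 0xffffu) : swab16(dst);
}
```

With `dst = 0x12345678` on a Little-Endian target, LE16 yields `0x5678`
and BE16 yields `0x7856`. The unmasked expression quoted above would
instead leave stray upper bits in the result.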
2025-07-07 16:13:36 +02:00
ghidra1
cd6d45c64f GP-0 Corrected NPE for eBPF ELF import. (Closes #8034) 2025-05-08 17:18:19 -04:00
Ryan Kurtz
faf55a8de6 GP-5078: Improvements to Ghidra Module directory layout 2024-10-31 10:34:26 -04:00
ghidra1
1c7232d5a6 Merge remote-tracking branch
'origin/GP-4737_ghidra1_ElfArmHandleUnresolvedRelocSymbol'
(Closes #6673)
2024-07-01 13:40:19 -04:00
ghidra1
036ef9d0db GP-4737 - Improve ELF relocation handling of unresolved symbol 2024-07-01 13:06:54 -04:00
ghidra1
28846ef279 GP-0 Corrected formatting issue 2024-06-26 16:55:02 -04:00
ghidra1
eb5e6a323a GP-4682 cleanup eBPF analyzers and BPF helper function identification 2024-06-24 12:39:52 -04:00
ghidra1
ce9418d831 GP-4398 minor formatting 2024-03-06 10:58:27 -05:00
mumbel
9a22180efa Add missing ELF reloc 2024-03-05 22:19:27 -06:00
ghidra1
3ead54f0ac GP-4239 Transitioned to new AbstractElfRelocationHandler implementation which uses ElfRelocationType enums specific to each handler. 2024-02-12 10:52:25 -05:00
Ryan Kurtz
70405b07b0 GP-2257: Fixing compilation error 2023-05-01 06:54:27 -04:00
emteere
e0e9c0d137 GP-2257 minor refactoring to collapse constructors, added sleigh lint
flag, removed killed by call causing CONCATs
2023-04-29 21:56:45 +00:00
Nalen98
79102c13c4 eBPF processor support
Signed-off-by: Nalen98 <nalenaskeyx@gmail.com>
2023-04-10 00:54:28 +03:00