init hcq args state (#6046)

* init hcq args state

* cleaner

* amd

* fillargs

* fixes

* myoy

* docs

* fix

* not needed

* spacing
nimlgen
2024-08-13 17:11:58 +03:00
committed by GitHub
parent 9d2ea94fe9
commit fa84e6ec48
5 changed files with 75 additions and 53 deletions

@@ -11,7 +11,7 @@ To interact with devices, there are 2 types of queues: `HWComputeQueue` and `HWC
For example, the following Python code enqueues a wait, execute, and signal command on the HCQ-compatible device:
```python
HWComputeQueue().wait(signal_to_wait, value_to_wait) \
- .exec(program, kernargs_ptr, global_dims, local_dims) \
+ .exec(program, args_state, global_dims, local_dims) \
.signal(signal_to_fire, value_to_fire) \
.submit(your_device)
```
@@ -118,13 +118,22 @@ Backends must adhere to the `HCQBuffer` protocol when returning allocation resul
### HCQ Compatible Program
- The `HCQProgram` is a helper base class for defining programs compatible with HCQ-compatible devices. Currently, the arguments consist of pointers to buffers, followed by `vals` fields. The convention expects a packed struct containing the passed pointers, followed by `vals` located at `kernargs_args_offset`.
+ `HCQProgram` is a base class for defining programs compatible with HCQ-enabled devices. It provides a flexible framework for handling different argument layouts (see `HCQArgsState`).
::: tinygrad.device.HCQProgram
options:
members: true
show_source: false
+ #### Arguments State
+ `HCQArgsState` is a base class for managing the argument state for HCQ programs. Backend implementations should create a subclass of `HCQArgsState` to manage arguments for the given program.
+ ::: tinygrad.device.HCQArgsState
+ options:
+ members: true
+ show_source: false
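As a reading aid for the argument-state idea in this hunk, here is a minimal, self-contained sketch. It assumes the packed layout the removed paragraph describes (buffer pointers first, then `vals`). `ToyArgsState`, its constructor, the `update_buffer`/`update_var` method names, and the 64-bit/32-bit field widths are all hypothetical and chosen for illustration; this is not tinygrad's `HCQArgsState` API, and no tinygrad import is used.

```python
import struct
from typing import Sequence

class ToyArgsState:
    """Hypothetical stand-in for a backend's argument state (not tinygrad's class)."""

    def __init__(self, bufs: Sequence[int], vals: Sequence[int] = ()):
        self.n_bufs = len(bufs)
        # Assumed layout: 64-bit buffer addresses first, then 32-bit signed vals.
        self.mem = bytearray(8 * len(bufs) + 4 * len(vals))
        for i, addr in enumerate(bufs):
            self.update_buffer(i, addr)
        for i, val in enumerate(vals):
            self.update_var(i, val)

    def update_buffer(self, index: int, addr: int) -> None:
        # Overwrite one buffer pointer in place, leaving the rest untouched.
        struct.pack_into("<Q", self.mem, 8 * index, addr)

    def update_var(self, index: int, val: int) -> None:
        # Vals start right after the last buffer pointer.
        struct.pack_into("<i", self.mem, 8 * self.n_bufs + 4 * index, val)

state = ToyArgsState(bufs=[0x1000, 0x2000], vals=[7])
state.update_var(0, 9)  # patch a single value without rebuilding the struct
```

Keeping the layout behind one object means a backend with a different argument format would only need to change how these two update methods write into memory.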
### Synchronization
HCQ-compatible devices use a global timeline signal for synchronizing all operations. This mechanism ensures proper ordering and completion of tasks across the device. By convention, `self.timeline_value` points to the next value to signal. So, to wait for all previous operations on the device to complete, wait for `self.timeline_value - 1` value. The following Python code demonstrates the typical usage of signals to synchronize execution to other operations on the device: