unify to HWQueue [pr] (#7812)

* unify to HWCommandQueue [pr]

* all is HWQueue
George Hotz
2024-11-21 10:33:08 +08:00
committed by GitHub
parent 11cea00090
commit 9df5a62c5e
9 changed files with 60 additions and 70 deletions


@@ -6,19 +6,19 @@ The main aspect of HCQ-compatible runtimes is how they interact with devices. In
### Command Queues
-To interact with devices, there are 2 types of queues: `HWComputeQueue` and `HWCopyQueue`. Commands which are defined in a base `HWCommandQueue` class should be supported by both queues. These methods are timestamp and synchronization methods like [signal](#tinygrad.runtime.support.hcq.HWCommandQueue.signal) and [wait](#tinygrad.runtime.support.hcq.HWCommandQueue.wait).
+To interact with devices you create a `HWQueue`. Some methods are required, like timestamp and synchronization methods like [signal](#tinygrad.runtime.support.hcq.HWQueue.signal) and [wait](#tinygrad.runtime.support.hcq.HWQueue.wait), while others are dependent on it being a compute or copy queue.
For example, the following Python code enqueues a wait, execute, and signal command on the HCQ-compatible device:
```python
-HWComputeQueue().wait(signal_to_wait, value_to_wait) \
-                .exec(program, args_state, global_dims, local_dims) \
-                .signal(signal_to_fire, value_to_fire) \
-                .submit(your_device)
+HWQueue().wait(signal_to_wait, value_to_wait) \
+         .exec(program, args_state, global_dims, local_dims) \
+         .signal(signal_to_fire, value_to_fire) \
+         .submit(your_device)
```
-Each runtime should implement the required functions that are defined in the `HWCommandQueue`, `HWComputeQueue`, and `HWCopyQueue` classes.
+Each runtime should implement the required functions that are defined in the `HWQueue` classes.
-::: tinygrad.runtime.support.hcq.HWCommandQueue
+::: tinygrad.runtime.support.hcq.HWQueue
options:
members: [
"signal",
@@ -28,21 +28,9 @@ Each runtime should implement the required functions that are defined in the `HW
"update_wait",
"bind",
"submit",
-]
-show_source: false
-::: tinygrad.runtime.support.hcq.HWComputeQueue
-options:
-members: [
"memory_barrier",
"exec",
"update_exec",
-]
-show_source: false
-::: tinygrad.runtime.support.hcq.HWCopyQueue
-options:
-members: [
"copy",
"update_copy",
]
@@ -82,9 +70,9 @@ The following Python code demonstrates the usage of signals:
```python
signal = your_device.signal_t()
-HWComputeQueue().timestamp(signal) \
-                .signal(signal, value_to_fire) \
-                .submit(your_device)
+HWQueue().timestamp(signal) \
+         .signal(signal, value_to_fire) \
+         .submit(your_device)
signal.wait(value_to_fire)
signaled_value = signal.value # should be the same as `value_to_fire`
@@ -134,17 +122,17 @@ Backends must adhere to the `HCQBuffer` protocol when returning allocation resul
members: true
show_source: false
-**Lifetime**: The `HCQArgsState` is passed to `HWComputeQueue.exec` and is guaranteed not to be freed until `HWComputeQueue.submit` for the same queue is called.
+**Lifetime**: The `HCQArgsState` is passed to `HWQueue.exec` and is guaranteed not to be freed until `HWQueue.submit` for the same queue is called.
### Synchronization
HCQ-compatible devices use a global timeline signal for synchronizing all operations. This mechanism ensures proper ordering and completion of tasks across the device. By convention, `self.timeline_value` points to the next value to signal. So, to wait for all previous operations on the device to complete, wait for `self.timeline_value - 1` value. The following Python code demonstrates the typical usage of signals to synchronize execution to other operations on the device:
```python
-HWComputeQueue().wait(your_device.timeline_signal, your_device.timeline_value - 1) \
-                .exec(...)
-                .signal(your_device.timeline_signal, your_device.timeline_value) \
-                .submit(your_device)
+HWQueue().wait(your_device.timeline_signal, your_device.timeline_value - 1) \
+         .exec(...)
+         .signal(your_device.timeline_signal, your_device.timeline_value) \
+         .submit(your_device)
your_device.timeline_value += 1
# Optionally wait for execution
@@ -153,5 +141,5 @@ your_device.timeline_signal.wait(your_device.timeline_value - 1)
## HCQGraph
-[HCQGraph](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime/graph/hcq.py) is a core feature that implements `GraphRunner` for HCQ-compatible devices. `HCQGraph` builds a static `HWComputeQueue` and `HWCopyQueue` for all operations per device. To optimize enqueue time, only the necessary parts of the queues are updated for each run using the update APIs of the queues, avoiding a complete rebuild.
+[HCQGraph](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime/graph/hcq.py) is a core feature that implements `GraphRunner` for HCQ-compatible devices. `HCQGraph` builds static `HWQueue` for all operations per device. To optimize enqueue time, only the necessary parts of the queues are updated for each run using the update APIs of the queues, avoiding a complete rebuild.
Optionally, queues can implement a `bind` API, which allows further optimization by eliminating the need to copy the queues into the device ring.
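Reviewer note: the timeline convention described in the Synchronization section (`timeline_value` is the next value to signal, so waiting on `timeline_value - 1` waits for all previous work) can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: `ToySignal`, `ToyDevice`, `ToyQueue`, and `run_kernel` are hypothetical names invented for illustration and are not tinygrad APIs. The toy runs commands synchronously at `submit`, whereas a real device would execute them asynchronously.

```python
# Toy model of the timeline-signal convention. None of these classes are tinygrad APIs.

class ToySignal:
    def __init__(self, value=0):
        self.value = value

    def wait(self, value):
        # A real signal would block or poll; the toy only checks the value was reached.
        if self.value < value:
            raise RuntimeError(f"signal at {self.value}, expected >= {value}")


class ToyDevice:
    def __init__(self):
        self.timeline_signal = ToySignal()
        self.timeline_value = 1  # by convention: the next value to signal
        self.log = []            # records the order in which "kernels" ran


class ToyQueue:
    def __init__(self):
        self.cmds = []

    def wait(self, signal, value):
        self.cmds.append(("wait", signal, value))
        return self

    def exec(self, name):
        self.cmds.append(("exec", name))
        return self

    def signal(self, signal, value):
        self.cmds.append(("signal", signal, value))
        return self

    def submit(self, device):
        # Commands run strictly in the order they were enqueued.
        for cmd in self.cmds:
            if cmd[0] == "wait":
                cmd[1].wait(cmd[2])
            elif cmd[0] == "exec":
                device.log.append(cmd[1])
            else:  # "signal"
                cmd[1].value = cmd[2]
        return self


def run_kernel(device, name):
    # Wait for all previous work, run, then signal the next timeline value.
    ToyQueue().wait(device.timeline_signal, device.timeline_value - 1) \
              .exec(name) \
              .signal(device.timeline_signal, device.timeline_value) \
              .submit(device)
    device.timeline_value += 1


dev = ToyDevice()
run_kernel(dev, "kernel_a")
run_kernel(dev, "kernel_b")
# Optionally wait for everything submitted so far.
dev.timeline_signal.wait(dev.timeline_value - 1)
```

Each submission waits on the value signaled by the previous one, so the two toy kernels are ordered through the single timeline signal without any additional per-kernel signals.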