mirror of https://github.com/tinygrad/tinygrad.git (synced 2026-01-08 22:48:25 -05:00)
unify to HWQueue [pr] (#7812)
* unify to HWCommandQueue [pr]
* all is HWQueue
@@ -6,19 +6,19 @@ The main aspect of HCQ-compatible runtimes is how they interact with devices. In

 ### Command Queues

-To interact with devices, there are 2 types of queues: `HWComputeQueue` and `HWCopyQueue`. Commands which are defined in a base `HWCommandQueue` class should be supported by both queues. These methods are timestamp and synchronization methods like [signal](#tinygrad.runtime.support.hcq.HWCommandQueue.signal) and [wait](#tinygrad.runtime.support.hcq.HWCommandQueue.wait).
+To interact with devices you create a `HWQueue`. Some methods are required, like timestamp and synchronization methods like [signal](#tinygrad.runtime.support.hcq.HWQueue.signal) and [wait](#tinygrad.runtime.support.hcq.HWQueue.wait), while others are dependent on it being a compute or copy queue.

 For example, the following Python code enqueues a wait, execute, and signal command on the HCQ-compatible device:
 ```python
-HWComputeQueue().wait(signal_to_wait, value_to_wait) \
-                .exec(program, args_state, global_dims, local_dims) \
-                .signal(signal_to_fire, value_to_fire) \
-                .submit(your_device)
+HWQueue().wait(signal_to_wait, value_to_wait) \
+         .exec(program, args_state, global_dims, local_dims) \
+         .signal(signal_to_fire, value_to_fire) \
+         .submit(your_device)
 ```

-Each runtime should implement the required functions that are defined in the `HWCommandQueue`, `HWComputeQueue`, and `HWCopyQueue` classes.
+Each runtime should implement the required functions that are defined in the `HWQueue` classes.

-::: tinygrad.runtime.support.hcq.HWCommandQueue
+::: tinygrad.runtime.support.hcq.HWQueue
     options:
         members: [
             "signal",
@@ -28,21 +28,9 @@ Each runtime should implement the required functions that are defined in the `HW
             "update_wait",
             "bind",
             "submit",
-        ]
-        show_source: false
-
-::: tinygrad.runtime.support.hcq.HWComputeQueue
-    options:
-        members: [
             "memory_barrier",
             "exec",
             "update_exec",
-        ]
-        show_source: false
-
-::: tinygrad.runtime.support.hcq.HWCopyQueue
-    options:
-        members: [
             "copy",
             "update_copy",
         ]
@@ -82,9 +70,9 @@ The following Python code demonstrates the usage of signals:
 ```python
 signal = your_device.signal_t()

-HWComputeQueue().timestamp(signal) \
-                .signal(signal, value_to_fire) \
-                .submit(your_device)
+HWQueue().timestamp(signal) \
+         .signal(signal, value_to_fire) \
+         .submit(your_device)

 signal.wait(value_to_fire)
 signaled_value = signal.value # should be the same as `value_to_fire`
@@ -134,17 +122,17 @@ Backends must adhere to the `HCQBuffer` protocol when returning allocation resul
         members: true
         show_source: false

-**Lifetime**: The `HCQArgsState` is passed to `HWComputeQueue.exec` and is guaranteed not to be freed until `HWComputeQueue.submit` for the same queue is called.
+**Lifetime**: The `HCQArgsState` is passed to `HWQueue.exec` and is guaranteed not to be freed until `HWQueue.submit` for the same queue is called.

 ### Synchronization

 HCQ-compatible devices use a global timeline signal for synchronizing all operations. This mechanism ensures proper ordering and completion of tasks across the device. By convention, `self.timeline_value` points to the next value to signal. So, to wait for all previous operations on the device to complete, wait for `self.timeline_value - 1` value. The following Python code demonstrates the typical usage of signals to synchronize execution to other operations on the device:

 ```python
-HWComputeQueue().wait(your_device.timeline_signal, your_device.timeline_value - 1) \
-                .exec(...)
-                .signal(your_device.timeline_signal, your_device.timeline_value) \
-                .submit(your_device)
+HWQueue().wait(your_device.timeline_signal, your_device.timeline_value - 1) \
+         .exec(...)
+         .signal(your_device.timeline_signal, your_device.timeline_value) \
+         .submit(your_device)
 your_device.timeline_value += 1

 # Optionally wait for execution
@@ -153,5 +141,5 @@ your_device.timeline_signal.wait(your_device.timeline_value - 1)

 ## HCQGraph

-[HCQGraph](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime/graph/hcq.py) is a core feature that implements `GraphRunner` for HCQ-compatible devices. `HCQGraph` builds a static `HWComputeQueue` and `HWCopyQueue` for all operations per device. To optimize enqueue time, only the necessary parts of the queues are updated for each run using the update APIs of the queues, avoiding a complete rebuild.
+[HCQGraph](https://github.com/tinygrad/tinygrad/tree/master/tinygrad/runtime/graph/hcq.py) is a core feature that implements `GraphRunner` for HCQ-compatible devices. `HCQGraph` builds static `HWQueue` for all operations per device. To optimize enqueue time, only the necessary parts of the queues are updated for each run using the update APIs of the queues, avoiding a complete rebuild.
 Optionally, queues can implement a `bind` API, which allows further optimization by eliminating the need to copy the queues into the device ring.
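
Reviewer sketch (not part of the commit): the timeline convention in the Synchronization hunk can be checked without hardware using a small toy model. Everything here is an assumption made for illustration: `ToySignal`, `ToyDevice`, `ToyQueue`, and `run_kernel` are hypothetical stand-ins, not tinygrad classes, and the starting values (signal at 0, `timeline_value` at 1) are chosen only so that the first wait succeeds. The toy replays commands synchronously on `submit`, whereas a real queue hands them to the device.

```python
# Toy, hardware-free model of the timeline convention.
# All names below are hypothetical and do not exist in tinygrad.

class ToySignal:
    def __init__(self, value=0):
        self.value = value

    def wait(self, value):
        # A real signal would block until the device reaches `value`;
        # the toy only checks that the value has already been reached.
        assert self.value >= value, f"signal at {self.value}, expected >= {value}"


class ToyDevice:
    def __init__(self):
        self.timeline_signal = ToySignal(0)
        self.timeline_value = 1  # by convention: the NEXT value to signal
        self.log = []            # names of the "kernels" that ran, in order


class ToyQueue:
    """Records commands through chained calls and replays them on submit."""

    def __init__(self):
        self.cmds = []

    def wait(self, signal, value):
        self.cmds.append(("wait", signal, value))
        return self

    def exec(self, name):
        self.cmds.append(("exec", name))
        return self

    def signal(self, signal, value):
        self.cmds.append(("signal", signal, value))
        return self

    def submit(self, dev):
        for cmd in self.cmds:
            if cmd[0] == "wait":
                cmd[1].wait(cmd[2])
            elif cmd[0] == "exec":
                dev.log.append(cmd[1])
            else:  # "signal"
                cmd[1].value = cmd[2]
        return self


def run_kernel(dev, name):
    # Wait for everything submitted so far, run, then signal the next value.
    ToyQueue().wait(dev.timeline_signal, dev.timeline_value - 1) \
              .exec(name) \
              .signal(dev.timeline_signal, dev.timeline_value) \
              .submit(dev)
    dev.timeline_value += 1


dev = ToyDevice()
run_kernel(dev, "kernel_a")
run_kernel(dev, "kernel_b")

# Optionally wait for all previous work: timeline_value - 1 is the last value signaled.
dev.timeline_signal.wait(dev.timeline_value - 1)
```

Each submission waits on `timeline_value - 1`, signals `timeline_value`, and then increments it, so every later submission is ordered after the earlier ones through the same signal.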