mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-01-08 22:48:25 -05:00
@@ -2,7 +2,7 @@
## Overview
-The main aspect of HCQ-compatible runtimes is how they interact with devices. In HCQ, all interactions with devices occur in a hardware-friendly manner using [command queues](#commandqueues). This approach allows commands to be issued directly to devices, bypassing runtime overhead such as HIP, or CUDA. Additionally, by using the HCQ API, these runtimes can benefit from various optimizations and features, including [HCQGraph](#hcqgraph) and built-in profiling capabilities.
+The main aspect of HCQ-compatible runtimes is how they interact with devices. In HCQ, all interactions with devices occur in a hardware-friendly manner using [command queues](#commandqueues). This approach allows commands to be issued directly to devices, bypassing runtime overhead such as HIP or CUDA. Additionally, by using the HCQ API, these runtimes can benefit from various optimizations and features, including [HCQGraph](#hcqgraph) and built-in profiling capabilities.
### Command Queues
@@ -116,6 +116,15 @@ Backends must adhere to the `HCQCompatAllocRes` protocol when returning allocati
        members: true
        show_source: false
+### HCQ Compatible Program
+
+The `HCQCompatProgram` is a helper base class for defining programs compatible with HCQ-compatible devices. Currently, the arguments consist of pointers to buffers, followed by `vals` fields. The convention expects a packed struct containing the passed pointers, followed by `vals` located at `kernargs_args_offset`.
+
+::: tinygrad.device.HCQCompatProgram
+    options:
+        members: true
+        show_source: false
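
As a rough illustration of the argument layout described for `HCQCompatProgram` (not tinygrad's implementation), the sketch below packs buffer pointers followed by `vals` into one unpadded blob using Python's `struct` module. The helper name `pack_kernargs`, the 64-bit pointer and 32-bit integer widths, the little-endian byte order, and the reading that the packed arguments begin at `kernargs_args_offset` are all assumptions made for this example.

```python
import struct

def pack_kernargs(buf_ptrs, vals, args_offset=0):
    """Hypothetical helper: return `args_offset` zero bytes, then pointers, then vals."""
    # "<" selects little-endian with no alignment padding, i.e. a packed layout.
    # "Q" = unsigned 64-bit (assumed pointer width), "i" = signed 32-bit (assumed val width).
    fmt = f"<{len(buf_ptrs)}Q{len(vals)}i"
    return bytes(args_offset) + struct.pack(fmt, *buf_ptrs, *vals)

# Two made-up buffer addresses and two integer vals, placed after a 16-byte header.
blob = pack_kernargs([0x1000, 0x2000], [7, -1], args_offset=16)

# Reading the arguments back from the same offset recovers pointers first, then vals.
unpacked = struct.unpack_from("<2Q2i", blob, 16)
```

Keeping the pointers contiguous at the front means the number of buffers alone determines where the `vals` start, which is why no per-field padding is used in this sketch.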
### Synchronization
HCQ-compatible devices use a global timeline signal for synchronizing all operations. This mechanism ensures proper ordering and completion of tasks across the device. By convention, `self.timeline_value` points to the next value to signal. So, to wait for all previous operations on the device to complete, wait for `self.timeline_value - 1` value. The following Python code demonstrates the typical usage of signals to synchronize execution to other operations on the device:
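
The snippet that this paragraph introduces falls outside the hunk shown here. As a stand-in, the toy model below sketches only the stated convention: `timeline_value` is the next value to signal, so work submitted so far is complete once the signal reaches `timeline_value - 1`. `ToySignal`, `ToyDevice`, and all of their methods are hypothetical names invented for this example; they are not part of the HCQ API.

```python
class ToySignal:
    """Hypothetical stand-in for a device signal: a counter the hardware writes."""
    def __init__(self, value=0):
        self.value = value

    def is_reached(self, target):
        return self.value >= target


class ToyDevice:
    """Hypothetical device holding one global timeline signal."""
    def __init__(self):
        self.timeline_signal = ToySignal(0)
        self.timeline_value = 1   # by convention: the NEXT value to signal
        self.pending = []         # values the simulated hardware has yet to write

    def submit(self):
        """Queue work that signals the current timeline_value when it completes."""
        signaled = self.timeline_value
        self.pending.append(signaled)
        self.timeline_value += 1
        return signaled

    def run_pending(self):
        """Simulate the hardware finishing queued work in submission order."""
        for value in self.pending:
            self.timeline_signal.value = value
        self.pending.clear()

    def all_done(self):
        # All previous operations are complete once the signal reaches timeline_value - 1.
        return self.timeline_signal.is_reached(self.timeline_value - 1)


dev = ToyDevice()
idle_at_start = dev.all_done()   # nothing submitted yet
first = dev.submit()
second = dev.submit()
done_before = dev.all_done()     # work queued but not yet executed
dev.run_pending()
done_after = dev.all_done()      # signal has caught up with the timeline
```

A real runtime would block or poll on the signal instead of calling a `run_pending` step; the model only shows how the two counters relate.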