mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-04-29 03:00:14 -04:00
SQTT Profiling
Capturing an SQ Thread Trace
- VIZ=2 enables SQTT profiling.
- SQTT_ITRACE_SE_MASK=X selects which shader engines (SEs) are used for instruction tracing: -1 = all, 0 = disabled, >0 = bitmask of SEs (bit i selects SE i). Default: 0b11 (SE0 and SE1).
- SQTT_BUFFER_SIZE=X sets the size of the SQTT buffer in megabytes. The buffer is allocated per shader engine (the 7900 XTX has 6 SEs). Default: 256.
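The SE mask and buffer-size settings above can be illustrated with a small sketch. The helpers below are hypothetical and not part of tinygrad. They only restate the documented semantics: -1 selects all SEs, 0 disables instruction tracing, and a positive value is a bitmask. Two assumptions are made here: bits above the GPU's SE count are ignored, and "megabytes" means 2**20 bytes.

```python
# Hypothetical helpers (NOT tinygrad APIs) mirroring the documented semantics
# of SQTT_ITRACE_SE_MASK and SQTT_BUFFER_SIZE.

NUM_SE_7900XTX = 6  # the 7900 XTX has 6 shader engines

def traced_shader_engines(mask: int, num_se: int) -> list[int]:
  """Return the SE indices selected for instruction tracing.

  -1 selects every SE, 0 disables instruction tracing, and a positive value
  is a bitmask where bit i selects SE i. Assumption: bits at or above num_se
  are ignored.
  """
  if mask == -1:
    return list(range(num_se))
  if mask < -1:
    raise ValueError(f"invalid SE mask: {mask}")
  return [se for se in range(num_se) if mask & (1 << se)]

def sqtt_buffer_bytes(size_mb: int) -> int:
  """Bytes for one SE's SQTT buffer. Assumption: 1 MB = 2**20 bytes."""
  return size_mb * 1024 * 1024

print(traced_shader_engines(0b11, NUM_SE_7900XTX))  # default mask -> [0, 1]
print(traced_shader_engines(-1, NUM_SE_7900XTX))    # all -> [0, 1, 2, 3, 4, 5]
print(traced_shader_engines(0, NUM_SE_7900XTX))     # disabled -> []
print(sqtt_buffer_bytes(256))                       # default -> 268435456
```

If one buffer of the default size is allocated for each of the 7900 XTX's 6 SEs, the SQTT buffers would occupy 6 × 256 = 1536 MB in total. Lowering SQTT_BUFFER_SIZE is the knob to reach for when that is too much.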
Viewing the traces
- Web UI: tinygrad/viz/serve.py
- Command line: python -m tinygrad.renderer.amd.sqtt