11076 Commits

Author SHA1 Message Date
George Hotz
28c2776999 fix on OSX 2025-11-17 14:19:39 -08:00
George Hotz
e13580a1d7 print special ops in postrange 2025-11-17 13:35:29 -08:00
George Hotz
98e9e73286 hotfix: amd_uop_matmul getenvs 2025-11-17 13:26:01 -08:00
qazal
e7e1935225 cleanup sqtt/test_timing (#13315) 2025-11-18 04:28:05 +08:00
wozeparrot
33773fda87 tk initial mi350 (#13289) 2025-11-17 11:46:32 -08:00
nimlgen
e2cee64050 Revert "hcq: add tag to exec events (#13311)" (#13314)
This reverts commit f63ded5817.
2025-11-17 22:15:31 +03:00
chenyu
646372490c move tiktoken import in llama3 (#13316)
only Tokenizer requires that
2025-11-17 14:09:37 -05:00
qazal
a37f221e44 viz: visualize waves in the timeline (#13292)
* viz: visualize waves in the timeline

* timeline in format

* per step

* rm that
2025-11-17 22:04:21 +08:00
nimlgen
f63ded5817 hcq: add tag to exec events (#13311)
* hcq: add tag to exec events

* f

* fix

* fix
2025-11-17 16:59:30 +03:00
qazal
50a443f558 viz: add shader engine to wave exec payload (#13310)
* viz: show sqtt shader engine

* order it from smallest unit

* easier to config
2025-11-17 19:11:34 +08:00
nimlgen
9bb17c53ea amd: timer fix (#13267) 2025-11-17 13:59:03 +03:00
George Hotz
55be95da15 cleanup sqtt raw parser (#13309)
* cleanup sqtt raw parser

* better names (don't merge yet)

* clean up amd

* a few more names

* one more filter
2025-11-16 13:11:51 -08:00
George Hotz
cabd4add48 more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT

* more minimal runner

* sep VIZ/PROFILE

* parse print new

* improve parser

* more filter

* that

* split them

* lil cleanup

* skip flaky test

* AQL in mmapeak
2025-11-16 10:40:39 -08:00
qazal
13efdf8c31 test s_nop stall (#13307) 2025-11-17 00:59:39 +08:00
George Hotz
295600dc5a saturday coffee shop work parsing the att format (#13295)
* saturday coffee shop work parsing the att format

* add examples

* parser

* classes of packets

* fully vibe coded parser

* vibing

* empty

* some vibe names

* vibes

* most of these are wrong

* more vibes

* better names

* parsing

* parse

* cleanup parser

* touchups
2025-11-16 08:25:51 -08:00
Christopher Milan
a9ed241172 properly suppress NIRRenderer.__del__ error (#13299) 2025-11-16 18:58:04 +03:00
qazal
c70b06ec19 sqtt test_timing work (#13304)
* sqtt test_timing cleanups

* only the instruction

* v_mfma_f32_16x16x32_f16 16 cycles, only after second one though
2025-11-16 23:49:24 +08:00
chenyu
8f0e747b3a Tensor._tri with arange (#13297) 2025-11-16 10:21:16 -05:00
chenyu
6372c95094 disable benchmark MobileNetV2 on DSP (#13305)
failed on tinyc2
2025-11-16 09:42:52 -05:00
Christopher Milan
61625a3898 fix objc finalizing bug (#13296) 2025-11-16 12:43:04 +03:00
nimlgen
acbe6361ab qcom: suppress_finalizing to free (#13294) 2025-11-16 11:49:23 +03:00
wozeparrot
ef42334239 tk: load store cleanup (#13290) 2025-11-15 17:08:23 -08:00
chenyu
e8844853ed Tensor.eye with arange (#13287)
with rangify we can write these with arange
2025-11-15 12:32:27 -05:00
Christopher Milan
5b823af696 Remove (pypi) clang dep for autogen (#13284)
* no more clang

* regen comgr_3

* ci doesn't need pypi clang

* fix objc

* REGEN for libclang

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-15 09:05:11 -08:00
George Hotz
df53c62a9f bump line count 2025-11-15 08:16:20 -08:00
nimlgen
d37e1fe065 nv: wait for wpr to reset (#13282)
* nv: wait for wpr to reet

* fix

* comment

* wai

* f

* fix
2025-11-15 20:00:49 +08:00
George Hotz
22c08b470c fold using outerworld range (#13286)
* scan using outerworld range

* almost

* sched

* simple range

* mypy

* woooo outer range

* spec passes

* print the numbers

* lol it runs

* real test
2025-11-14 20:43:41 -08:00
George Hotz
567066f51f tests for cast there and back (#13195)
* fix cast folding in llama

* dtypes that work everywhere

* Skip test_cast_there_and_back for backend casts

Skip test due to backend casting issues.
2025-11-14 16:56:09 -08:00
George Hotz
6c5fa349e1 add (unused) outer range (#13285) 2025-11-14 16:47:52 -08:00
Christopher Milan
d1bb08c5a1 In-tree autogen: objective c (#13223)
* checkout changes from autogen branch

* move assert

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-14 14:08:42 -08:00
George Hotz
e5351699bd openpilot warp (#13283)
* openpilot image warp test

* 0.4 ms on metal, 1 ms on CPU

* new inputs each time

* reshape
2025-11-14 13:55:32 -08:00
qazal
7c110e1a57 viz: minor cleanups for sqtt (#13275)
* small prg cleanup

* test_timing
2025-11-15 01:08:56 +08:00
chenyu
888aaab151 test_tiny cleanup (#13276) 2025-11-14 11:11:32 -05:00
nimlgen
3e63831b98 nv: support 580+ drivers (#13269)
* nv: 580+ support

* start

* f

* fake

* fix
2025-11-14 21:44:16 +08:00
qazal
2ee701a009 roc: fix CEnum access (#13270)
* roc: add decoder to ci

* also add installer

* use CEnum syntax

* try 2

* add to setup

* revert ci change

* the other enum too
2025-11-14 21:41:24 +08:00
nimlgen
c80d459d99 autogen: fix packed args structs (#13274)
* autogen: fix packed args structs

* and test this
2025-11-14 20:24:06 +08:00
nimlgen
14eb48b13a autogen: rename nv_gpu to nv_570 (#13273)
* autogen: rename nv_gpu to nv_570

* rename
2025-11-14 20:07:19 +08:00
nimlgen
734bfa07b4 nv: refactor uvm calls (#13272) 2025-11-14 19:53:04 +08:00
nimlgen
f72b1fbca4 nv: read numClasses (#13271)
* nv: read numClasses

* fix

* d
2025-11-14 19:43:25 +08:00
nimlgen
84f065f2a2 autogen: warning and msg (#13268)
* autogen: warning and msg

* f
2025-11-14 19:10:26 +08:00
George Hotz
44d84228ff move comgr_3 logic back to the old place (#13266)
* move comgr_3 logic back to the old place

* explicit
2025-11-13 20:05:54 -08:00
Christopher Milan
09f3aae169 In-tree autogen: all C libraries (#13220)
* checkout files from autogen branch

* ioctl with payload

* fix am generations

* properly fix generations

This reverts commit b2a54f4f41.

* revert discovery.h

* support pragma pack(1)

* typo

* better getter

* typo

* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE

* align support

* anon handling fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
777cbec5b3 tk: rename rt tile dims to base (#13265) 2025-11-13 18:43:02 -08:00
wozeparrot
7eb0d8e744 feat: mixins on tiles (#13246) 2025-11-13 16:52:52 -08:00
George Hotz
ba84d415fe work from benchmarking tinybox red v2 (#13264)
* work from benchmarking tinybox red v2

* gpuburn
2025-11-13 16:38:40 -08:00
wozeparrot
547304c471 tk: group cleanup (#13262) 2025-11-13 14:19:51 -08:00
wozeparrot
4ada51618f tk: don't flatten in clear (#13249) 2025-11-13 13:38:01 -08:00
George Hotz
6b1bae6614 ruff format mixin (#13261) 2025-11-13 10:10:38 -08:00
Faizaan Gagan
3049f3edda support _rebuild_tensor method interception (#13253) 2025-11-13 09:41:21 -08:00
Harald Schäfer
3af231904e openpilot compile tests: assert pre-rangify speeds (#12775)
* assert pre-rangify speeds

* typo

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 09:39:06 -08:00