4659 Commits

Author SHA1 Message Date
George Hotz
e4fead8a86 write scan in uops (#13321)
* write scan in uops

* ops range

* no need for variable

* meh, later

* shorter
2025-11-17 16:58:08 -08:00
George Hotz
6d3385c284 print special ops in postrange (#13318)
* print special ops in postrange

* fix on OSX
2025-11-17 14:43:23 -08:00
wozeparrot
33773fda87 tk initial mi350 (#13289) 2025-11-17 11:46:32 -08:00
nimlgen
9bb17c53ea amd: timer fix (#13267) 2025-11-17 13:59:03 +03:00
George Hotz
cabd4add48 more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT

* more minimal runner

* sep VIZ/PROFILE

* parse print new

* improve parser

* more filter

* that

* split them

* lil cleanup

* skip flaky test

* AQL in mmapeak
2025-11-16 10:40:39 -08:00
George Hotz
295600dc5a saturday coffee shop work parsing the att format (#13295)
* saturday coffee shop work parsing the att format

* add examples

* parser

* classes of packets

* fully vibe coded parser

* vibing

* empty

* some vibe names

* vibes

* most of these are wrong

* more vibes

* better names

* parsing

* parse

* cleanup parser

* touchups
2025-11-16 08:25:51 -08:00
chenyu
8f0e747b3a Tensor._tri with arange (#13297) 2025-11-16 10:21:16 -05:00
chenyu
e8844853ed Tensor.eye with arange (#13287)
with rangify we can write these with arange
2025-11-15 12:32:27 -05:00
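Note on the two entries above (Tensor._tri and Tensor.eye with arange): the idea of building these from an arange can be sketched in plain Python. This is only an illustration and is not taken from the commits. The helper names `arange`, `eye`, and `tri_mask`, and the `row + diagonal <= col` mask convention, are assumptions made for this example. An identity matrix is the comparison of a row index range against a column index range, and a triangular mask is the same comparison with `<=` and an offset.

```python
# Illustrative only: plain-Python stand-ins, not the library's code.
# All names here (arange, eye, tri_mask) are hypothetical helpers.

def arange(n):
    """Return [0, 1, ..., n-1], standing in for a tensor arange."""
    return list(range(n))

def eye(n):
    """Identity matrix: 1 where the row index equals the column index."""
    rows, cols = arange(n), arange(n)
    return [[int(i == j) for j in cols] for i in rows]

def tri_mask(r, c, diagonal=0):
    """Boolean mask, True where row + diagonal <= col (assumed convention)."""
    rows, cols = arange(r), arange(c)
    return [[i + diagonal <= j for j in cols] for i in rows]

if __name__ == "__main__":
    for row in eye(3):
        print(row)
    for row in tri_mask(3, 3):
        print(row)
```

In a tensor library the two nested loops would presumably become a single broadcast comparison between a column-shaped and a row-shaped arange, but that mapping is an assumption here rather than something the commit messages state.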
George Hotz
22c08b470c fold using outerworld range (#13286)
* scan using outerworld range

* almost

* sched

* simple range

* mypy

* woooo outer range

* spec passes

* print the numbers

* lol it runs

* real test
2025-11-14 20:43:41 -08:00
George Hotz
567066f51f tests for cast there and back (#13195)
* fix cast folding in llama

* dtypes that work everywhere

* Skip test_cast_there_and_back for backend casts

Skip test due to backend casting issues.
2025-11-14 16:56:09 -08:00
George Hotz
6c5fa349e1 add (unused) outer range (#13285) 2025-11-14 16:47:52 -08:00
George Hotz
e5351699bd openpilot warp (#13283)
* openpilot image warp test

* 0.4 ms on metal, 1 ms on CPU

* new inputs each time

* reshape
2025-11-14 13:55:32 -08:00
chenyu
888aaab151 test_tiny cleanup (#13276) 2025-11-14 11:11:32 -05:00
nimlgen
3e63831b98 nv: support 580+ drivers (#13269)
* nv: 580+ support

* start

* f

* fake

* fix
2025-11-14 21:44:16 +08:00
nimlgen
c80d459d99 autogen: fix packed args structs (#13274)
* autogen: fix packed args structs

* and test this
2025-11-14 20:24:06 +08:00
nimlgen
14eb48b13a autogen: rename nv_gpu to nv_570 (#13273)
* autogen: rename nv_gpu to nv_570

* rename
2025-11-14 20:07:19 +08:00
nimlgen
f72b1fbca4 nv: read numClasses (#13271)
* nv: read numClasses

* fix

* d
2025-11-14 19:43:25 +08:00
Christopher Milan
09f3aae169 In-tree autogen: all C libraries (#13220)
* checkout files from autogen branch

* ioctl with payload

* fix am generations

* properly fix generations

This reverts commit b2a54f4f41.

* revert discovery.h

* support pragma pack(1)

* typo

* better getter

* typo

* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE

* align support

* anon handling fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
7eb0d8e744 feat: mixins on tiles (#13246) 2025-11-13 16:52:52 -08:00
Ayman Jabr
256f81bb02 Fix tracemeta 0 (#13049)
* chore: tclesius branch resolved

* fix: indentation

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 09:07:11 -08:00
George Hotz
bcdfc109b5 hotfix: disable flaky test 2025-11-13 06:19:28 -08:00
wozeparrot
759557f633 feat: move tk tests to testextra (#13242) 2025-11-12 17:06:53 -08:00
chenyu
3f939f3d3c update pm_simplify_valid (#13241)
* update pm_simplify_valid

fixed openpilot conv regression

* IMAGE training is broken
2025-11-12 19:40:02 -05:00
George Hotz
ab9fa964d8 DISABLE_COMPILER_CACHE -> CCACHE (#13234)
* DISABLE_COMPILER_CACHE -> CCACHE

* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
Jan Akhremchik
bc8e537423 Add NONZERO op to onnx backend (#13211) 2025-11-12 08:55:51 -08:00
qazal
7a6853fa40 viz: show python callstack in the first graph (#13218) 2025-11-12 20:52:28 +08:00
wozeparrot
371c1f2355 tk: move tiles to class (#13224) 2025-11-11 21:53:46 -08:00
Christopher Milan
41a098a82d In-tree autogen: libc.py (#13217)
* checkout changes from autogen branch

* parents

* pylint happy

* move sys to system in helpers.py

* typo

* typo
2025-11-11 19:13:48 -08:00
wozeparrot
222bb12ddf tk softmax (#13205) 2025-11-11 15:13:16 -08:00
qazal
bc55bc4849 cleanup test_viz profiler tests (#13221) 2025-11-12 03:46:48 +08:00
wozeparrot
73497af4c0 clean: use np for allclose (#13204) 2025-11-10 23:02:43 -08:00
chenyu
22b8579234 one last regressed dm kernel (#13201) 2025-11-10 23:30:52 -05:00
chenyu
829cdafccc update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
wozeparrot
6252831ceb feat: initial tk library (#13160) 2025-11-09 22:54:29 -08:00
chenyu
2ba8b4946f external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py

cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS

* fix
2025-11-08 01:54:10 -05:00
George Hotz
ffb9e8396f fix indexing bug with convs
* minimal difference for ONE_POOL=1

* fix indexing bug

* improve indexing debugger

* more debugger improvements

* always for reshape
2025-11-07 16:45:19 -08:00
Ahmed Harmouche
3ecff3a8da Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu

* Fix split dim bug

* Remove webgpu_bug from examples

* Add test for shape correctness

* Fix 3D indexing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb device: no compilers message with reasons (#13146)
* device: no compilers message with reasons

* typings

* mypy
2025-11-07 23:01:45 +08:00
chenyu
bb8cf948f2 variation of (x%c)+(x//c)*c = x (#13135)
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
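Note on the entry above: the identity (x%c)+(x//c)*c = x and one plausible reading of the "combined idiv" variation can be checked by brute force. The sketch below assumes non-negative x and y and positive constants b and c, and assumes that "combined" means (y//b)//c was already folded into y//(b*c). It is not the rewrite rule from the commit. The function name `check` and its parameters are hypothetical.

```python
# Brute-force check of the identity from the commit title, under the stated
# assumptions (non-negative values, positive divisors). Not the library's code.

def check(limit=200, divisors=range(1, 8)):
    for c in divisors:
        for x in range(limit):
            # Base identity: remainder plus quotient times divisor rebuilds x.
            assert (x % c) + (x // c) * c == x
        for b in divisors:
            for y in range(limit):
                x = y // b
                # Nested floor division collapses for positive b and c.
                assert x // c == y // (b * c)
                # Variation: the idiv term shows up already combined as
                # y // (b*c), yet the sum still equals x = y // b.
                assert (x % c) + (y // (b * c)) * c == x
    return True

if __name__ == "__main__":
    print(check())
```

A simplifier that only matches the literal pattern (x%c)+(x//c)*c would miss the second form, since the divisor in the idiv term is b*c and not c. That is the case this sketch illustrates.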
George Hotz
42b34cf83d bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
chenyu
bfb0c0391f test custom eye function (#13134)
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
nimlgen
dafdb4bfb1 test hcq open with pytest (#13124)
* test hcq open with pytest

* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87 system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices

* rename and fullrun

* force ok

* fix

* fix
2025-11-06 19:02:13 +08:00
chenyu
f33c182393 test custom qkv kernel (#13118)
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
9b2b535fa4 fix issue with multi flip (#13115) 2025-11-05 15:28:50 -08:00
George Hotz
4027eef264 fix test warnings (#13114)
* fix test warnings

* precommit passes

* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f move permute/flip/shrink to mixins (#13113)
* move permute to mixins

* move more stuff

* two more

* fix local mypy

* fix tests

* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0 move mixins to mixin dir (#13105)
* move mixins to mixin dir

* math
2025-11-05 10:18:33 -08:00
chenyu
18d4ecc1f3 lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
chenyu
54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00