4673 Commits

Author SHA1 Message Date
George Hotz
ab7df42c78 bring back fold_divmod_general with bugfix and test [pr] (#13369)
* Revert "Revert "merge to fold_divmod_general [p] (#13359)""

This reverts commit 05ccc69248.

* Revert "Revert "actually merge to fold_divmod_general [pr] (#13363)""

This reverts commit 90e5752199.

* Revert "Revert "add cache to fold_divmod_general (#13365)""

This reverts commit 8e17bd6791.

* bring back fold_divmod_general with bugfix and test
2025-11-19 14:51:51 -08:00
George Hotz
986d113024 symbolic fuzz failure (#13367)
* symbolic fuzz failure

* skip flaky test
2025-11-19 14:21:08 -08:00
George Hotz
05ccc69248 Revert "merge to fold_divmod_general [p] (#13359)"
This reverts commit 7711bbac7f.
2025-11-19 14:18:09 -08:00
George Hotz
8e17bd6791 Revert "add cache to fold_divmod_general (#13365)"
This reverts commit b5309a5043.
2025-11-19 14:18:08 -08:00
George Hotz
b5309a5043 add cache to fold_divmod_general (#13365) 2025-11-19 13:49:18 -08:00
George Hotz
7711bbac7f merge to fold_divmod_general [p] (#13359)
* merge to fold_divmod_general [p]

* merge more

* merge more

* merge more
2025-11-19 11:37:45 -08:00
George Hotz
6fdbd03104 more divmod cleanup [p] (#13358)
* more divmod cleanup [p]

* lil cleanups, faster
2025-11-19 10:35:15 -08:00
George Hotz
957cf717e7 Python speed (#13355)
* skip process replay by default

* work on python speed

* fix names of rewrite rules

* fix that test
2025-11-19 09:03:00 -08:00
George Hotz
385618d45b skip process replay by default (#13353) 2025-11-19 08:25:34 -08:00
Christopher Milan
a438c277de autogen tests for 3.14 (#13343) 2025-11-18 22:16:59 -05:00
George Hotz
1afa3c0877 vmap on full model (#13340)
* vmap on full model

* vmap gemm

* reduce sums on end

* outer reduce

* only if there's ranges

* put those rules in symbolic

* ranges

* do opt later

* add zero range
2025-11-18 16:06:06 -08:00
George Hotz
9c59b3d19e vmap grad needs reduce_backward (#13336)
* vmap grad needs reduce_backward

* fuse and outer
2025-11-18 10:08:30 -08:00
George Hotz
06e39a88a9 outer vmap works (#13334)
* outer vmap works

* fuse works

* vmap outer works

* outer ranges work

* grad work

* should be good to merge
2025-11-18 09:27:48 -08:00
George Hotz
583560ab72 this is the right way to write vmap (#13328) 2025-11-17 20:20:52 -08:00
George Hotz
e4fead8a86 write scan in uops (#13321)
* write scan in uops

* ops range

* no need for variable

* meh, later

* shorter
2025-11-17 16:58:08 -08:00
George Hotz
6d3385c284 print special ops in postrange (#13318)
* print special ops in postrange

* fix on OSX
2025-11-17 14:43:23 -08:00
wozeparrot
33773fda87 tk initial mi350 (#13289) 2025-11-17 11:46:32 -08:00
nimlgen
9bb17c53ea amd: timer fix (#13267) 2025-11-17 13:59:03 +03:00
George Hotz
cabd4add48 more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT

* more minimal runner

* sep VIZ/PROFILE

* parse print new

* improve parser

* more filter

* that

* split them

* lil cleanup

* skip flaky test

* AQL in mmapeak
2025-11-16 10:40:39 -08:00
George Hotz
295600dc5a saturday coffee shop work parsing the att format (#13295)
* saturday coffee shop work parsing the att format

* add examples

* parser

* classes of packets

* fully vibe coded parser

* vibing

* empty

* some vibe names

* vibes

* most of these are wrong

* more vibes

* better names

* parsing

* parse

* cleanup parser

* touchups
2025-11-16 08:25:51 -08:00
chenyu
8f0e747b3a Tensor._tri with arange (#13297) 2025-11-16 10:21:16 -05:00
chenyu
e8844853ed Tensor.eye with arange (#13287)
with rangify we can write these with arange
2025-11-15 12:32:27 -05:00
George Hotz
22c08b470c fold using outerworld range (#13286)
* scan using outerworld range

* almost

* sched

* simple range

* mypy

* woooo outer range

* spec passes

* print the numbers

* lol it runs

* real test
2025-11-14 20:43:41 -08:00
George Hotz
567066f51f tests for cast there and back (#13195)
* fix cast folding in llama

* dtypes that work everywhere

* Skip test_cast_there_and_back for backend casts

Skip test due to backend casting issues.
2025-11-14 16:56:09 -08:00
George Hotz
6c5fa349e1 add (unused) outer range (#13285) 2025-11-14 16:47:52 -08:00
George Hotz
e5351699bd openpilot warp (#13283)
* openpilot image warp test

* 0.4 ms on metal, 1 ms on CPU

* new inputs each time

* reshape
2025-11-14 13:55:32 -08:00
chenyu
888aaab151 test_tiny cleanup (#13276) 2025-11-14 11:11:32 -05:00
nimlgen
3e63831b98 nv: support 580+ drivers (#13269)
* nv: 580+ support

* start

* f

* fake

* fix
2025-11-14 21:44:16 +08:00
nimlgen
c80d459d99 autogen: fix packed args structs (#13274)
* autogen: fix packed args structs

* and test this
2025-11-14 20:24:06 +08:00
nimlgen
14eb48b13a autogen: rename nv_gpu to nv_570 (#13273)
* autogen: rename nv_gpu to nv_570

* rename
2025-11-14 20:07:19 +08:00
nimlgen
f72b1fbca4 nv: read numClasses (#13271)
* nv: read numClasses

* fix

* d
2025-11-14 19:43:25 +08:00
Christopher Milan
09f3aae169 In-tree autogen: all C libraries (#13220)
* checkout files from autogen branch

* ioctl with payload

* fix am generations

* properly fix generations

This reverts commit b2a54f4f41.

* revert discovery.h

* support pragma pack(1)

* typo

* better getter

* typo

* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE

* align support

* anon handling fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
7eb0d8e744 feat: mixins on tiles (#13246) 2025-11-13 16:52:52 -08:00
Ayman Jabr
256f81bb02 Fix tracemeta 0 (#13049)
* chore: tclesius branch resolved

* fix: indentation

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 09:07:11 -08:00
George Hotz
bcdfc109b5 hotfix: disable flaky test 2025-11-13 06:19:28 -08:00
wozeparrot
759557f633 feat: move tk tests to testextra (#13242) 2025-11-12 17:06:53 -08:00
chenyu
3f939f3d3c update pm_simplify_valid (#13241)
* update pm_simplify_valid

fixed openpilot conv regression

* IMAGE training is broken
2025-11-12 19:40:02 -05:00
George Hotz
ab9fa964d8 DISABLE_COMPILER_CACHE -> CCACHE (#13234)
* DISABLE_COMPILER_CACHE -> CCACHE

* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
Jan Akhremchik
bc8e537423 Add NONZERO op to onnx backend (#13211) 2025-11-12 08:55:51 -08:00
qazal
7a6853fa40 viz: show python callstack in the first graph (#13218) 2025-11-12 20:52:28 +08:00
wozeparrot
371c1f2355 tk: move tiles to class (#13224) 2025-11-11 21:53:46 -08:00
Christopher Milan
41a098a82d In-tree autogen: libc.py (#13217)
* checkout changes from autogen branch

* parents

* pylint happy

* move sys to system in helpers.py

* typo

* typo
2025-11-11 19:13:48 -08:00
wozeparrot
222bb12ddf tk softmax (#13205) 2025-11-11 15:13:16 -08:00
qazal
bc55bc4849 cleanup test_viz profiler tests (#13221) 2025-11-12 03:46:48 +08:00
wozeparrot
73497af4c0 clean: use np for allclose (#13204) 2025-11-10 23:02:43 -08:00
chenyu
22b8579234 one last regressed dm kernel (#13201) 2025-11-10 23:30:52 -05:00
chenyu
829cdafccc update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
wozeparrot
6252831ceb feat: initial tk library (#13160) 2025-11-09 22:54:29 -08:00
chenyu
2ba8b4946f external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py

cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS

* fix
2025-11-08 01:54:10 -05:00
George Hotz
ffb9e8396f fix indexing bug with convs
* minimal difference for ONE_POOL=1

* fix indexing bug

* improve indexing debugger

* more debugger improvements

* always for reshape
2025-11-07 16:45:19 -08:00