George Hotz
ab7df42c78
bring back fold_divmod_general with bugfix and test [pr] (#13369)
...
* Revert "Revert "merge to fold_divmod_general [p] (#13359)""
This reverts commit 05ccc69248.
* Revert "Revert "actually merge to fold_divmod_general [pr] (#13363)""
This reverts commit 90e5752199.
* Revert "Revert "add cache to fold_divmod_general (#13365)""
This reverts commit 8e17bd6791.
* bring back fold_divmod_general with bugfix and test
2025-11-19 14:51:51 -08:00
George Hotz
986d113024
symbolic fuzz failure (#13367)
...
* symbolic fuzz failure
* skip flaky test
2025-11-19 14:21:08 -08:00
George Hotz
05ccc69248
Revert "merge to fold_divmod_general [p] (#13359)"
...
This reverts commit 7711bbac7f.
2025-11-19 14:18:09 -08:00
George Hotz
8e17bd6791
Revert "add cache to fold_divmod_general (#13365)"
...
This reverts commit b5309a5043.
2025-11-19 14:18:08 -08:00
George Hotz
b5309a5043
add cache to fold_divmod_general (#13365)
2025-11-19 13:49:18 -08:00
George Hotz
7711bbac7f
merge to fold_divmod_general [p] (#13359)
...
* merge to fold_divmod_general [p]
* merge more
* merge more
* merge more
2025-11-19 11:37:45 -08:00
George Hotz
6fdbd03104
more divmod cleanup [p] (#13358)
...
* more divmod cleanup [p]
* lil cleanups, faster
2025-11-19 10:35:15 -08:00
George Hotz
957cf717e7
Python speed (#13355)
...
* skip process replay by default
* work on python speed
* fix names of rewrite rules
* fix that test
2025-11-19 09:03:00 -08:00
George Hotz
385618d45b
skip process replay by default (#13353)
2025-11-19 08:25:34 -08:00
Christopher Milan
a438c277de
autogen tests for 3.14 (#13343)
2025-11-18 22:16:59 -05:00
George Hotz
1afa3c0877
vmap on full model (#13340)
...
* vmap on full model
* vmap gemm
* reduce sums on end
* outer reduce
* only if there's ranges
* put those rules in symbolic
* ranges
* do opt later
* add zero range
2025-11-18 16:06:06 -08:00
George Hotz
9c59b3d19e
vmap grad needs reduce_backward (#13336)
...
* vmap grad needs reduce_backward
* fuse and outer
2025-11-18 10:08:30 -08:00
George Hotz
06e39a88a9
outer vmap works (#13334)
...
* outer vmap works
* fuse works
* vmap outer works
* outer ranges work
* grad work
* should be good to merge
2025-11-18 09:27:48 -08:00
George Hotz
583560ab72
this is the right way to write vmap (#13328)
2025-11-17 20:20:52 -08:00
George Hotz
e4fead8a86
write scan in uops (#13321)
...
* write scan in uops
* ops range
* no need for variable
* meh, later
* shorter
2025-11-17 16:58:08 -08:00
George Hotz
6d3385c284
print special ops in postrange (#13318)
...
* print special ops in postrange
* fix on OSX
2025-11-17 14:43:23 -08:00
wozeparrot
33773fda87
tk initial mi350 (#13289)
2025-11-17 11:46:32 -08:00
nimlgen
9bb17c53ea
amd: timer fix (#13267)
2025-11-17 13:59:03 +03:00
George Hotz
cabd4add48
more work parsing SQTT, separate VIZ/PROFILE (#13308)
...
* more work parsing SQTT
* more minimal runner
* sep VIZ/PROFILE
* parse print new
* improve parser
* more filter
* that
* split them
* lil cleanup
* skip flaky test
* AQL in mmapeak
2025-11-16 10:40:39 -08:00
George Hotz
295600dc5a
saturday coffee shop work parsing the att format (#13295)
...
* saturday coffee shop work parsing the att format
* add examples
* parser
* classes of packets
* fully vibe coded parser
* vibing
* empty
* some vibe names
* vibes
* most of these are wrong
* more vibes
* better names
* parsing
* parse
* cleanup parser
* touchups
2025-11-16 08:25:51 -08:00
chenyu
8f0e747b3a
Tensor._tri with arange (#13297)
2025-11-16 10:21:16 -05:00
chenyu
e8844853ed
Tensor.eye with arange (#13287)
...
with rangify we can write these with arange
2025-11-15 12:32:27 -05:00
George Hotz
22c08b470c
fold using outerworld range (#13286)
...
* scan using outerworld range
* almost
* sched
* simple range
* mypy
* woooo outer range
* spec passes
* print the numbers
* lol it runs
* real test
2025-11-14 20:43:41 -08:00
George Hotz
567066f51f
tests for cast there and back (#13195)
...
* fix cast folding in llama
* dtypes that work everywhere
* Skip test_cast_there_and_back for backend casts
Skip test due to backend casting issues.
2025-11-14 16:56:09 -08:00
George Hotz
6c5fa349e1
add (unused) outer range (#13285)
2025-11-14 16:47:52 -08:00
George Hotz
e5351699bd
openpilot warp (#13283)
...
* openpilot image warp test
* 0.4 ms on metal, 1 ms on CPU
* new inputs each time
* reshape
2025-11-14 13:55:32 -08:00
chenyu
888aaab151
test_tiny cleanup (#13276)
2025-11-14 11:11:32 -05:00
nimlgen
3e63831b98
nv: support 580+ drivers (#13269)
...
* nv: 580+ support
* start
* f
* fake
* fix
2025-11-14 21:44:16 +08:00
nimlgen
c80d459d99
autogen: fix packed args structs (#13274)
...
* autogen: fix packed args structs
* and test this
2025-11-14 20:24:06 +08:00
nimlgen
14eb48b13a
autogen: rename nv_gpu to nv_570 (#13273)
...
* autogen: rename nv_gpu to nv_570
* rename
2025-11-14 20:07:19 +08:00
nimlgen
f72b1fbca4
nv: read numClasses (#13271)
...
* nv: read numClasses
* fix
* d
2025-11-14 19:43:25 +08:00
Christopher Milan
09f3aae169
In-tree autogen: all C libraries (#13220)
...
* checkout files from autogen branch
* ioctl with payload
* fix am generations
* properly fix generations
This reverts commit b2a54f4f41.
* revert discovery.h
* support pragma pack(1)
* typo
* better getter
* typo
* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE
* align support
* anon handling fix
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
7eb0d8e744
feat: mixins on tiles (#13246)
2025-11-13 16:52:52 -08:00
Ayman Jabr
256f81bb02
Fix tracemeta 0 (#13049)
...
* chore: tclesius branch resolved
* fix: indentation
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 09:07:11 -08:00
George Hotz
bcdfc109b5
hotfix: disable flaky test
2025-11-13 06:19:28 -08:00
wozeparrot
759557f633
feat: move tk tests to testextra (#13242)
2025-11-12 17:06:53 -08:00
chenyu
3f939f3d3c
update pm_simplify_valid (#13241)
...
* update pm_simplify_valid
fixed openpilot conv regression
* IMAGE training is broken
2025-11-12 19:40:02 -05:00
George Hotz
ab9fa964d8
DISABLE_COMPILER_CACHE -> CCACHE (#13234)
...
* DISABLE_COMPILER_CACHE -> CCACHE
* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
Jan Akhremchik
bc8e537423
Add NONZERO op to onnx backend (#13211)
2025-11-12 08:55:51 -08:00
qazal
7a6853fa40
viz: show python callstack in the first graph (#13218)
2025-11-12 20:52:28 +08:00
wozeparrot
371c1f2355
tk: move tiles to class (#13224)
2025-11-11 21:53:46 -08:00
Christopher Milan
41a098a82d
In-tree autogen: libc.py (#13217)
...
* checkout changes from autogen branch
* parents
* pylint happy
* move sys to system in helpers.py
* typo
* typo
2025-11-11 19:13:48 -08:00
wozeparrot
222bb12ddf
tk softmax (#13205)
2025-11-11 15:13:16 -08:00
qazal
bc55bc4849
cleanup test_viz profiler tests (#13221)
2025-11-12 03:46:48 +08:00
wozeparrot
73497af4c0
clean: use np for allclose (#13204)
2025-11-10 23:02:43 -08:00
chenyu
22b8579234
one last regressed dm kernel (#13201)
2025-11-10 23:30:52 -05:00
chenyu
829cdafccc
update openpilot slow conv uop ast (#13197)
...
the two remaining slow ones
2025-11-10 17:03:20 -05:00
wozeparrot
6252831ceb
feat: initial tk library (#13160)
2025-11-09 22:54:29 -08:00
chenyu
2ba8b4946f
external_benchmark_op_cat.py (#13168)
...
* external_benchmark_op_cat.py
cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS
* fix
2025-11-08 01:54:10 -05:00
George Hotz
ffb9e8396f
fix indexing bug with convs
...
* minimal difference for ONE_POOL=1
* fix indexing bug
* improve indexing debugger
* more debugger improvements
* always for reshape
2025-11-07 16:45:19 -08:00