George Hotz
e4fead8a86
write scan in uops ( #13321 )
...
* write scan in uops
* ops range
* no need for variable
* meh, later
* shorter
2025-11-17 16:58:08 -08:00
George Hotz
6d3385c284
print special ops in postrange ( #13318 )
...
* print special ops in postrange
* fix on OSX
2025-11-17 14:43:23 -08:00
wozeparrot
33773fda87
tk initial mi350 ( #13289 )
2025-11-17 11:46:32 -08:00
nimlgen
9bb17c53ea
amd: timer fix ( #13267 )
2025-11-17 13:59:03 +03:00
George Hotz
cabd4add48
more work parsing SQTT, separate VIZ/PROFILE ( #13308 )
...
* more work parsing SQTT
* more minimal runner
* sep VIZ/PROFILE
* parse print new
* improve parser
* more filter
* that
* split them
* lil cleanup
* skip flaky test
* AQL in mmapeak
2025-11-16 10:40:39 -08:00
George Hotz
295600dc5a
saturday coffee shop work parsing the att format ( #13295 )
...
* saturday coffee shop work parsing the att format
* add examples
* parser
* classes of packets
* fully vibe coded parser
* vibing
* empty
* some vibe names
* vibes
* most of these are wrong
* more vibes
* better names
* parsing
* parse
* cleanup parser
* touchups
2025-11-16 08:25:51 -08:00
chenyu
8f0e747b3a
Tensor._tri with arange ( #13297 )
2025-11-16 10:21:16 -05:00
chenyu
e8844853ed
Tensor.eye with arange ( #13287 )
...
with rangify we can write these with arange
2025-11-15 12:32:27 -05:00
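Note on the entry above: the commit body says that with rangify, eye can be written with arange. The idea can be sketched in plain Python as comparing a row index range against a column index range. This is only an illustration of the technique, not tinygrad's implementation; `eye_from_arange` is a hypothetical helper name and `range` stands in for `arange`.

```python
from typing import List, Optional

def eye_from_arange(n: int, m: Optional[int] = None) -> List[List[int]]:
    """Build an n x m identity-like matrix by comparing two index ranges.

    Hypothetical helper for illustration; `range` plays the role of arange.
    """
    m = n if m is None else m
    rows = list(range(n))  # stands in for arange(n) along the row axis
    cols = list(range(m))  # stands in for arange(m) along the column axis
    # An entry is 1 exactly where the row index equals the column index.
    return [[1 if i == j else 0 for j in cols] for i in rows]

if __name__ == "__main__":
    for row in eye_from_arange(3, 4):
        print(row)
```

In a tensor library the same comparison would typically be a broadcasted equality between a column vector and a row vector of indices.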
George Hotz
22c08b470c
fold using outerworld range ( #13286 )
...
* scan using outerworld range
* almost
* sched
* simple range
* mypy
* woooo outer range
* spec passes
* print the numbers
* lol it runs
* real test
2025-11-14 20:43:41 -08:00
George Hotz
567066f51f
tests for cast there and back ( #13195 )
...
* fix cast folding in llama
* dtypes that work everywhere
* Skip test_cast_there_and_back for backend casts
Skip test due to backend casting issues.
2025-11-14 16:56:09 -08:00
George Hotz
6c5fa349e1
add (unused) outer range ( #13285 )
2025-11-14 16:47:52 -08:00
George Hotz
e5351699bd
openpilot warp ( #13283 )
...
* openpilot image warp test
* 0.4 ms on metal, 1 ms on CPU
* new inputs each time
* reshape
2025-11-14 13:55:32 -08:00
chenyu
888aaab151
test_tiny cleanup ( #13276 )
2025-11-14 11:11:32 -05:00
nimlgen
3e63831b98
nv: support 580+ drivers ( #13269 )
...
* nv: 580+ support
* start
* f
* fake
* fix
2025-11-14 21:44:16 +08:00
nimlgen
c80d459d99
autogen: fix packed args structs ( #13274 )
...
* autogen: fix packed args structs
* and test this
2025-11-14 20:24:06 +08:00
nimlgen
14eb48b13a
autogen: rename nv_gpu to nv_570 ( #13273 )
...
* autogen: rename nv_gpu to nv_570
* rename
2025-11-14 20:07:19 +08:00
nimlgen
f72b1fbca4
nv: read numClasses ( #13271 )
...
* nv: read numClasses
* fix
* d
2025-11-14 19:43:25 +08:00
Christopher Milan
09f3aae169
In-tree autogen: all C libraries ( #13220 )
...
* checkout files from autogen branch
* ioctl with payload
* fix am generations
* properly fix generations
This reverts commit b2a54f4f41.
* revert discovery.h
* support pragma pack(1)
* typo
* better getter
* typo
* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE
* align support
* anon handling fix
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
7eb0d8e744
feat: mixins on tiles ( #13246 )
2025-11-13 16:52:52 -08:00
Ayman Jabr
256f81bb02
Fix tracemeta 0 ( #13049 )
...
* chore: tclesius branch resolved
* fix: indentation
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 09:07:11 -08:00
George Hotz
bcdfc109b5
hotfix: disable flaky test
2025-11-13 06:19:28 -08:00
wozeparrot
759557f633
feat: move tk tests to testextra ( #13242 )
2025-11-12 17:06:53 -08:00
chenyu
3f939f3d3c
update pm_simplify_valid ( #13241 )
...
* update pm_simplify_valid
fixed openpilot conv regression
* IMAGE training is broken
2025-11-12 19:40:02 -05:00
George Hotz
ab9fa964d8
DISABLE_COMPILER_CACHE -> CCACHE ( #13234 )
...
* DISABLE_COMPILER_CACHE -> CCACHE
* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
Jan Akhremchik
bc8e537423
Add NONZERO op to onnx backend ( #13211 )
2025-11-12 08:55:51 -08:00
qazal
7a6853fa40
viz: show python callstack in the first graph ( #13218 )
2025-11-12 20:52:28 +08:00
wozeparrot
371c1f2355
tk: move tiles to class ( #13224 )
2025-11-11 21:53:46 -08:00
Christopher Milan
41a098a82d
In-tree autogen: libc.py ( #13217 )
...
* checkout changes from autogen branch
* parents
* pylint happy
* move sys to system in helpers.py
* typo
* typo
2025-11-11 19:13:48 -08:00
wozeparrot
222bb12ddf
tk softmax ( #13205 )
2025-11-11 15:13:16 -08:00
qazal
bc55bc4849
cleanup test_viz profiler tests ( #13221 )
2025-11-12 03:46:48 +08:00
wozeparrot
73497af4c0
clean: use np for allclose ( #13204 )
2025-11-10 23:02:43 -08:00
chenyu
22b8579234
one last regressed dm kernel ( #13201 )
2025-11-10 23:30:52 -05:00
chenyu
829cdafccc
update openpilot slow conv uop ast ( #13197 )
...
the two remaining slow ones
2025-11-10 17:03:20 -05:00
wozeparrot
6252831ceb
feat: initial tk library ( #13160 )
2025-11-09 22:54:29 -08:00
chenyu
2ba8b4946f
external_benchmark_op_cat.py ( #13168 )
...
* external_benchmark_op_cat.py
cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS
* fix
2025-11-08 01:54:10 -05:00
George Hotz
ffb9e8396f
fix indexing bug with convs
...
* minimal difference for ONE_POOL=1
* fix indexing bug
* improve indexing debugger
* more debugger improvements
* always for reshape
2025-11-07 16:45:19 -08:00
Ahmed Harmouche
3ecff3a8da
Fix dim splitting bug for len(dim) == len(limited) case ( #13142 )
...
* Fix gpudims bug on webgpu
* Fix split dim bug
* Remove webgpu_bug from examples
* Add test for shape correctness
* Fix 3D indexing
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb
device: no compilers message with reasons ( #13146 )
...
* device: no compilers message with reasons
* typings
* mypy
2025-11-07 23:01:45 +08:00
chenyu
bb8cf948f2
variation of (x%c)+(x//c)*c = x ( #13135 )
...
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
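Note on the entry above: the identity (x%c)+(x//c)*c = x, and the variation where x has the form y//b so that the idiv term may already have been combined into y//(b*c), can be checked numerically. The sketch below assumes Python's floor-division semantics and positive b and c; it is not the rewrite rule from the commit, and `recombine` is a hypothetical name used only for illustration.

```python
def recombine(y: int, b: int, c: int) -> int:
    """Evaluate ((y//b) % c) + (y//(b*c)) * c.

    Assumes b > 0 and c > 0 and Python floor division, where
    (y//b)//c == y//(b*c), so the result should equal y//b.
    """
    x = y // b                 # x is in the form y//b
    combined = y // (b * c)    # the idiv term after the two divisions combined
    return (x % c) + combined * c

if __name__ == "__main__":
    # Exhaustively compare against y//b on a small range of inputs.
    ok = all(recombine(y, b, c) == y // b
             for y in range(-50, 50)
             for b in range(1, 7)
             for c in range(1, 7))
    print("identity holds on the sampled range:", ok)
```

The check also covers negative y because floor division keeps the nested-division identity valid there; under truncating division that would need separate treatment.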
George Hotz
42b34cf83d
bottom up linearizer ( #13133 )
...
* bottom up linearizer
* late stores
* more complete
* remove broken heuristic
* upcast size
* opt
* more conservative
* it needs that
* disable opencl half on QCOM
* fix
* make that a real test
* cpu test okay
* ptx skip
* end is after the range
2025-11-06 15:30:32 -08:00
chenyu
bfb0c0391f
test custom eye function ( #13134 )
...
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
nimlgen
dafdb4bfb1
test hcq open with pytest ( #13124 )
...
* test hcq open with pytest
* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87
system: fix flock on pcidevs ( #13123 )
...
* system: fix locking of hcq devices
* rename and fullrun
* force ok
* fix
* fix
2025-11-06 19:02:13 +08:00
chenyu
f33c182393
test custom qkv kernel ( #13118 )
...
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
9b2b535fa4
fix issue with multi flip ( #13115 )
2025-11-05 15:28:50 -08:00
George Hotz
4027eef264
fix test warnings ( #13114 )
...
* fix test warnings
* precommit passes
* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f
move permute/flip/shrink to mixins ( #13113 )
...
* move permute to mixins
* move more stuff
* two more
* fix local mypy
* fix tests
* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0
move mixins to mixin dir ( #13105 )
...
* move mixins to mixin dir
* math
2025-11-05 10:18:33 -08:00
chenyu
18d4ecc1f3
lower nv test_gemm_4096 target ( #13107 )
2025-11-05 11:05:16 -05:00
chenyu
54141e9cb9
DISABLE_COMPILER_CACHE=1 in speed_v_theoretical ( #13096 )
2025-11-04 11:28:18 -05:00