10772 Commits

Author SHA1 Message Date
George Hotz
f76a6c8845 unused 2025-10-27 13:31:52 +08:00
George Hotz
33013db092 fix nan 2025-10-27 13:21:43 +08:00
George Hotz
b7436f600d works 2025-10-27 13:05:36 +08:00
George Hotz
67183049c1 SPEC=3 works 2025-10-27 12:51:56 +08:00
George Hotz
9cdb45f410 spec 3 works 2025-10-27 11:59:33 +08:00
George Hotz
46914e2f40 only check it there 2025-10-27 11:50:33 +08:00
George Hotz
1eb982e01f SPEC=3 tests pyrender 2025-10-27 11:11:00 +08:00
Sieds Lykles
eaeaea2f9c pyrender Ops.SPECIAL and use correct dtype for Ops.RANGE rendering (#12931) 2025-10-27 03:21:34 +01:00
nimlgen
8c1368cab6 system: class PCIBarInfo (#12930)
* system: class PCIBarInfo

* fix
2025-10-27 03:57:42 +08:00
nimlgen
f00009c731 hcq: drivers take pcidev (#12929)
* hcq: drivers take pcidev

* fix nv
2025-10-26 20:43:51 +08:00
ttomsa
99a519f068 linearizer cleanup (#12923)
* cleanup

* comments

* also this
2025-10-26 18:30:12 +08:00
George Hotz
c0c24d3a70 cleanup wmma (#12927)
* cleanup wmma

* fix test_ops failures on android
2025-10-26 18:26:47 +08:00
George Hotz
0a32ab0006 nitpicks from typecheckers (#12926)
* nitpicks from the typechecker

* more
2025-10-26 17:52:55 +08:00
George Hotz
db5c918215 source extra/cl_android.sh to fix opencl on android 2025-10-26 15:27:51 +08:00
qazal
c94e597b3e viz ui selector cleanups (#12924) 2025-10-26 14:40:47 +08:00
chenyu
94701d4838 clean up divide_exact order [pr] (#12919)
do the const first since ADD can also call into that
2025-10-25 18:47:57 -04:00
chenyu
e18922f111 limit AND const min max to ints [pr] (#12918) 2025-10-25 16:07:52 -04:00
nimlgen
92324172be amd: refactor usb into usbdevice (#12916)
* amd: refactor usb into usbdevice

* nu

* my bad

* ops

* my bad
2025-10-26 01:00:19 +08:00
qazal
3b192f5eac split viz graph rendering from dag layout (#12914) 2025-10-25 15:36:44 +08:00
George Hotz
6415e3e8a7 use Ops.GROUP instead of Ops.NOOP for merging stores (#12912)
* use Ops.GROUP instead of Ops.NOOP for merging stores

* fs noop
2025-10-25 12:26:12 +08:00
George Hotz
b4f6a2c7a3 add kernel spec (#12911)
* add kernel spec

* fix kernel spec
2025-10-25 11:49:20 +08:00
George Hotz
8a941d95a4 SPEC=2 is full spec, SPEC=1 is default (#12910)
* SPEC=1 passes all tests

* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
wozeparrot
456560c1ff stateless tinyfs copyin (#12908) 2025-10-24 19:18:38 -07:00
wozeparrot
a5b0f57067 clean: cleanup tinyfs copyout (#12907) 2025-10-24 18:32:55 -07:00
chenyu
4b7329001d clean up test_avg_pool3d (#12905) 2025-10-24 14:31:36 -04:00
George Hotz
6b35467f53 stores don't end ranges (#12902)
* early endrange

* bugfixes
2025-10-24 23:05:03 +08:00
nimlgen
5b5ba31a86 amd: make sqtt bufs uc (#12898) 2025-10-24 18:55:14 +08:00
Sieds Lykles
e1f8c82938 Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 (#12109)
* match onnx spec

* use least_upper_dtype

* promote the square

* just cast before the square
2025-10-24 12:26:11 +02:00
George Hotz
0bde87d8d7 cleanups from flash attention branch (#12897) 2025-10-24 14:14:56 +08:00
wozeparrot
9dac505565 variable bs keccak (#10731) 2025-10-23 14:10:21 -07:00
chenyu
154b4f9f40 test FUSE_OPTIM=1 test/test_optim.py (#12895) 2025-10-23 15:54:27 -04:00
chenyu
6e4ee8deea small heuristic cleanup [pr] (#12892) 2025-10-23 10:50:15 -04:00
nimlgen
f835566e27 sqtt: correct header (#12891)
* sqtt: correct header

* f
2025-10-23 22:37:17 +08:00
Sieds Lykles
c1db62ff7c move reduce collapse to rangeify (#12845) 2025-10-23 15:44:17 +02:00
Sieds Lykles
04b3e51f1b remove old reduce collapse rule (#12889)
* comment this out

* remove
2025-10-23 13:51:49 +02:00
qazal
cdfb8e31ae hotfix: correct viz rewrite step counter reset (#12890) 2025-10-23 19:47:16 +08:00
George Hotz
6df19a4ac6 lil qol improvements to viz (#12887) 2025-10-23 18:41:07 +08:00
George Hotz
ff68a6263b move locals into codegen (dedup works) (#12885)
* move locals into codegen (dedup works)

* move in optimize
2025-10-23 17:07:39 +08:00
George Hotz
ddb53d1d48 PCONTIG=3 both saves ram and flops (#12884)
* PCONTIG=3 both saves ram and flops

* group

* gate locals

* should be correct
2025-10-23 16:37:26 +08:00
qazal
2a5c22436e remove outdated docs (#12881) 2025-10-23 12:52:36 +08:00
qazal
bcc30e5e10 viz: add linearized UOp list view (#12883)
* viz: add linearized UOp list view

* lang
2025-10-23 12:52:14 +08:00
George Hotz
e85cee0aad flip Ops.END srcs (#12882)
* flip Ops.END srcs

* backward

* late end split
2025-10-23 12:47:50 +08:00
George Hotz
74b4cfe44b Ops.GROUP + range check (#12880)
* simpler

* fix that

* Ops.GROUP + range check

* fix bugs

* fix linter

* fix test
2025-10-23 12:05:21 +08:00
Sieds Lykles
914defd55d give endrange priority (#12870)
* uncomment line

* try giving endrange priority
2025-10-23 05:19:13 +02:00
qazal
2f95c10702 remu new instructions / use volatile in emulator tests (#12862)
* remu new instructions

* start moving to volatile

* test_simple works

* test_exec_mov works and lid is still here

* test_exec_cmp_vopc

* clang did s_mov_b32 exec_lo, 1

* don't hardcode v1

* support volatile in tests

* hw_test passes

* only the volatile version

* subrev saturating behavior
2025-10-23 11:13:43 +08:00
George Hotz
e718254004 simpler end (#12879)
* simpler

* fix that
2025-10-23 10:35:58 +08:00
wozeparrot
6e00dec95d feat: pin openpilot 0.10.1 models (#12878) 2025-10-22 14:57:54 -07:00
wozeparrot
3a9aa05359 feat: extra nvcc options (#12876) 2025-10-22 13:21:11 -07:00
chenyu
f0831c8c30 add 0.10.0 to comma benchmark (#12875)
* add 0.10.0 to comma benchmark

disabled the 0.10.1 ones which are pinned to master. it does not work because benchmark uses the cached old version

* that's pinned
2025-10-22 15:18:21 -04:00
nimlgen
e7e535cd53 amd: sqtt for gfx9 (#12844)
* amd: start sqtt for gfx9

* writes something, but sometimes zeroes

* HEADER!

* w

* tiny

* mypy
2025-10-23 02:31:07 +08:00