10794 Commits

Author SHA1 Message Date
George Hotz
c24ac16841 more syntactic sugar for pyrender 2025-10-28 15:15:06 +08:00
Sieds Lykles
e110f4632a split cat (on cpu) (#12864)
* split ranges but only on cpu

* except KernelOptError for threads

* use GROUP and END

* no more flatten_range needed

* remove noop end

* always process replay for openpilot

* update test

* skip test

* fix in outs calculation

With the new linearizer the toposort is a problem, this matches the spec
now

* undo that
2025-10-28 07:55:19 +01:00
qazal
3b82dee625 viz: match DEBUG=2 for exec item metadata (#12966)
* viz: match DEBUG=2 for exec item metadata

* remove repr from kernel
2025-10-28 14:53:57 +08:00
qazal
99589dea81 move viz edge tagging to UOp graph (#12964) 2025-10-28 12:46:23 +08:00
George Hotz
bbe0bebbf3 no range tags in kernels (#12962) 2025-10-28 12:33:48 +08:00
George Hotz
39c2117dea cleanup pyrender (#12961) 2025-10-28 10:47:39 +08:00
George Hotz
2832954bcb test with IGNORE_OOB=0 (#12960) 2025-10-28 10:32:19 +08:00
George Hotz
7784cec48e pytest-split on spec (#12959) 2025-10-28 10:09:01 +08:00
George Hotz
4d817a289e simplify spec (#12958)
* simplify spec

* more
2025-10-28 09:52:32 +08:00
George Hotz
62e62d8760 move verify to spec / cleanup (#12956)
* move verify to spec / cleanup

* lil

* more explicit
2025-10-28 08:58:10 +08:00
wozeparrot
24884c6768 fix: don't use KITTENS_HOPPER for 4090 (#12954) 2025-10-27 17:19:53 -07:00
nimlgen
372d9e5753 hcq: helper for visible devices (#12950)
* hcq: helper for visible devices

* fix

* f
2025-10-28 02:27:56 +08:00
Justin Erenkrantz
f2ffe9c8cf Apply an override for nbio 7.3.0 to 7.2.0. (#12949) 2025-10-27 11:10:10 -07:00
qazal
63484d837e Revert "viz graph drawing cleanups (#12933)" (#12947)
This reverts commit 189582db5e.
2025-10-28 00:39:37 +08:00
chenyu
a79832b01f control_flow.py -> linearizer.py [pr] (#12948) 2025-10-27 12:38:13 -04:00
b1tg
45e2f916a3 add quantize fp8 in llama3 (#12893)
* add quantize fp8 in llama3

* don't truncate fp8 alu result

* cast to float32 before matmul

* --model weights/LLaMA-3/8B-SF-DPO/

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-27 10:22:57 -04:00
George Hotz
25c2da1579 check SPEC=2 in CI (#12945)
* check SPEC=2 in CI

* split SPEC=2

* fast enough
2025-10-27 21:53:57 +08:00
Sieds Lykles
072f7c35c5 fix in/outs calculation in ProgramSpec (#12937)
With the new linearizer the toposort is a problem, this matches the spec
now
2025-10-27 12:31:41 +01:00
qazal
e93c9bf6a7 viz: extend main code block to full height (#12944) 2025-10-27 18:43:49 +08:00
George Hotz
273b1f914d new pyrender, tested with SPEC=2 (#12934)
* pyrender always works with SPEC=3

* test pyrender

* work

* work

* work

* .sintify

* v const

* kernelize

* pyrender

* viz always

* optional forced_reshape

* cleanups
2025-10-27 18:41:51 +08:00
George Hotz
701a632907 move VECTORIZE/CONST (#12942) 2025-10-27 17:37:13 +08:00
nimlgen
95748a4518 nv: map vram after resets (#12938) 2025-10-27 17:17:07 +08:00
George Hotz
8fb545c475 don't late simplify on marg (#12941) 2025-10-27 17:07:41 +08:00
George Hotz
7139e036c5 bugfixes from pyrender (#12940) 2025-10-27 16:56:53 +08:00
George Hotz
804133cffd rename RECIP to RECIPROCAL (#12939) 2025-10-27 16:53:13 +08:00
nimlgen
f4da94af28 system: reset is a method of pcidevice (#12936) 2025-10-27 16:21:10 +08:00
wozeparrot
6b54378eba working kitten matmul (#12935) 2025-10-26 23:40:49 -07:00
qazal
189582db5e viz graph drawing cleanups (#12933)
* viz: make node label dims optional

* inplace edge updates

* change that
2025-10-27 13:59:32 +08:00
qazal
70ba84eb04 viz: generic node label centering (#12925)
* viz: correct node label centering

* matches

* overlay

* the other way
2025-10-27 12:02:34 +08:00
Sieds Lykles
eaeaea2f9c pyrender Ops.SPECIAL and use correct dtype for Ops.RANGE rendering (#12931) 2025-10-27 03:21:34 +01:00
nimlgen
8c1368cab6 system: class PCIBarInfo (#12930)
* system: class PCIBarInfo

* fix
2025-10-27 03:57:42 +08:00
nimlgen
f00009c731 hcq: drivers take pcidev (#12929)
* hcq: drivers take pcidev

* fix nv
2025-10-26 20:43:51 +08:00
ttomsa
99a519f068 linearizer cleanup (#12923)
* cleanup

* comments

* also this
2025-10-26 18:30:12 +08:00
George Hotz
c0c24d3a70 cleanup wmma (#12927)
* cleanup wmma

* fix test_ops failures on android
2025-10-26 18:26:47 +08:00
George Hotz
0a32ab0006 nitpicks from typecheckers (#12926)
* nitpicks from the typechecker

* more
2025-10-26 17:52:55 +08:00
George Hotz
db5c918215 source extra/cl_android.sh to fix opencl on android 2025-10-26 15:27:51 +08:00
qazal
c94e597b3e viz ui selector cleanups (#12924) 2025-10-26 14:40:47 +08:00
chenyu
94701d4838 clean up divide_exact order [pr] (#12919)
do the const first since ADD can also call into that
2025-10-25 18:47:57 -04:00
chenyu
e18922f111 limit AND const min max to ints [pr] (#12918) 2025-10-25 16:07:52 -04:00
nimlgen
92324172be amd: refactor usb into usbdevice (#12916)
* amd: refactor usb into usbdevice

* nu

* my bad

* ops

* my bad
2025-10-26 01:00:19 +08:00
qazal
3b192f5eac split viz graph rendering from dag layout (#12914) 2025-10-25 15:36:44 +08:00
George Hotz
6415e3e8a7 use Ops.GROUP instead of Ops.NOOP for merging stores (#12912)
* use Ops.GROUP instead of Ops.NOOP for merging stores

* fs noop
2025-10-25 12:26:12 +08:00
George Hotz
b4f6a2c7a3 add kernel spec (#12911)
* add kernel spec

* fix kernel spec
2025-10-25 11:49:20 +08:00
George Hotz
8a941d95a4 SPEC=2 is full spec, SPEC=1 is default (#12910)
* SPEC=1 passes all tests

* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
wozeparrot
456560c1ff stateless tinyfs copyin (#12908) 2025-10-24 19:18:38 -07:00
wozeparrot
a5b0f57067 clean: cleanup tinyfs copyout (#12907) 2025-10-24 18:32:55 -07:00
chenyu
4b7329001d clean up test_avg_pool3d (#12905) 2025-10-24 14:31:36 -04:00
George Hotz
6b35467f53 stores don't end ranges (#12902)
* early endrange

* bugfixes
2025-10-24 23:05:03 +08:00
nimlgen
5b5ba31a86 amd: make sqtt bufs uc (#12898) 2025-10-24 18:55:14 +08:00
Sieds Lykles
e1f8c82938 Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 (#12109)
* match onnx spec

* use least_upper_dtype

* promote the square

* just cast before the square
2025-10-24 12:26:11 +02:00