George Hotz
c24ac16841
more syntactic sugar for pyrender
2025-10-28 15:15:06 +08:00
Sieds Lykles
e110f4632a
split cat (on cpu) (#12864)
* split ranges but only on cpu
* except KernelOptError for threads
* use GROUP and END
* no more flatten_range needed
* remove noop end
* always process replay for openpilot
* update test
* skip test
* fix in outs calculation
With the new linearizer the toposort is a problem; this matches the spec now
* undo that
2025-10-28 07:55:19 +01:00
qazal
3b82dee625
viz: match DEBUG=2 for exec item metadata (#12966)
* viz: match DEBUG=2 for exec item metadata
* remove repr from kernel
2025-10-28 14:53:57 +08:00
qazal
99589dea81
move viz edge tagging to UOp graph (#12964)
2025-10-28 12:46:23 +08:00
George Hotz
bbe0bebbf3
no range tags in kernels (#12962)
2025-10-28 12:33:48 +08:00
George Hotz
39c2117dea
cleanup pyrender (#12961)
2025-10-28 10:47:39 +08:00
George Hotz
2832954bcb
test with IGNORE_OOB=0 (#12960)
2025-10-28 10:32:19 +08:00
George Hotz
7784cec48e
pytest-split on spec (#12959)
2025-10-28 10:09:01 +08:00
George Hotz
4d817a289e
simplify spec (#12958)
* simplify spec
* more
2025-10-28 09:52:32 +08:00
George Hotz
62e62d8760
move verify to spec / cleanup (#12956)
* move verify to spec / cleanup
* lil
* more explicit
2025-10-28 08:58:10 +08:00
wozeparrot
24884c6768
fix: don't use KITTENS_HOPPER for 4090 (#12954)
2025-10-27 17:19:53 -07:00
nimlgen
372d9e5753
hcq: helper for visible devices (#12950)
* hcq: helper for visible devices
* fix
* f
2025-10-28 02:27:56 +08:00
Justin Erenkrantz
f2ffe9c8cf
Apply an override for nbio 7.3.0 to 7.2.0. (#12949)
2025-10-27 11:10:10 -07:00
qazal
63484d837e
Revert "viz graph drawing cleanups (#12933)" (#12947)
This reverts commit 189582db5e.
2025-10-28 00:39:37 +08:00
chenyu
a79832b01f
control_flow.py -> linearizer.py [pr] (#12948)
2025-10-27 12:38:13 -04:00
b1tg
45e2f916a3
add quantize fp8 in llama3 (#12893)
* add quantize fp8 in llama3
* don't truncate fp8 alu result
* cast to float32 before matmul
* --model weights/LLaMA-3/8B-SF-DPO/
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-27 10:22:57 -04:00
George Hotz
25c2da1579
check SPEC=2 in CI (#12945)
* check SPEC=2 in CI
* split SPEC=2
* fast enough
2025-10-27 21:53:57 +08:00
Sieds Lykles
072f7c35c5
fix in/outs calculation in ProgramSpec (#12937)
With the new linearizer the toposort is a problem; this matches the spec now
2025-10-27 12:31:41 +01:00
qazal
e93c9bf6a7
viz: extend main code block to full height (#12944)
2025-10-27 18:43:49 +08:00
George Hotz
273b1f914d
new pyrender, tested with SPEC=2 (#12934)
* pyrender always works with SPEC=3
* test pyrender
* work
* work
* work
* .sintify
* v const
* kernelize
* pyrender
* viz always
* optional forced_reshape
* cleanups
2025-10-27 18:41:51 +08:00
George Hotz
701a632907
move VECTORIZE/CONST (#12942)
2025-10-27 17:37:13 +08:00
nimlgen
95748a4518
nv: map vram after resets (#12938)
2025-10-27 17:17:07 +08:00
George Hotz
8fb545c475
don't late simplify on marg (#12941)
2025-10-27 17:07:41 +08:00
George Hotz
7139e036c5
bugfixes from pyrender (#12940)
2025-10-27 16:56:53 +08:00
George Hotz
804133cffd
rename RECIP to RECIPROCAL (#12939)
2025-10-27 16:53:13 +08:00
nimlgen
f4da94af28
system: reset is a method of pcidevice (#12936)
2025-10-27 16:21:10 +08:00
wozeparrot
6b54378eba
working kitten matmul (#12935)
2025-10-26 23:40:49 -07:00
qazal
189582db5e
viz graph drawing cleanups (#12933)
* viz: make node label dims optional
* inplace edge updates
* change that
2025-10-27 13:59:32 +08:00
qazal
70ba84eb04
viz: generic node label centering (#12925)
* viz: correct node label centering
* matches
* overlay
* the other way
2025-10-27 12:02:34 +08:00
Sieds Lykles
eaeaea2f9c
pyrender Ops.SPECIAL and use correct dtype for Ops.RANGE rendering (#12931)
2025-10-27 03:21:34 +01:00
nimlgen
8c1368cab6
system: class PCIBarInfo (#12930)
* system: class PCIBarInfo
* fix
2025-10-27 03:57:42 +08:00
nimlgen
f00009c731
hcq: drivers take pcidev (#12929)
* hcq: drivers take pcidev
* fix nv
2025-10-26 20:43:51 +08:00
ttomsa
99a519f068
linearizer cleanup (#12923)
* cleanup
* comments
* also this
2025-10-26 18:30:12 +08:00
George Hotz
c0c24d3a70
cleanup wmma (#12927)
* cleanup wmma
* fix test_ops failures on android
2025-10-26 18:26:47 +08:00
George Hotz
0a32ab0006
nitpicks from typecheckers (#12926)
* nitpicks from the typechecker
* more
2025-10-26 17:52:55 +08:00
George Hotz
db5c918215
source extra/cl_android.sh to fix opencl on android
2025-10-26 15:27:51 +08:00
qazal
c94e597b3e
viz ui selector cleanups (#12924)
2025-10-26 14:40:47 +08:00
chenyu
94701d4838
clean up divide_exact order [pr] (#12919)
do the const first since ADD can also call into that
2025-10-25 18:47:57 -04:00
chenyu
e18922f111
limit AND const min max to ints [pr] (#12918)
2025-10-25 16:07:52 -04:00
nimlgen
92324172be
amd: refactor usb into usbdevice (#12916)
* amd: refactor usb into usbdevice
* nu
* my bad
* ops
* my bad
2025-10-26 01:00:19 +08:00
qazal
3b192f5eac
split viz graph rendering from dag layout (#12914)
2025-10-25 15:36:44 +08:00
George Hotz
6415e3e8a7
use Ops.GROUP instead of Ops.NOOP for merging stores (#12912)
* use Ops.GROUP instead of Ops.NOOP for merging stores
* fs noop
2025-10-25 12:26:12 +08:00
George Hotz
b4f6a2c7a3
add kernel spec (#12911)
* add kernel spec
* fix kernel spec
2025-10-25 11:49:20 +08:00
George Hotz
8a941d95a4
SPEC=2 is full spec, SPEC=1 is default (#12910)
* SPEC=1 passes all tests
* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
wozeparrot
456560c1ff
stateless tinyfs copyin (#12908)
2025-10-24 19:18:38 -07:00
wozeparrot
a5b0f57067
clean: cleanup tinyfs copyout (#12907)
2025-10-24 18:32:55 -07:00
chenyu
4b7329001d
clean up test_avg_pool3d (#12905)
2025-10-24 14:31:36 -04:00
George Hotz
6b35467f53
stores don't end ranges (#12902)
* early endrange
* bugfixes
2025-10-24 23:05:03 +08:00
nimlgen
5b5ba31a86
amd: make sqtt bufs uc (#12898)
2025-10-24 18:55:14 +08:00
Sieds Lykles
e1f8c82938
Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 (#12109)
* match onnx spec
* use least_upper_dtype
* promote the square
* just cast before the square
2025-10-24 12:26:11 +02:00