George Hotz
87affa8661
num_batches_tracked has shape ()
2025-10-21 09:10:35 +08:00
George Hotz
25beea5769
hotfix: suppress_finalizing on device __del__
2025-10-21 09:04:36 +08:00
chenyu
c7c59e6dd7
unused UPat.or_broadcasted and GroupOp.Block [pr] ( #12819 )
2025-10-20 12:24:58 -04:00
nimlgen
e284f6325a
llvm: fix compile key for different processors ( #12812 )
2025-10-20 19:46:48 +08:00
George Hotz
203a93363c
Revert "after clean up of locals ( #12813 )" ( #12814 )
This reverts commit 5d0d3d7aac.
2025-10-20 19:33:35 +08:00
George Hotz
5d0d3d7aac
after clean up of locals ( #12813 )
2025-10-20 19:24:24 +08:00
George Hotz
d1e2c393f8
after in sym, axis_letters in range ( #12811 )
* after in sym, axis_letters in range
* this is better
* this work?
2025-10-20 18:54:37 +08:00
Sieds Lykles
a8e4614436
remove REAL_SUBSTITUTE=0 and make it fast ( #12809 )
* fast REAL_substitute
* remove REAL_SUBSTITUTE=0
2025-10-20 12:44:20 +02:00
Sieds Lykles
1e93d19ee3
stable diffusion --fakeweights ( #12810 )
2025-10-20 12:41:06 +02:00
nimlgen
b5e36e3c6c
nv: check if jitlink is avail ( #12808 )
* nv: check if jitlink is avail
* why
* fix
* fix
2025-10-20 18:13:16 +08:00
George Hotz
b8a9cce783
replace NOOP with AFTER in reg init ( #12804 )
* after op
* fix tests
* replace NOOP with AFTER in reg init
* closer
* or_after there
* fix device
* fix all renderers
* better spec for after
2025-10-20 15:34:32 +08:00
qazal
12fd2c9c7b
explicitly set ignore_indexing for schedule only ( #12803 )
2025-10-20 13:11:57 +08:00
qazal
734c99f722
viz: show indexing rewrites during run_rangeify ( #12802 )
* viz: show indexing rewrites during run_rangeify
* sinking index
2025-10-20 12:37:03 +08:00
George Hotz
2e9082e0bc
after op ( #12801 )
* after op
* fix tests
2025-10-20 12:27:56 +08:00
qazal
339e6edb7d
viz: ui prereqs for hierarchical rewrites ( #12799 )
2025-10-20 12:15:15 +08:00
wozeparrot
357dac8425
feat: allow tuple indexing on uops ( #12797 )
2025-10-19 19:11:05 -07:00
George Hotz
ba593f7b98
don't render index ( #12796 )
* don't render index
* update to ignore_indexing
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-10-20 09:48:36 +08:00
George Hotz
cad3ada909
tinygpu: build with SIP off works
2025-10-20 09:11:09 +08:00
nimlgen
9cd35deae7
amd: fix alignment + pointers for aql over usb ( #12793 )
2025-10-19 23:55:57 +08:00
nimlgen
59784a5972
amd: ensure ts is written ( #12794 )
2025-10-19 23:55:49 +08:00
chenyu
63a23dfe80
test step 0 in TestTrainingOnnxOps ( #12790 )
and tighter rtol
2025-10-19 09:15:49 -04:00
chenyu
e8158afd4b
update test_qlinear_add_round_half_to_even ( #12789 )
this does not pass locally
2025-10-19 08:47:27 -04:00
Sieds Lykles
1df9c7d7e7
reduce_collapse uses symbolic_flat ( #12766 )
* sym->symbolic_flat
* cast invalid drops invalid
2025-10-19 12:27:47 +02:00
Sieds Lykles
fd6ef4801c
rangeify uses symbolic_flat ( #12786 )
* symbolic_simple -> symbolic_flat
* remove expected failures
2025-10-19 12:27:14 +02:00
George Hotz
89e7f2fa00
mmapeak: gfx1103 support
2025-10-19 16:57:28 +08:00
George Hotz
617614beb7
add mi350x support to mmapeak ( #12784 )
2025-10-19 16:11:07 +08:00
qazal
c8ef4b60f6
viz: share match tracing and TINY device profiler ( #12783 )
* set a default name for the traces
* set profile_matches + renames
* profile_matches test
* traces 4 steps total
2025-10-19 14:30:07 +08:00
chenyu
350a4754a9
Update openpilot models ( #12780 )
* Update openpilot models
* Update slower model
* fix that
---------
Co-authored-by: Bruce Wayne <harald.the.engineer@gmail.com>
2025-10-18 20:32:35 -04:00
chenyu
30ff84d050
update test_conv2d_ceildiv_edge_case ( #12779 )
2025-10-18 16:43:32 -04:00
nimlgen
442218266d
qcom: fix profiler ( #12778 )
* qcom: fix profiler
* this way
2025-10-19 01:27:59 +08:00
Harald Schäfer
addc54b96c
Simplify openpilot compile3.py ( #12748 )
* Simpler compile3
* tests
* remove default args
* onnx file is still fp16
* self-test FP16 too
* allow test disable
* absurd tolerance
* Just do latest
* Try simplest
* use later models
* kernel count not relevant if speed is good
* dead improts
* Revert "dead improts"
This reverts commit f68c2cd15d.
* Revert "kernel count not relevant if speed is good"
This reverts commit 0955ca4ee0.
* add back kernal count check on latest model
2025-10-18 10:12:22 -04:00
nimlgen
037f6e8fa0
qcom: ioctl for 7xx ( #12777 )
2025-10-18 20:33:14 +08:00
wozeparrot
82f10cfe2e
feat: assert on bufferview math ( #12772 )
2025-10-17 14:20:08 -07:00
chenyu
fcdf4ab37e
remove a contiguous in LARS ( #12770 )
2025-10-17 17:07:30 -04:00
nimlgen
910d698b78
system: cleanup page sizes ( #12771 )
* system: cleanup page sizes
* ooops
2025-10-18 02:06:42 +08:00
George Hotz
062a6d68d7
test flash attention backward ( #12762 )
* test flash attention backward
* TODO: fix pcontig
* end ranges
* render colors
* very big
* multiout at every level
* reset ending ranges
* fix tests
* ugh
2025-10-17 23:15:59 +08:00
George Hotz
33025b99f6
small changes from fa backward ( #12769 )
2025-10-17 22:41:18 +08:00
chenyu
e0d0d4372d
fix shape of m and v in onnx Adam with FUSE_OPTIM ( #12768 )
value is still slightly off but that's not onnx specific
2025-10-17 10:32:41 -04:00
qazal
bd662bea67
viz: light up program runs ( #12764 )
* basics work
* fix the color
* light up program events
* swap a with p
* better
2025-10-17 19:33:18 +08:00
George Hotz
c9a3464f76
those decimals never mattered ( #12760 )
* those decimals never mattered
* this
* improve debug
* real substitute fixes pcontig
* locals are different buffers
2025-10-17 17:16:24 +08:00
qazal
0160f034d6
viz: show display name for copy runners ( #12761 )
* viz: show display name for copy runners
* more u32
2025-10-17 16:59:51 +08:00
qazal
253d32b065
viz: add metadata to buffer user list ( #12758 )
* simple failing test
* encodings
* test passing
* key is deduped
2025-10-17 16:28:54 +08:00
George Hotz
935a60db72
bring back partial contig and flash attention ( #12756 )
* bring back partial contig and flash attention
* why not 2
* work
* that
* fix pcontig
2025-10-17 16:19:05 +08:00
Sieds Lykles
f6bc620169
UOp.prod and UOp.sum methods ( #12755 )
2025-10-17 10:02:01 +02:00
Sieds Lykles
d1bb5c0426
slightly flatter symbolic ( #12757 )
2025-10-17 09:58:45 +02:00
qazal
5417e4b099
viz helper cleanups ( #12754 )
2025-10-17 15:20:24 +08:00
qazal
3196a7aae3
viz: pre reqs for lighting up programs ( #12753 )
2025-10-17 15:03:21 +08:00
qazal
dfb8f9fc9e
viz: annotate buffer mutability in the memory graph ( #12750 )
2025-10-17 11:53:02 +08:00
Sieds Lykles
79c2f1ae26
remove reduce_rangless and replace with reduce_unparented ( #12749 )
2025-10-17 04:46:05 +02:00
chenyu
9561803cb0
fix assert in test_schedule ( #12745 )
* fix assert in test_schedule
updated kernel counts and some old tests
* fix
2025-10-16 15:39:50 -04:00