chenyu
834067d91f
move onnx import in compile3 ( #13172 )
...
only used in test_vs_onnx
2025-11-08 09:44:34 -08:00
nimlgen
7f3240dbfe
nv: cleanup alloc ( #13170 )
...
* nv: cleanup alloc
* okay okay
2025-11-09 00:14:46 +08:00
qazal
7250fc0354
viz: double click on kernel run goes to codegen ( #13147 )
2025-11-08 23:40:50 +08:00
qazal
8a7fa9e7b4
sqtt: show total cycles of kernel in viz ( #13169 )
2025-11-08 21:00:40 +08:00
chenyu
2ba8b4946f
external_benchmark_op_cat.py ( #13168 )
...
* external_benchmark_op_cat.py
cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS
* fix
2025-11-08 01:54:10 -05:00
chenyu
a62496cb3d
clean up get_grouped_dims [pr] ( #13159 )
2025-11-08 01:53:54 -05:00
wozeparrot
eb0192b0bb
feat: print ranges that aren't ended ( #13167 )
2025-11-07 22:01:29 -08:00
George Hotz
b41541bc44
bounty: Remove Tensor._pool alternative implementation and verify kernels remain the same ( #13164 )
2025-11-07 16:59:48 -08:00
George Hotz
ffb9e8396f
fix indexing bug with convs
...
* minimal difference for ONE_POOL=1
* fix indexing bug
* improve indexing debugger
* more debugger improvements
* always for reshape
2025-11-07 16:45:19 -08:00
chenyu
6a509da7f3
Scheduler.reduceops helper [pr] ( #13162 )
2025-11-07 18:59:46 -05:00
George Hotz
2413311289
make _pool simpler ( #13161 )
...
* make _pool simpler
* just syntax
* more correct and smaller
* try this now
* Revert "try this now"
This reverts commit 607cdc2164.
* ONE_POOL
2025-11-07 15:58:44 -08:00
George Hotz
70054cdb14
move backward cast to broadcasted, expand to mixins ( #13156 )
...
* shrink_to mixin
* move backward cast into _broadcasted
* expand to movement mixin
* move a few more
* fix spec issue
2025-11-07 15:07:47 -08:00
George Hotz
f2519ea0ba
shrink_to mixin ( #13155 )
2025-11-07 11:46:24 -08:00
C T
0f9d7f650d
whisper: fix oob, explicit dtype ( #13144 )
...
* fix dtype depending on numpy version
numpy v2 np.array returns int64 which Tensor passed through for the
first decode call, swallowing the <|notimestamps|> token and corrupting
the sequence
* fix whisper OOB
global limit on whisper's context length
* enforce whisper max_tokens_to_sample (match openai)
local limit on max tokens decoded
2025-11-07 12:55:01 -05:00
Ahmed Harmouche
3ecff3a8da
Fix dim splitting bug for len(dim) == len(limited) case ( #13142 )
...
* Fix gpudims bug on webgpu
* Fix split dim bug
* Remove webgpu_bug from examples
* Add test for shape correctness
* Fix 3D indexing
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb
device: no compilers message with reasons ( #13146 )
...
* device: no compilers message with reasons
* typings
* mypy
2025-11-07 23:01:45 +08:00
nimlgen
35e461ef69
hcq: use exception group ( #12616 )
...
* hcq: use exception group
* fix
2025-11-07 21:23:12 +08:00
nimlgen
10dc8335d2
tinygpu: fix teardown crash ( #13143 )
...
* tinygpu: fix crash
* um?
* double release
* restore
2025-11-07 19:52:54 +08:00
qazal
d4a216d7d9
viz: display compiler errors ( #13141 )
2025-11-07 18:09:50 +08:00
qazal
7e94369464
add helper for test_timing custom ops ( #13140 )
2025-11-07 17:13:55 +08:00
nimlgen
95620426d5
tinygpu: unmap dma when client closed ( #13129 )
...
* tinygpu: unmap dma when client closed
* syn
* tiny fixes
2025-11-07 16:08:43 +08:00
wozeparrot
500d7661fa
feat: show range len on index in viz ( #13139 )
2025-11-06 23:21:27 -08:00
George Hotz
bb6364d7c7
tuplize from linearizer behind flag ( #13136 )
...
* remove tuplize from linearizer
* optional tuplize
2025-11-06 20:15:03 -08:00
chenyu
bb8cf948f2
variation of (x%c)+(x//c)*c = x ( #13135 )
...
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
George Hotz
42b34cf83d
bottom up linearizer ( #13133 )
...
* bottom up linearizer
* late stores
* more complete
* remove broken heuristic
* upcast size
* opt
* more conservative
* it needs that
* disable opencl half on QCOM
* fix
* make that a real test
* cpu test okay
* ptx skip
* end is after the range
2025-11-06 15:30:32 -08:00
George Hotz
e0d828dba8
little cleanups
2025-11-06 13:58:19 -08:00
chenyu
bfb0c0391f
test custom eye function ( #13134 )
...
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
George Hotz
290441dd44
do loads early ( #13131 )
...
* do loads early
* local and reg
2025-11-06 09:57:09 -08:00
George Hotz
097264853d
very simple priority ( #13130 )
...
* very simple priority
* still simple
2025-11-06 09:25:28 -08:00
George Hotz
07b415e831
fixup op order ( #13128 )
...
* fixup op order
* more order
* move a few more
* more
* DEBUG_LINEARIZE
2025-11-06 08:50:04 -08:00
nimlgen
b9b68bf437
amd: add kern to sqtt event ( #13126 )
...
* amd: add kern to sqtt event
* fix
2025-11-06 22:02:02 +08:00
qazal
88245d6579
qol improvements to sqtt decoder and timing tests ( #13125 )
2025-11-06 20:51:30 +08:00
nimlgen
dafdb4bfb1
test hcq open with pytest ( #13124 )
...
* test hcq open with pytest
* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87
system: fix flock on pcidevs ( #13123 )
...
* system: fix locking of hcq devices
* rename and fullrun
* force ok
* fix
* fix
2025-11-06 19:02:13 +08:00
qazal
3126c89b84
viz: visible horizontal scrollbar in long texts ( #13122 )
2025-11-06 17:23:02 +08:00
George Hotz
91cc773397
add run count to toposort ( #13119 )
2025-11-05 22:29:34 -08:00
Adeeb Shihadeh
dca7fb0a49
qcom: make priority configurable ( #13120 )
2025-11-05 22:27:54 -08:00
qazal
b2bb3af12a
make range_color work in VIZ ( #13121 )
2025-11-06 14:26:48 +08:00
chenyu
f33c182393
test custom qkv kernel ( #13118 )
...
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
c65e6d8887
add ranges to print_uops ( #13116 )
...
* remove tuplize from linearizer
* try this
* simple priority
* add colored ranges to print_uops
* improve comments
* fix no const in src
* fix mypy
* fix define global
* fix var placement
* no prefer early load
* revert linearizer for now
2025-11-05 20:26:56 -08:00
George Hotz
9b2b535fa4
fix issue with multi flip ( #13115 )
2025-11-05 15:28:50 -08:00
George Hotz
4027eef264
fix test warnings ( #13114 )
...
* fix test warnings
* precommit passes
* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f
move permute/flip/shrink to mixins ( #13113 )
...
* move permute to mixins
* move more stuff
* two more
* fix local mypy
* fix tests
* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0
move mixins to mixin dir ( #13105 )
...
* move mixins to mixin dir
* math
2025-11-05 10:18:33 -08:00
chenyu
52f0081e77
use where instead of mul in Embedding ( #13112 )
2025-11-05 12:49:01 -05:00
b1tg
edc4e1aede
ignore trailing nops in llvm-objdump output ( #13110 )
2025-11-06 01:10:51 +08:00
chenyu
03ee0cfe45
minor fast_idiv cleanup [pr] ( #13109 )
2025-11-05 11:44:36 -05:00
chenyu
18d4ecc1f3
lower nv test_gemm_4096 target ( #13107 )
2025-11-05 11:05:16 -05:00
nimlgen
eff80beeed
amd: props in device not sqtt ( #13106 )
...
* amd: props in device not sqtt
* fix
* f
* fix
* fix
2025-11-05 23:43:20 +08:00
nimlgen
757ceab2a2
system: allow using vidmem for uc mem ( #13104 )
2025-11-05 19:12:59 +08:00