wozeparrot
6252831ceb
feat: initial tk library (#13160)
2025-11-09 22:54:29 -08:00
George Hotz
925231aec1
repeat does less reshape for 1s (#13183)
2025-11-09 19:43:02 -08:00
George Hotz
d7369de048
hotfix: update weekly commits table
2025-11-09 19:37:06 -08:00
chenyu
6c48c87e51
improved ASSERT_MIN_STEP_TIME (#13182)
* improved ASSERT_MIN_STEP_TIME
getting close, current time +1ms then round up
* relax
2025-11-09 16:41:12 -05:00
nimlgen
17715688c7
system: validate vendor for APLPCIIfaceBase (#13181)
2025-11-10 02:49:21 +08:00
nimlgen
614783693e
nv: remove hardcoded expansion_rom_off (#13180)
* nv: remove hardcoded expansion_rom_off
* to max size
2025-11-09 21:43:19 +08:00
chenyu
e1d46de8f8
update GROUPTOP heuristic more (#13178)
reverts #13176
2025-11-09 02:31:12 -05:00
chenyu
41e45c20ff
minor stuff reading the printed code [pr] (#13177)
2025-11-09 00:58:51 -05:00
chenyu
8e868dced8
only GROUPTOP one reduce kernel (#13176)
* only GROUPTOP one reduce kernel
* ALLOWED_GATED_READ_IMAGE=148
2025-11-08 22:38:44 -05:00
chenyu
834067d91f
move onnx import in compile3 (#13172)
only used in test_vs_onnx
2025-11-08 09:44:34 -08:00
nimlgen
7f3240dbfe
nv: cleanup alloc (#13170)
* nv: cleanup alloc
* okay okay
2025-11-09 00:14:46 +08:00
qazal
7250fc0354
viz: double click on kernel run goes to codegen (#13147)
2025-11-08 23:40:50 +08:00
qazal
8a7fa9e7b4
sqtt: show total cycles of kernel in viz (#13169)
2025-11-08 21:00:40 +08:00
chenyu
2ba8b4946f
external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py
cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS
* fix
2025-11-08 01:54:10 -05:00
chenyu
a62496cb3d
clean up get_grouped_dims [pr] (#13159)
2025-11-08 01:53:54 -05:00
wozeparrot
eb0192b0bb
feat: print ranges that aren't ended (#13167)
2025-11-07 22:01:29 -08:00
George Hotz
b41541bc44
bounty: Remove Tensor._pool alternative implementation and verify kernels remain the same (#13164)
2025-11-07 16:59:48 -08:00
George Hotz
ffb9e8396f
fix indexing bug with convs
* minimal difference for ONE_POOL=1
* fix indexing bug
* improve indexing debugger
* more debugger improvements
* always for reshape
2025-11-07 16:45:19 -08:00
chenyu
6a509da7f3
Scheduler.reduceops helper [pr] (#13162)
2025-11-07 18:59:46 -05:00
George Hotz
2413311289
make _pool simpler (#13161)
* make _pool simpler
* just syntax
* more correct and smaller
* try this now
* Revert "try this now"
This reverts commit 607cdc2164.
* ONE_POOL
2025-11-07 15:58:44 -08:00
George Hotz
70054cdb14
move backward cast to broadcasted, expand to mixins (#13156)
* shrink_to mixin
* move backward cast into _broadcasted
* expand to movement mixin
* move a few more
* fix spec issue
2025-11-07 15:07:47 -08:00
George Hotz
f2519ea0ba
shrink_to mixin (#13155)
2025-11-07 11:46:24 -08:00
C T
0f9d7f650d
whisper: fix oob, explicit dtype (#13144)
* fix dtype depending on numpy version
numpy v2 np.array returns int64 which Tensor passed through for the
first decode call, swallowing the <|notimestamps|> token and corrupting
the sequence
* fix whisper OOB
global limit on whisper's context length
* enforce whisper max_tokens_to_sample (match openai)
local limit on max tokens decoded
2025-11-07 12:55:01 -05:00
Ahmed Harmouche
3ecff3a8da
Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu
* Fix split dim bug
* Remove webgpu_bug from examples
* Add test for shape correctness
* Fix 3D indexing
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb
device: no compilers message with reasons (#13146)
* device: no compilers message with reasons
* typings
* mypy
2025-11-07 23:01:45 +08:00
nimlgen
35e461ef69
hcq: use exception group (#12616)
* hcq: use exception group
* fix
2025-11-07 21:23:12 +08:00
nimlgen
10dc8335d2
tinygpu: fix teardown crash (#13143)
* tinygpu: fix crash
* um?
* double release
* restore
2025-11-07 19:52:54 +08:00
qazal
d4a216d7d9
viz: display compiler errors (#13141)
2025-11-07 18:09:50 +08:00
qazal
7e94369464
add helper for test_timing custom ops (#13140)
2025-11-07 17:13:55 +08:00
nimlgen
95620426d5
tinygpu: unmap dma when client closed (#13129)
* tinygpu: unmap dma when client closed
* syn
* tiny fixes
2025-11-07 16:08:43 +08:00
wozeparrot
500d7661fa
feat: show range len on index in viz (#13139)
2025-11-06 23:21:27 -08:00
George Hotz
bb6364d7c7
tuplize from linearizer behind flag (#13136)
* remove tuplize from linearizer
* optional tuplize
2025-11-06 20:15:03 -08:00
chenyu
bb8cf948f2
variation of (x%c)+(x//c)*c = x (#13135)
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
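A minimal Python sketch of the identity behind #13135, assuming the "combined" idiv term means (y//b)//c has been folded into y//(b*c) with positive integers b and c. It checks only nonnegative y, so it makes no claim about tinygrad's own integer-division semantics for negative values; `folded_identity_holds` is a hypothetical helper, not part of tinygrad.

```python
# Hypothetical check (assumed form) of the rewrite in #13135:
#   (x % c) + (x // c) * c == x, with x = y // b,
# where the x // c term has already been folded to y // (b * c).
def folded_identity_holds(y: int, b: int, c: int) -> bool:
    x = y // b
    # second term uses the combined divisor b*c instead of x // c
    return (x % c) + (y // (b * c)) * c == x

# exhaustively check small nonnegative y and positive b, c
assert all(
    folded_identity_holds(y, b, c)
    for y in range(0, 200)
    for b in range(1, 8)
    for c in range(1, 8)
)
```

The check passes because, for positive b and c, nested floor division satisfies (y//b)//c == y//(b*c), so the folded form still matches the original (x%c)+(x//c)*c pattern.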
George Hotz
42b34cf83d
bottom up linearizer (#13133)
* bottom up linearizer
* late stores
* more complete
* remove broken heuristic
* upcast size
* opt
* more conservative
* it needs that
* disable opencl half on QCOM
* fix
* make that a real test
* cpu test okay
* ptx skip
* end is after the range
2025-11-06 15:30:32 -08:00
George Hotz
e0d828dba8
little cleanups
2025-11-06 13:58:19 -08:00
chenyu
bfb0c0391f
test custom eye function (#13134)
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
George Hotz
290441dd44
do loads early (#13131)
* do loads early
* local and reg
2025-11-06 09:57:09 -08:00
George Hotz
097264853d
very simple priority (#13130)
* very simple priority
* still simple
2025-11-06 09:25:28 -08:00
George Hotz
07b415e831
fixup op order (#13128)
* fixup op order
* more order
* move a few more
* more
* DEBUG_LINEARIZE
2025-11-06 08:50:04 -08:00
nimlgen
b9b68bf437
amd: add kern to sqtt event (#13126)
* amd: add kern to sqtt event
* fix
2025-11-06 22:02:02 +08:00
qazal
88245d6579
qol improvements to sqtt decoder and timing tests (#13125)
2025-11-06 20:51:30 +08:00
nimlgen
dafdb4bfb1
test hcq open with pytest (#13124)
* test hcq open with pytest
* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87
system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices
* rename and fullrun
* force ok
* fix
* fix
2025-11-06 19:02:13 +08:00
qazal
3126c89b84
viz: visible horizontal scrollbar in long texts (#13122)
2025-11-06 17:23:02 +08:00
George Hotz
91cc773397
add run count to toposort (#13119)
2025-11-05 22:29:34 -08:00
Adeeb Shihadeh
dca7fb0a49
qcom: make priority configurable (#13120)
2025-11-05 22:27:54 -08:00
qazal
b2bb3af12a
make range_color work in VIZ (#13121)
2025-11-06 14:26:48 +08:00
chenyu
f33c182393
test custom qkv kernel (#13118)
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
c65e6d8887
add ranges to print_uops (#13116)
* remove tuplize from linearizer
* try this
* simple priority
* add colored ranges to print_uops
* improve comments
* fix no const in src
* fix mypy
* fix define global
* fix var placement
* no prefer early load
* revert linearizer for now
2025-11-05 20:26:56 -08:00
George Hotz
9b2b535fa4
fix issue with multi flip (#13115)
2025-11-05 15:28:50 -08:00