Commit Graph

10969 Commits

Author SHA1 Message Date
wozeparrot
6252831ceb feat: initial tk library (#13160) 2025-11-09 22:54:29 -08:00
George Hotz
925231aec1 repeat does less reshape for 1s (#13183) 2025-11-09 19:43:02 -08:00
George Hotz
d7369de048 hotfix: update weekly commits table 2025-11-09 19:37:06 -08:00
chenyu
6c48c87e51 improved ASSERT_MIN_STEP_TIME (#13182)
* improved ASSERT_MIN_STEP_TIME

getting close, current time +1ms then round up

* relax
2025-11-09 16:41:12 -05:00
nimlgen
17715688c7 system: validate vendor for APLPCIIfaceBase (#13181) 2025-11-10 02:49:21 +08:00
nimlgen
614783693e nv: remove hardcoded expansion_rom_off (#13180)
* nv: remove hardcoded expansion_rom_off

* to max size
2025-11-09 21:43:19 +08:00
chenyu
e1d46de8f8 update GROUPTOP heuristic more (#13178)
reverts #13176
2025-11-09 02:31:12 -05:00
chenyu
41e45c20ff minor stuff reading the printed code [pr] (#13177) 2025-11-09 00:58:51 -05:00
chenyu
8e868dced8 only GROUPTOP one reduce kernel (#13176)
* only GROUPTOP one reduce kernel

* ALLOWED_GATED_READ_IMAGE=148
2025-11-08 22:38:44 -05:00
chenyu
834067d91f move onnx import in compile3 (#13172)
only used in test_vs_onnx
2025-11-08 09:44:34 -08:00
nimlgen
7f3240dbfe nv: cleanup alloc (#13170)
* nv: cleanup alloc

* okay okay
2025-11-09 00:14:46 +08:00
qazal
7250fc0354 viz: double click on kernel run goes to codegen (#13147) 2025-11-08 23:40:50 +08:00
qazal
8a7fa9e7b4 sqtt: show total cycles of kernel in viz (#13169) 2025-11-08 21:00:40 +08:00
chenyu
2ba8b4946f external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py

cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS

* fix
2025-11-08 01:54:10 -05:00
chenyu
a62496cb3d clean up get_grouped_dims [pr] (#13159) 2025-11-08 01:53:54 -05:00
wozeparrot
eb0192b0bb feat: print ranges that aren't ended (#13167) 2025-11-07 22:01:29 -08:00
George Hotz
b41541bc44 bounty: Remove Tensor._pool alternative implementation and verify kernels remain the same (#13164) 2025-11-07 16:59:48 -08:00
George Hotz
ffb9e8396f fix indexing bug with convs
* minimal difference for ONE_POOL=1

* fix indexing bug

* improve indexing debugger

* more debugger improvements

* always for reshape
2025-11-07 16:45:19 -08:00
chenyu
6a509da7f3 Scheduler.reduceops helper [pr] (#13162) 2025-11-07 18:59:46 -05:00
George Hotz
2413311289 make _pool simpler (#13161)
* make _pool simpler

* just syntax

* more correct and smaller

* try this now

* Revert "try this now"

This reverts commit 607cdc2164.

* ONE_POOL
2025-11-07 15:58:44 -08:00
George Hotz
70054cdb14 move backward cast to broadcasted, expand to mixins (#13156)
* shrink_to mixin

* move backward cast into _broadcasted

* expand to movement mixin

* move a few more

* fix spec issue
2025-11-07 15:07:47 -08:00
George Hotz
f2519ea0ba shrink_to mixin (#13155) 2025-11-07 11:46:24 -08:00
C T
0f9d7f650d whisper: fix oob, explicit dtype (#13144)
* fix dtype depending on numpy version

numpy v2 np.array returns int64 which Tensor passed through for the
first decode call, swallowing the <|notimestamps|> token and corrupting
the sequence

* fix whisper OOB

global limit on whisper's context length

* enforce whisper max_tokens_to_sample (match openai)

local limit on max tokens decoded
2025-11-07 12:55:01 -05:00
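The last two bullets of 0f9d7f650d describe two separate caps on decoding. The global cap comes from Whisper's text context length. The local cap comes from max_tokens_to_sample. The sketch below shows one way the two caps could combine. The function `decode_budget` and the parameter names `prompt_len` and `n_text_ctx` are hypothetical; the commit message does not show the actual code.

```python
def decode_budget(prompt_len: int, n_text_ctx: int, max_tokens_to_sample: int) -> int:
    """Hypothetical helper: how many new tokens the decoder may still produce."""
    # global limit: prompt + generated tokens must fit in the text context;
    # running past it is assumed here to be the out-of-bounds case being fixed
    room_in_context = max(0, n_text_ctx - prompt_len)
    # local limit: never decode more than max_tokens_to_sample tokens
    return min(room_in_context, max_tokens_to_sample)

print(decode_budget(prompt_len=4, n_text_ctx=448, max_tokens_to_sample=224))
```

The smaller of the two caps applies, and the budget never goes negative when the prompt alone already fills the context.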
Ahmed Harmouche
3ecff3a8da Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu

* Fix split dim bug

* Remove webgpu_bug from examples

* Add test for shape correctness

* Fix 3D indexing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb device: no compilers message with reasons (#13146)
* device: no compilers message with reasons

* typings

* mypy
2025-11-07 23:01:45 +08:00
nimlgen
35e461ef69 hcq: use exception group (#12616)
* hcq: use exception group

* fix
2025-11-07 21:23:12 +08:00
nimlgen
10dc8335d2 tinygpu: fix teardown crash (#13143)
* tinygpu: fix crash

* um?

* double release

* restore
2025-11-07 19:52:54 +08:00
qazal
d4a216d7d9 viz: display compiler errors (#13141) 2025-11-07 18:09:50 +08:00
qazal
7e94369464 add helper for test_timing custom ops (#13140) 2025-11-07 17:13:55 +08:00
nimlgen
95620426d5 tinygpu: unmap dma when client closed (#13129)
* tinygpu: unmap dma when client closed

* syn

* tiny fixes
2025-11-07 16:08:43 +08:00
wozeparrot
500d7661fa feat: show range len on index in viz (#13139) 2025-11-06 23:21:27 -08:00
George Hotz
bb6364d7c7 tuplize from linearizer behind flag (#13136)
* remove tuplize from linearizer

* optional tuplize
2025-11-06 20:15:03 -08:00
chenyu
bb8cf948f2 variation of (x%c)+(x//c)*c = x (#13135)
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
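The rewrite in bb8cf948f2 rests on the integer identity (x%c)+(x//c)*c = x. The variation covers the case x = y//b, where the nested divide (y//b)//c has already been folded into y//(b*c). The check below verifies both forms by brute force. It uses Python's floor-division semantics and assumes b, c > 0, so tinygrad's own IDIV/MOD semantics and rewrite rules are not reproduced here. `check_identity` is a hypothetical helper.

```python
def check_identity(y: int, b: int, c: int) -> bool:
    """Check (x % c) + (x // c) * c == x for x = y // b, both as written and
    with the idiv term combined: (y // b) // c -> y // (b * c)."""
    x = y // b
    base = (x % c) + (x // c) * c == x               # plain identity
    combined = (x % c) + (y // (b * c)) * c == x     # idiv term already combined
    return base and combined

# exhaustive check over a small range, positive divisors only
ok = all(check_identity(y, b, c)
         for y in range(-50, 51)
         for b in range(1, 8)
         for c in range(1, 8))
print(ok)
```

The combined form holds because floor(floor(y/b)/c) = floor(y/(b*c)) for positive integer b and c.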
George Hotz
42b34cf83d bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
George Hotz
e0d828dba8 little cleanups 2025-11-06 13:58:19 -08:00
chenyu
bfb0c0391f test custom eye function (#13134)
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
George Hotz
290441dd44 do loads early (#13131)
* do loads early

* local and reg
2025-11-06 09:57:09 -08:00
George Hotz
097264853d very simple priority (#13130)
* very simple priority

* still simple
2025-11-06 09:25:28 -08:00
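Commits 42b34cf83d through 097264853d rework the linearizer, which orders ops so that each one comes after its sources. The log does not say how the new priority is computed. As a generic illustration only, the sketch below is a priority-driven topological sort: Kahn's algorithm with a heap. The graph, the node names, the `priority` values and the choice to give loads the smallest number (so they are placed early) are all assumptions. This is not tinygrad's actual heuristic.

```python
import heapq

def priority_toposort(deps: dict[str, list[str]], priority: dict[str, int]) -> list[str]:
    """Kahn's algorithm: among nodes whose sources are all placed,
    always emit the one with the smallest (priority, name) key."""
    indegree = {n: len(srcs) for n, srcs in deps.items()}
    users: dict[str, list[str]] = {n: [] for n in deps}
    for n, srcs in deps.items():
        for s in srcs:
            users[s].append(n)
    ready = [(priority[n], n) for n, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        _, n = heapq.heappop(ready)
        order.append(n)
        for u in users[n]:
            indegree[u] -= 1
            if indegree[u] == 0:
                heapq.heappush(ready, (priority[u], u))
    if len(order) != len(deps):
        raise ValueError("graph has a cycle")
    return order

# hypothetical graph: two loads feed an add, then a mul with a const, then a store
deps = {"load_a": [], "load_b": [], "const": [], "add": ["load_a", "load_b"],
        "mul": ["add", "const"], "store": ["mul"]}
prio = {"load_a": 0, "load_b": 0, "const": 1, "add": 2, "mul": 2, "store": 3}
print(priority_toposort(deps, prio))
```

With these made-up priorities, both loads are emitted before anything that depends on them.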
George Hotz
07b415e831 fixup op order (#13128)
* fixup op order

* more order

* move a few more

* more

* DEBUG_LINEARIZE
2025-11-06 08:50:04 -08:00
nimlgen
b9b68bf437 amd: add kern to sqtt event (#13126)
* amd: add kern to sqtt event

* fix
2025-11-06 22:02:02 +08:00
qazal
88245d6579 qol improvements to sqtt decoder and timing tests (#13125) 2025-11-06 20:51:30 +08:00
nimlgen
dafdb4bfb1 test hcq open with pytest (#13124)
* test hcq open with pytest

* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87 system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices

* rename and fullrun

* force ok

* fix

* fix
2025-11-06 19:02:13 +08:00
qazal
3126c89b84 viz: visible horizontal scrollbar in long texts (#13122) 2025-11-06 17:23:02 +08:00
George Hotz
91cc773397 add run count to toposort (#13119) 2025-11-05 22:29:34 -08:00
Adeeb Shihadeh
dca7fb0a49 qcom: make priority configurable (#13120) 2025-11-05 22:27:54 -08:00
qazal
b2bb3af12a make range_color work in VIZ (#13121) 2025-11-06 14:26:48 +08:00
chenyu
f33c182393 test custom qkv kernel (#13118)
adding the online softmax hits an infinite loop, so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
c65e6d8887 add ranges to print_uops (#13116)
* remove tuplize from linearizer

* try this

* simple priority

* add colored ranges to print_uops

* improve comments

* fix no const in src

* fix mypy

* fix define global

* fix var placement

* no prefer early load

* revert linearizer for now
2025-11-05 20:26:56 -08:00
George Hotz
9b2b535fa4 fix issue with multi flip (#13115) 2025-11-05 15:28:50 -08:00