10947 Commits

Author SHA1 Message Date
C T
0f9d7f650d whisper: fix oob, explicit dtype (#13144)
* fix dtype depending on numpy version

numpy v2 np.array returns int64 which Tensor passed through for the
first decode call, swallowing the <|notimestamps|> token and corrupting
the sequence

* fix whisper OOB

global limit on whisper's context length

* enforce whisper max_tokens_to_sample (match openai)

local limit on max tokens decoded
2025-11-07 12:55:01 -05:00
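The whisper fix above separates two limits: a global one (the sequence may never outgrow the model's text context) and a local one (how many tokens a single decode call may produce). The following is a minimal sketch of a greedy loop enforcing both. The function `decode`, its parameters and the stand-in `next_token` callback are hypothetical and not tinygrad's code. The defaults `n_text_ctx=448` and `n_text_ctx // 2` are assumptions modeled on OpenAI's reference decoder.

```python
from typing import Callable, List, Optional

def decode(prompt: List[int], next_token: Callable[[List[int]], int], eot: int,
           n_text_ctx: int = 448, max_tokens_to_sample: Optional[int] = None) -> List[int]:
    """Greedy decode loop with a local (per-call) and a global (context) token limit."""
    # Local limit: new tokens one call may produce. Defaulting to n_text_ctx // 2
    # is an assumption modeled on OpenAI's reference whisper decoder.
    sample_len = max_tokens_to_sample if max_tokens_to_sample is not None else n_text_ctx // 2
    tokens = list(prompt)
    for _ in range(sample_len):
        # Global limit: never grow past the context window; going further would
        # index e.g. positional embeddings out of bounds.
        if len(tokens) >= n_text_ctx:
            break
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == eot:  # stop early on end-of-text
            break
    return tokens
```

For example, `decode([5, 6, 7], lambda toks: 1, eot=0, n_text_ctx=8, max_tokens_to_sample=100)` returns 8 tokens: the context limit wins over the much larger per-call limit.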
Ahmed Harmouche
3ecff3a8da Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu

* Fix split dim bug

* Remove webgpu_bug from examples

* Add test for shape correctness

* Fix 3D indexing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb device: no compilers message with reasons (#13146)
* device: no compilers message with reasons

* typings

* mypy
2025-11-07 23:01:45 +08:00
nimlgen
35e461ef69 hcq: use exception group (#12616)
* hcq: use exception group

* fix
2025-11-07 21:23:12 +08:00
nimlgen
10dc8335d2 tinygpu: fix teardown crash (#13143)
* tinygpu: fix crash

* um?

* double release

* restore
2025-11-07 19:52:54 +08:00
qazal
d4a216d7d9 viz: display compiler errors (#13141) 2025-11-07 18:09:50 +08:00
qazal
7e94369464 add helper for test_timing custom ops (#13140) 2025-11-07 17:13:55 +08:00
nimlgen
95620426d5 tinygpu: unmap dma when client closed (#13129)
* tinygpu: unmap dma when client closed

* syn

* tiny fixes
2025-11-07 16:08:43 +08:00
wozeparrot
500d7661fa feat: show range len on index in viz (#13139) 2025-11-06 23:21:27 -08:00
George Hotz
bb6364d7c7 tuplize from linearizer behind flag (#13136)
* remove tuplize from linearizer

* optional tuplize
2025-11-06 20:15:03 -08:00
chenyu
bb8cf948f2 variation of (x%c)+(x//c)*c = x (#13135)
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
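The rewrite `(x%c)+(x//c)*c = x` is the division identity. The variation covers `x = y//b`, where the inner `(y//b)//c` may already have been folded into `y//(b*c)`, so the sum takes the form `(y//b)%c + (y//(b*c))*c` and the original pattern no longer matches. The brute-force check below is a sketch that assumes this reading of the commit message and Python's floor-division semantics with positive divisors. The helper names are hypothetical and it does not model tinygrad's own symbolic rules.

```python
def check_mod_div_identity(limit: int = 50) -> bool:
    """(x % c) + (x // c) * c == x for small x and positive c."""
    for x in range(-limit, limit + 1):
        for c in range(1, 10):
            if (x % c) + (x // c) * c != x:
                return False
    return True

def check_combined_variation(limit: int = 50) -> bool:
    """Same identity with x = y // b after x // c has been folded to y // (b * c)."""
    for y in range(-limit, limit + 1):
        for b in range(1, 8):
            for c in range(1, 8):
                x = y // b
                # (y // b) // c == y // (b * c) for positive b, c, so this still equals x
                if (x % c) + (y // (b * c)) * c != x:
                    return False
    return True

print(check_mod_div_identity(), check_combined_variation())  # True True
```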
George Hotz
42b34cf83d bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
George Hotz
e0d828dba8 little cleanups 2025-11-06 13:58:19 -08:00
chenyu
bfb0c0391f test custom eye function (#13134)
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
George Hotz
290441dd44 do loads early (#13131)
* do loads early

* local and reg
2025-11-06 09:57:09 -08:00
George Hotz
097264853d very simple priority (#13130)
* very simple priority

* still simple
2025-11-06 09:25:28 -08:00
George Hotz
07b415e831 fixup op order (#13128)
* fixup op order

* more order

* move a few more

* more

* DEBUG_LINEARIZE
2025-11-06 08:50:04 -08:00
nimlgen
b9b68bf437 amd: add kern to sqtt event (#13126)
* amd: add kern to sqtt event

* fix
2025-11-06 22:02:02 +08:00
qazal
88245d6579 qol improvements to sqtt decoder and timing tests (#13125) 2025-11-06 20:51:30 +08:00
nimlgen
dafdb4bfb1 test hcq open with pytest (#13124)
* test hcq open with pytest

* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87 system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices

* rename and fullrun

* force ok

* fix

* fix
2025-11-06 19:02:13 +08:00
qazal
3126c89b84 viz: visible horizontal scrollbar in long texts (#13122) 2025-11-06 17:23:02 +08:00
George Hotz
91cc773397 add run count to toposort (#13119) 2025-11-05 22:29:34 -08:00
Adeeb Shihadeh
dca7fb0a49 qcom: make priority configurable (#13120) 2025-11-05 22:27:54 -08:00
qazal
b2bb3af12a make range_color work in VIZ (#13121) 2025-11-06 14:26:48 +08:00
chenyu
f33c182393 test custom qkv kernel (#13118)
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
c65e6d8887 add ranges to print_uops (#13116)
* remove tuplize from linearizer

* try this

* simple priority

* add colored ranges to print_uops

* improve comments

* fix no const in src

* fix mypy

* fix define global

* fix var placement

* no prefer early load

* revert linearizer for now
2025-11-05 20:26:56 -08:00
George Hotz
9b2b535fa4 fix issue with multi flip (#13115) 2025-11-05 15:28:50 -08:00
George Hotz
4027eef264 fix test warnings (#13114)
* fix test warnings

* precommit passes

* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f move permute/flip/shrink to mixins (#13113)
* move permute to mixins

* move more stuff

* two more

* fix local mypy

* fix tests

* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0 move mixins to mixin dir (#13105)
* move mixins to mixin dir

* math
2025-11-05 10:18:33 -08:00
chenyu
52f0081e77 use where instead of mul in Embedding (#13112) 2025-11-05 12:49:01 -05:00
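An embedding lookup can be written as a one-hot mask applied to the weight table and summed over rows. The commit does not state why `where` replaced `mul`. The plain-Python sketch below (hypothetical helpers, not tinygrad's `Embedding`) only shows one property of the two formulations. They agree for finite weights. Multiplying by the mask turns an `inf` in a non-selected row into `nan` (`0.0 * inf`), while selecting with a condition ignores that row.

```python
import math
from typing import List

def embed_mul(weight: List[List[float]], idx: int) -> List[float]:
    # one-hot mask (1.0 on the selected row, 0.0 elsewhere) multiplied in, then summed over rows
    return [sum((1.0 if r == idx else 0.0) * row[d] for r, row in enumerate(weight))
            for d in range(len(weight[0]))]

def embed_where(weight: List[List[float]], idx: int) -> List[float]:
    # take the row's value where the mask is true, 0.0 elsewhere, then sum over rows
    return [sum(row[d] if r == idx else 0.0 for r, row in enumerate(weight))
            for d in range(len(weight[0]))]

w = [[math.inf, 0.0], [3.0, 4.0]]
print(embed_where(w, 1))               # [3.0, 4.0]
print(math.isnan(embed_mul(w, 1)[0]))  # True
```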
b1tg
edc4e1aede ignore trailing nops in llvm-objdump output (#13110) 2025-11-06 01:10:51 +08:00
chenyu
03ee0cfe45 minor fast_idiv cleanup [pr] (#13109) 2025-11-05 11:44:36 -05:00
chenyu
18d4ecc1f3 lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
nimlgen
eff80beeed amd: props in device not sqtt (#13106)
* amd: props in device not sqtt

* fix

* f

* fix

* fix
2025-11-05 23:43:20 +08:00
nimlgen
757ceab2a2 system: allow using vidmem for uc mem (#13104) 2025-11-05 19:12:59 +08:00
qazal
8119d9f082 sqtt: decode each instruction exec (#13093)
* sqtt: decode each instruction exec

* start tests

* run_asm

* capture sqtt per kernel

* chaining vgprs

* test things

* inst_execs in viz

* can also configure l and g

* 1l + cleanup

* test_sleep

* test_wmma

* work

* test sleep with llvm builtin
2025-11-05 17:30:27 +08:00
chenyu
54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
chenyu
1c9f720654 remove unused type ignore [pr] (#13095) 2025-11-04 10:08:07 -05:00
nimlgen
c857dc5af0 autogen: try/except in try_dlopen (#13094)
* autogen: try/except in try_dlopen

* ugh
2025-11-04 22:51:53 +08:00
nimlgen
eaf7cbc178 amd: flush sqtt after each kernel (#13092)
* amd: flush sqtt after each kernel

* merge for rgp
2025-11-04 22:12:48 +08:00
qazal
96417665e8 show sqtt decoder errs in viz (#13088)
* show sqtt decoder errs in viz

* don't touch roc.py

* give hljs a default language

* work from tinyr9

* work
2025-11-04 22:05:06 +08:00
nimlgen
49191ada77 roc: install sqtt decoder (#13091)
* roc: install?

* msg

* 0.1.4
2025-11-04 18:56:01 +08:00
nimlgen
16f1f644ba amd: remove sqtt=2 (#13090) 2025-11-04 18:29:24 +08:00
nimlgen
2e97eaa866 roc: no nullptr when no wave instructions (#13087) 2025-11-04 17:32:14 +08:00
wozeparrot
9c00c0688a tk fa: use 16x64 tiles (#13086) 2025-11-03 18:25:38 -08:00
wozeparrot
4ed0f216b5 fix: make max_matmul run again (#13085) 2025-11-03 18:09:09 -08:00
chenyu
ca17718b6d remove symbolic_flat (#13083)
* remove symbolic_flat

some kernels are different but sometimes it's better so not clear, will merge as long as benchmark passes

* test_location
2025-11-03 17:25:21 -05:00
chenyu
fda720e013 simpler _is_balanced [pr] (#13082)
returns False earlier
2025-11-03 16:47:14 -05:00