Commit Graph

10600 Commits

Author SHA1 Message Date
George Hotz
0b00981cd1 fix wmma 2025-10-15 09:38:07 +08:00
George Hotz
5d485660da reproed failure in emulation 2025-10-15 09:19:56 +08:00
George Hotz
0e06b5cbb6 support emulate in the NullDevice 2025-10-15 09:11:57 +08:00
George Hotz
3a4a3e09ea Merge branch 'master' into new_shape 2025-10-14 21:16:44 +08:00
chenyu
70dd297a05 BS=96 for bert (#12675)
96 trains fine now
2025-10-14 09:07:43 -04:00
George Hotz
3b0b3dcff3 oops, i didn't mean to change that 2025-10-14 20:12:47 +08:00
George Hotz
accee5d840 hack for 3 op assign 2025-10-14 20:10:53 +08:00
George Hotz
e6812bbe63 one less st 2025-10-14 19:54:37 +08:00
George Hotz
18a6492e98 test is broken 2025-10-14 19:50:17 +08:00
George Hotz
9723b4f1c1 close 2025-10-14 19:46:52 +08:00
Sieds Lykles
852d80dff9 better where on load folding (#12651)
* move where clauses to load

* shorten line

* drop clauses if they are duplicated

* add rule for swapped where branch

* where on ungated load

* dont move clause if load is in the clause

* parse_valid returns None

* no data dependent branches

* fix rule

* enable swapped rule

* remove those
2025-10-14 13:30:47 +02:00
George Hotz
07162df323 size doesn't use st 2025-10-14 19:30:24 +08:00
George Hotz
7a2e206a0d fix tests 2025-10-14 19:22:16 +08:00
nimlgen
c7e63601fd gfx1200 tc for AMD_LLVM (#12673) 2025-10-14 19:17:48 +08:00
George Hotz
61855c24a8 Merge branch 'master' into new_shape 2025-10-14 19:15:09 +08:00
George Hotz
db4a359374 fix up some slow tests that launch python (#12672)
* fix up some slow tests that launch python

* svd nonfull in parallel

* split test_advancedindex
2025-10-14 19:13:55 +08:00
George Hotz
28076d9270 const uses _shape 2025-10-14 19:12:03 +08:00
nimlgen
4918c827c2 amd: lib_gpu does not need cpu_access (#12670) 2025-10-14 18:34:34 +08:00
nimlgen
0c9d47deab hcq: add alignment to kernargs (#12669) 2025-10-14 18:33:12 +08:00
George Hotz
d51cae1396 shape is good 2025-10-14 18:28:00 +08:00
George Hotz
0b69698ad4 mostly works 2025-10-14 18:19:02 +08:00
qazal
d3bfcd3277 minor patches for SQTT over usb on gfx12 (#12627)
* disable cpu_access in the sqtt buffer allocation

not sure if this is required, it results in a very slow call to
pcie_mem_write over USB GPU, removing it worked fine.

* fix itrace_se_mask on gfx12

on gfx11 it gave 6 se, on gfx12 this value is 2 so no instructions were
traced.

* Revert "fix itrace_se_mask on gfx12"

This reverts commit 0644adbcd1.
2025-10-14 18:07:46 +08:00
Sieds Lykles
1e6e5a0efd parse_valid returns None instead of raising (#12663)
* parse_valid returns None

* change there too
2025-10-14 11:57:38 +02:00
George Hotz
04ead92ebd _shape is like _device 2025-10-14 17:53:17 +08:00
qazal
471bd30d16 cleanup viz/serve.py (#12665)
* use load_pickle

* update comment
2025-10-14 17:50:39 +08:00
George Hotz
faddebef07 need to cache it 2025-10-14 17:35:29 +08:00
George Hotz
a659cb18a4 all mops 2025-10-14 17:24:08 +08:00
George Hotz
8721b6884c more mops 2025-10-14 17:20:04 +08:00
George Hotz
59512a49fa reshape causing issues 2025-10-14 16:59:25 +08:00
George Hotz
a73b59caa2 work on shape property 2025-10-14 16:50:43 +08:00
George Hotz
fb61f3519f remove assign contiguous hack (#12659)
* remove assign contiguous hack

* remove bad contiguous usage in torch backend

* assign
2025-10-14 16:42:14 +08:00
George Hotz
30ee7c4c26 cleanup Device usage in Tensor (#12662) 2025-10-14 16:22:22 +08:00
Sieds Lykles
e06cbfcb8a combine pm_drop_and_clauses (#12660)
* combine those

* wino kernels decreased
2025-10-14 10:09:41 +02:00
George Hotz
84d4589ed4 remove pylint from pre-commit and CI (#12658)
* remove pylint from pre-commit and CI

* multidevice test is fast

* faster pre-commit

* 8 is faster than 4

* better name

* how did that typecheck?
2025-10-14 15:39:59 +08:00
qazal
8ecaf839e2 cleanup UOp tracing [pr] (#12657) 2025-10-14 14:50:59 +08:00
George Hotz
b9eb5b5d49 clean up the LLM tokenizer (#12653)
* clean up the LLM tokenizer

* simple tokenizer is actually simple

* ugh write good code
2025-10-14 14:22:01 +08:00
qazal
a9ef93176f viz: add colored text helper (#12654) 2025-10-14 13:05:26 +08:00
George Hotz
ecdc7539a2 add typing to MathTraits (#12650)
* add typing to MathTraits

* fix assign
2025-10-14 12:35:20 +08:00
qazal
9bf032de69 viz: keep focused shape in view (#12648) 2025-10-14 10:49:08 +08:00
chenyu
77b5e6774e fix bert training config (#12647)
FREE_INTERMEDIATE=0 REWRITE_STACK_LIMIT=500000
2025-10-13 15:03:47 -04:00
nimlgen
f1041dc0ac pylint 4.0.0 (#12642)
* cpu: fix spacing

* fix pylint

* fix pylint

* pylint 4.0.0

* lambda

* keep eval for now

* im so sorry
2025-10-13 23:28:36 +08:00
wozeparrot
47e0c43976 feat: Tensor.{load, store} (#12629) 2025-10-13 08:04:41 -07:00
chenyu
0f776c6e46 examples/mlperf/training_submission_v6.0 (#12644)
copied from v5.1
2025-10-13 09:58:25 -04:00
Sieds Lykles
e0139fafc1 UOp symbolic tests use eval to check against string (#12643) 2025-10-13 14:19:42 +02:00
b1tg
218225e8d0 pylint error (#12630)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-10-13 05:05:12 -07:00
nimlgen
9096d7cc2e amd: support for rx9060 (#12640) 2025-10-13 19:44:15 +08:00
qazal
066d25f5fb refactor to trace_num property in buffers (#12638) 2025-10-13 18:06:55 +08:00
qazal
cd6aeebfee sqtt: osx decoder installer (#12637) 2025-10-13 17:26:12 +08:00
Sieds Lykles
e537e895b1 drop unused invalid conditions (#12635)
* drop where conditions if the ranges are not used inside the index

* remove allow_any_len
2025-10-13 10:52:21 +02:00
wozeparrot
9ab06dffad hotfix: block from env (#12628) 2025-10-12 08:07:32 -07:00