George Hotz
0b00981cd1
fix wmma
2025-10-15 09:38:07 +08:00
George Hotz
5d485660da
reproed failure in emulation
2025-10-15 09:19:56 +08:00
George Hotz
0e06b5cbb6
support emulate in the NullDevice
2025-10-15 09:11:57 +08:00
George Hotz
3a4a3e09ea
Merge branch 'master' into new_shape
2025-10-14 21:16:44 +08:00
chenyu
70dd297a05
BS=96 for bert (#12675)
96 trains fine now
2025-10-14 09:07:43 -04:00
George Hotz
3b0b3dcff3
oops, i didn't mean to change that
2025-10-14 20:12:47 +08:00
George Hotz
accee5d840
hack for 3 op assign
2025-10-14 20:10:53 +08:00
George Hotz
e6812bbe63
one less st
2025-10-14 19:54:37 +08:00
George Hotz
18a6492e98
test is broken
2025-10-14 19:50:17 +08:00
George Hotz
9723b4f1c1
close
2025-10-14 19:46:52 +08:00
Sieds Lykles
852d80dff9
better where on load folding (#12651)
* move where clauses to load
* shorten line
* drop clauses if they are duplicated
* add rule for swapped where branch
* where on ungated load
* don't move clause if load is in the clause
* parse_valid returns None
* no data dependent branches
* fix rule
* enable swapped rule
* remove those
2025-10-14 13:30:47 +02:00
George Hotz
07162df323
size doesn't use st
2025-10-14 19:30:24 +08:00
George Hotz
7a2e206a0d
fix tests
2025-10-14 19:22:16 +08:00
nimlgen
c7e63601fd
gfx1200 tc for AMD_LLVM (#12673)
2025-10-14 19:17:48 +08:00
George Hotz
61855c24a8
Merge branch 'master' into new_shape
2025-10-14 19:15:09 +08:00
George Hotz
db4a359374
fix up some slow tests that launch python (#12672)
* fix up some slow tests that launch python
* svd nonfull in parallel
* split test_advancedindex
2025-10-14 19:13:55 +08:00
George Hotz
28076d9270
const uses _shape
2025-10-14 19:12:03 +08:00
nimlgen
4918c827c2
amd: lib_gpu does not need cpu_access (#12670)
2025-10-14 18:34:34 +08:00
nimlgen
0c9d47deab
hcq: add alignment to kernargs (#12669)
2025-10-14 18:33:12 +08:00
George Hotz
d51cae1396
shape is good
2025-10-14 18:28:00 +08:00
George Hotz
0b69698ad4
mostly works
2025-10-14 18:19:02 +08:00
qazal
d3bfcd3277
minor patches for SQTT over usb on gfx12 (#12627)
* disable cpu_access in the sqtt buffer allocation
not sure if this is required; it results in a very slow call to
pcie_mem_write over USB GPU, and removing it worked fine.
* fix itrace_se_mask on gfx12
on gfx11 it gave 6 se, on gfx12 this value is 2 so no instructions were
traced.
* Revert "fix itrace_se_mask on gfx12"
This reverts commit 0644adbcd1.
2025-10-14 18:07:46 +08:00
Sieds Lykles
1e6e5a0efd
parse_valid returns None instead of raising (#12663)
* parse_valid returns None
* change there too
2025-10-14 11:57:38 +02:00
George Hotz
04ead92ebd
_shape is like _device
2025-10-14 17:53:17 +08:00
qazal
471bd30d16
cleanup viz/serve.py (#12665)
* use load_pickle
* update comment
2025-10-14 17:50:39 +08:00
George Hotz
faddebef07
need to cache it
2025-10-14 17:35:29 +08:00
George Hotz
a659cb18a4
all mops
2025-10-14 17:24:08 +08:00
George Hotz
8721b6884c
more mops
2025-10-14 17:20:04 +08:00
George Hotz
59512a49fa
reshape causing issues
2025-10-14 16:59:25 +08:00
George Hotz
a73b59caa2
work on shape property
2025-10-14 16:50:43 +08:00
George Hotz
fb61f3519f
remove assign contiguous hack (#12659)
* remove assign contiguous hack
* remove bad contiguous usage in torch backend
* assign
2025-10-14 16:42:14 +08:00
George Hotz
30ee7c4c26
cleanup Device usage in Tensor (#12662)
2025-10-14 16:22:22 +08:00
Sieds Lykles
e06cbfcb8a
combine pm_drop_and_clauses (#12660)
* combine those
* wino kernels decreased
2025-10-14 10:09:41 +02:00
George Hotz
84d4589ed4
remove pylint from pre-commit and CI (#12658)
* remove pylint from pre-commit and CI
* multidevice test is fast
* faster pre-commit
* 8 is faster than 4
* better name
* how did that typecheck?
2025-10-14 15:39:59 +08:00
qazal
8ecaf839e2
cleanup UOp tracing [pr] (#12657)
2025-10-14 14:50:59 +08:00
George Hotz
b9eb5b5d49
clean up the LLM tokenizer (#12653)
* clean up the LLM tokenizer
* simple tokenizer is actually simple
* ugh write good code
2025-10-14 14:22:01 +08:00
qazal
a9ef93176f
viz: add colored text helper (#12654)
2025-10-14 13:05:26 +08:00
George Hotz
ecdc7539a2
add typing to MathTraits (#12650)
* add typing to MathTraits
* fix assign
2025-10-14 12:35:20 +08:00
qazal
9bf032de69
viz: keep focused shape in view (#12648)
2025-10-14 10:49:08 +08:00
chenyu
77b5e6774e
fix bert training config (#12647)
FREE_INTERMEDIATE=0 REWRITE_STACK_LIMIT=500000
2025-10-13 15:03:47 -04:00
nimlgen
f1041dc0ac
pylint 4.0.0 (#12642)
* cpu: fix spacing
* fix pylint
* fix pylint
* pylint 4.0.0
* lambda
* keep eval for now
* i'm so sorry
2025-10-13 23:28:36 +08:00
wozeparrot
47e0c43976
feat: Tensor.{load, store} (#12629)
2025-10-13 08:04:41 -07:00
chenyu
0f776c6e46
examples/mlperf/training_submission_v6.0 (#12644)
copied from v5.1
2025-10-13 09:58:25 -04:00
Sieds Lykles
e0139fafc1
UOp symbolic tests use eval to check against string (#12643)
2025-10-13 14:19:42 +02:00
b1tg
218225e8d0
pylint error (#12630)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-10-13 05:05:12 -07:00
nimlgen
9096d7cc2e
amd: support for rx9060 (#12640)
2025-10-13 19:44:15 +08:00
qazal
066d25f5fb
refactor to trace_num property in buffers (#12638)
2025-10-13 18:06:55 +08:00
qazal
cd6aeebfee
sqtt: osx decoder installer (#12637)
2025-10-13 17:26:12 +08:00
Sieds Lykles
e537e895b1
drop unused invalid conditions (#12635)
* drop where conditions if the ranges are not used inside the index
* remove allow_any_len
2025-10-13 10:52:21 +02:00
wozeparrot
9ab06dffad
hotfix: block from env (#12628)
2025-10-12 08:07:32 -07:00