nimlgen
75c2c42def
suppress exceptions only during finalization ( #11451 )
...
* suppress exceptions only during finalization
* fix
* fix typing
* fix more warns
* fix
* better?
* Revert "better?"
This reverts commit a068aa5793.
* mm?
* no as e
2025-07-31 13:57:12 +03:00
wozeparrot
24dd0d52ed
feat: test remove to cpu ( #11444 )
2025-07-30 20:18:56 -07:00
kevvz
c3cfcb50cb
Add linalg_det and test for torch backend ( #11405 )
...
* add linalg_det and test
* space
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
Eitan Turok
cba3655de5
Add Test for Setitem ( #10559 )
...
* init
* update
* better
* failing test
* works
* Delete test file
* clean
* lint
* simplify variable name
* rm contiguous, rm int dtype, and add assertEqual
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:03:41 -04:00
wozeparrot
6252f7770e
feat: fake data ( #11447 )
2025-07-30 17:18:20 -07:00
chenyu
e300451f3a
update llama3 ( #11446 )
...
`LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7
2025-07-30 19:34:21 -04:00
wozeparrot
5fb975351a
feat: flag for training on val ( #11441 )
2025-07-30 14:29:45 -07:00
chenyu
4ca430e5bf
fix search dedup ( #11439 )
...
it should check against pre real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
wozeparrot
d3da20eca6
feat: bump mlperf workflow timeout to 6 hours ( #11440 )
2025-07-30 14:12:12 -07:00
wozeparrot
825b6a2505
feat: llama3 dataloader ( #11340 )
2025-07-30 13:27:55 -07:00
qazal
af357b5dc8
disable TRACK_MATCH_STATS in BEAM workers [pr] ( #11437 )
2025-07-30 23:22:08 +03:00
George Hotz
7c2d2eff86
check tensor core dims ( #11436 )
...
* check elements_per_thread in tensorcore [pr]
* check tc dims
2025-07-30 13:06:59 -07:00
nimlgen
5fc5bb5237
ci: clear processes ( #11434 )
...
* unified hcq_smi for management
* fix
* fix
* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32
check elements_per_thread in tensorcore [pr] ( #11435 )
2025-07-30 11:55:48 -07:00
nimlgen
4b4ba5454c
ci: move driver start higher ( #11431 )
2025-07-30 10:48:38 +03:00
George Hotz
1bef2d80c1
unrolls are all in the same scope ( #11429 )
...
* unrolls are all in the same scope
* fix that import
2025-07-29 16:55:37 -07:00
chenyu
204da24cfc
increase driverbenchmark timeout-minutes to 15 ( #11428 )
2025-07-29 19:45:05 -04:00
chenyu
d5fc6af4a2
remove unused ShapeTracker.consecutive [pr] ( #11426 )
2025-07-29 18:36:19 -04:00
George Hotz
49a2583584
real new lowerer ( #11419 )
...
* real new lowerer
* fix group for reduce
* skip missing ranges
* fix wmma and unroll/contract
* real fix for wmma
* disable that test
* fix if gate
* simpler
* flash attention fusion works
* no end barriers
* still broken
* flash attention finally works
2025-07-29 15:35:51 -07:00
chenyu
0e5d8d5c3c
remove tests that used .to_uop() ( #11425 )
...
* remove tests that used .to_uop()
* import
2025-07-29 15:52:16 -04:00
nimlgen
c88e401d0e
ci: fix typos in h machine benchmarks ( #11423 )
2025-07-29 22:11:47 +03:00
chenyu
90a5a312eb
simplify ShapeTracker in UOp.const [pr] ( #11424 )
2025-07-29 15:04:06 -04:00
chenyu
398594029b
spec checks arg of VIEW are ShapeTracker ( #11422 )
2025-07-29 14:05:12 -04:00
George Hotz
1f1f99c287
hotfix: add DEBUG=3 to driver CI
2025-07-29 11:03:47 -07:00
George Hotz
50fae54175
global local dims in gpudims [pr] ( #11420 )
2025-07-29 10:39:03 -07:00
chenyu
9bc413f104
remove ShapeTracker.to_uop [pr] ( #11418 )
2025-07-29 13:29:37 -04:00
George Hotz
ba2c4df125
dont render cast ptrs standalone ( #11417 )
...
* dont render cast ptrs standalone
* barrier cleanups
2025-07-29 09:24:26 -07:00
nimlgen
d38d285489
ci: add h machines ( #11416 )
...
* ci: add h machines
* more
* fix names
* names not collide
* 20
* 10
2025-07-29 19:21:51 +03:00
Tom Clesius
2568bc0d99
ci: add caching for apt packages ( #11162 )
...
* add caching for apt packages
* remove 'inputs' from apt cache key, use outputs instead of env
* remove unnecessary mkdir for partial
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-29 09:04:56 -07:00
George Hotz
03909f2772
permute locals for HL uop matmul ( #11412 )
...
* permute locals for HL uop matmul
* parens fix that
* permutes
* 20 TFLOPS
2025-07-29 08:19:59 -07:00
nimlgen
e0c9747684
amd: fix typo in has_scratch_base_registers for mi350 ( #11413 )
2025-07-29 10:30:06 +03:00
George Hotz
735ad5f10d
kernel4 and 5 in uops ( #11411 )
...
* move simplify views to merge views
* add amd kernel 4
* Revert "move simplify views to merge views"
This reverts commit 1e07dff384.
* k4 in python
* kernel4 written in uops
* k5 support
* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668
HL=2 top matmul ( #11406 )
...
* HL=2 top matmul
* top colored
2025-07-28 12:32:38 -07:00
nimlgen
c7b4ab86e4
fix llvm tc on mi350 ( #11404 )
2025-07-28 21:37:43 +03:00
chenyu
9f7c72ff8f
remove UOp.valid method [pr] ( #11402 )
...
only used in add_buffer_ops
2025-07-28 11:29:08 -04:00
chenyu
b22a34331b
remove const valid in fixup_ast [pr] ( #11401 )
2025-07-28 11:07:59 -04:00
qazal
7737cbb2a0
viz: tabulate runtime stats ( #11400 )
2025-07-28 15:56:39 +03:00
chenyu
ab6a27f627
remove a branch in UOp.r [pr] ( #11398 )
2025-07-27 18:00:01 -04:00
uuuvn
052191eae4
Remote multihost (p2p with infiniband verbs) ( #9746 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-27 14:44:32 -07:00
qazal
a22417cc75
viz: fix bug with wrong program links ( #11396 )
2025-07-28 02:52:06 +08:00
nimlgen
a5371f514b
cpu: copies in profile ( #11392 )
...
* cpu: copies in profile
* fix
* rename to tiny?
2025-07-27 20:56:27 +03:00
George Hotz
8c10085459
assert shape on lowerer store [pr] ( #11395 )
...
* assert shape on lowerer store [pr]
* fix ptx
2025-07-27 10:41:57 -07:00
qazal
6174cfa828
viz: only show match counts greater than 0 ( #11394 )
2025-07-28 00:25:00 +08:00
qazal
3466a220de
viz: disassembly viewer ( #11393 )
...
* test
* CPU=1 disasm works
* METAL=1 disasm works
* fix that
* work
* can unwrap
* work p2
* don't crash
2025-07-27 18:44:28 +03:00
qazal
3bb232eb29
viz: query path in rewrite steps ( #11391 )
2025-07-27 14:51:47 +03:00
b1tg
b7ef73babd
fix wmma ptx ( #11389 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-07-26 23:28:35 -07:00
b1tg
8dfcdb123d
less wmma args ( #11385 )
...
* less wmma args
* scalar
* ops_python
* mypy
* lint
* dedup
* helper wmma_args
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-26 21:24:05 -07:00
George Hotz
dfeee63d30
uop matmul work ( #11388 )
...
* uop matmul work
* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
3923e78061
no_vectorized_acc keeps single DEFINE_REG ( #11387 )
...
* no_vectorized_acc keeps single DEFINE_REG
* fix ptx, skip flaky test
2025-07-26 11:44:09 -07:00
qazal
4866ad57da
viz: add runtime stats ( #11383 )
...
* viz: add runtime stats
* lint
* better
* flat
2025-07-26 20:40:46 +03:00