Commit Graph

10633 Commits

Author SHA1 Message Date
chenyu
4ca430e5bf fix search dedup (#11439)
it should check against the pre-real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
wozeparrot
d3da20eca6 feat: bump mlperf workflow timeout to 6 hours (#11440) 2025-07-30 14:12:12 -07:00
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
qazal
af357b5dc8 disable TRACK_MATCH_STATS in BEAM workers [pr] (#11437) 2025-07-30 23:22:08 +03:00
George Hotz
7c2d2eff86 check tensor core dims (#11436)
* check elements_per_thread in tensorcore [pr]

* check tc dims
2025-07-30 13:06:59 -07:00
nimlgen
5fc5bb5237 ci: clear processes (#11434)
* unified hcq_smi for management

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32 check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
nimlgen
4b4ba5454c ci: move driver start higher (#11431) 2025-07-30 10:48:38 +03:00
George Hotz
1bef2d80c1 unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
chenyu
204da24cfc increase driverbenchmark timeout-minutes to 15 (#11428) 2025-07-29 19:45:05 -04:00
chenyu
d5fc6af4a2 remove unused ShapeTracker.consecutive [pr] (#11426) 2025-07-29 18:36:19 -04:00
George Hotz
49a2583584 real new lowerer (#11419)
* real new lowerer

* fix group for reduce

* skip missing ranges

* fix wmma and unroll/contract

* real fix for wmma

* disable that test

* fix if gate

* simpler

* flash attention fusion works

* no end barriers

* still broken

* flash attention finally works
2025-07-29 15:35:51 -07:00
chenyu
0e5d8d5c3c remove tests that used .to_uop() (#11425)
* remove tests that used .to_uop()

* import
2025-07-29 15:52:16 -04:00
nimlgen
c88e401d0e ci: fix typos in h machine benchmarks (#11423) 2025-07-29 22:11:47 +03:00
chenyu
90a5a312eb simplify ShapeTracker in UOp.const [pr] (#11424) 2025-07-29 15:04:06 -04:00
chenyu
398594029b spec checks arg of VIEW are ShapeTracker (#11422) 2025-07-29 14:05:12 -04:00
George Hotz
1f1f99c287 hotfix: add DEBUG=3 to driver CI 2025-07-29 11:03:47 -07:00
George Hotz
50fae54175 global local dims in gpudims [pr] (#11420) 2025-07-29 10:39:03 -07:00
chenyu
9bc413f104 remove ShapeTracker.to_uop [pr] (#11418) 2025-07-29 13:29:37 -04:00
George Hotz
ba2c4df125 dont render cast ptrs standalone (#11417)
* dont render cast ptrs standalone

* barrier cleanups
2025-07-29 09:24:26 -07:00
nimlgen
d38d285489 ci: add h machines (#11416)
* ci: add h machines

* more

* fix names

* names not collide

* 20

* 10
2025-07-29 19:21:51 +03:00
Tom Clesius
2568bc0d99 ci: add caching for apt packages (#11162)
* add caching for apt packages

* remove 'inputs' from apt cache key, use outputs instead of env

* remove unnecessary mkdir for partial

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-29 09:04:56 -07:00
George Hotz
03909f2772 permute locals for HL uop matmul (#11412)
* permute locals for HL uop matmul

* parens fix that

* permutes

* 20 TFLOPS
2025-07-29 08:19:59 -07:00
nimlgen
e0c9747684 amd: fix typo in has_scratch_base_registers for mi350 (#11413) 2025-07-29 10:30:06 +03:00
George Hotz
735ad5f10d kernel4 and 5 in uops (#11411)
* move simplify views to merge views

* add amd kernel 4

* Revert "move simplify views to merge views"

This reverts commit 1e07dff384.

* k4 in python

* kernel4 written in uops

* k5 support

* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668 HL=2 top matmul (#11406)
* HL=2 top matmul

* top colored
2025-07-28 12:32:38 -07:00
nimlgen
c7b4ab86e4 fix llvm tc on mi350 (#11404) 2025-07-28 21:37:43 +03:00
chenyu
9f7c72ff8f remove UOp.valid method [pr] (#11402)
only used in add_buffer_ops
2025-07-28 11:29:08 -04:00
chenyu
b22a34331b remove const valid in fixup_ast [pr] (#11401) 2025-07-28 11:07:59 -04:00
qazal
7737cbb2a0 viz: tabulate runtime stats (#11400) 2025-07-28 15:56:39 +03:00
chenyu
ab6a27f627 remove a branch in UOp.r [pr] (#11398) 2025-07-27 18:00:01 -04:00
uuuvn
052191eae4 Remote multihost (p2p with infiniband verbs) (#9746)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-27 14:44:32 -07:00
qazal
a22417cc75 viz: fix bug with wrong program links (#11396) 2025-07-28 02:52:06 +08:00
nimlgen
a5371f514b cpu: copies in profile (#11392)
* cpu: copies in profile

* fix

* rename to tiny?
2025-07-27 20:56:27 +03:00
George Hotz
8c10085459 assert shape on lowerer store [pr] (#11395)
* assert shape on lowerer store [pr]

* fix ptx
2025-07-27 10:41:57 -07:00
qazal
6174cfa828 viz: only show match counts greater than 0 (#11394) 2025-07-28 00:25:00 +08:00
qazal
3466a220de viz: disassembly viewer (#11393)
* test

* CPU=1 disasm works

* METAL=1 disasm works

* fix that

* work

* can unwrap

* work p2

* don't crash
2025-07-27 18:44:28 +03:00
qazal
3bb232eb29 viz: query path in rewrite steps (#11391) 2025-07-27 14:51:47 +03:00
b1tg
b7ef73babd fix wmma ptx (#11389)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-07-26 23:28:35 -07:00
b1tg
8dfcdb123d less wmma args (#11385)
* less wmma args

* scalar

* ops_python

* mypy

* lint

* dedup

* helper wmma_args

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-26 21:24:05 -07:00
George Hotz
dfeee63d30 uop matmul work (#11388)
* uop matmul work

* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
3923e78061 no_vectorized_acc keeps single DEFINE_REG (#11387)
* no_vectorized_acc keeps single DEFINE_REG

* fix ptx, skip flaky test
2025-07-26 11:44:09 -07:00
qazal
4866ad57da viz: add runtime stats (#11383)
* viz: add runtime stats

* lint

* better

* flat
2025-07-26 20:40:46 +03:00
George Hotz
2c70eaf18c fix load / barrier (#11386)
* fix load / barrier

* cleanups

* fix CI
2025-07-26 10:27:37 -07:00
nimlgen
65673e68ca hcq: do not import during __del__ (#11384)
* hcq: do not import during __del__

* ignore
2025-07-26 13:58:55 +03:00
George Hotz
466ab5a3f2 store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
George Hotz
0a5f37946b unused permute arg on r (#11379) 2025-07-25 19:52:37 -07:00
George Hotz
48562cb2db full shape simpler (#11376) 2025-07-25 18:27:48 -07:00
chenyu
3d68feb67d minor onnx Gather cleanup (#11375)
removed a type ignore and one error code skip
2025-07-25 21:08:08 -04:00
chenyu
88c338bfcc add kernelize to keccak for each data block (#11370)
* add kernelize to keccak for each data block

test_long works now. this prevents internal uops from growing proportional to data length and eventually becoming too deep

* this?

* hash stuff

* gate test

* mv
2025-07-25 16:07:20 -04:00