George Hotz
1f1f99c287
hotfix: add DEBUG=3 to driver CI
2025-07-29 11:03:47 -07:00
George Hotz
50fae54175
global local dims in gpudims [pr] ( #11420 )
2025-07-29 10:39:03 -07:00
chenyu
9bc413f104
remove ShapeTracker.to_uop [pr] ( #11418 )
2025-07-29 13:29:37 -04:00
George Hotz
ba2c4df125
dont render cast ptrs standalone ( #11417 )
...
* dont render cast ptrs standalone
* barrier cleanups
2025-07-29 09:24:26 -07:00
nimlgen
d38d285489
ci: add h machines ( #11416 )
...
* ci: add h machines
* more
* fix names
* names not collide
* 20
* 10
2025-07-29 19:21:51 +03:00
Tom Clesius
2568bc0d99
ci: add caching for apt packages ( #11162 )
...
* add caching for apt packages
* remove 'inputs' from apt cache key, use outputs instead of env
* remove unnecessary mkdir for partial
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-29 09:04:56 -07:00
George Hotz
03909f2772
permute locals for HL uop matmul ( #11412 )
...
* permute locals for HL uop matmul
* parens fix that
* permutes
* 20 TFLOPS
2025-07-29 08:19:59 -07:00
nimlgen
e0c9747684
amd: fix typo in has_scratch_base_registers for mi350 ( #11413 )
2025-07-29 10:30:06 +03:00
George Hotz
735ad5f10d
kernel4 and 5 in uops ( #11411 )
...
* move simplify views to merge views
* add amd kernel 4
* Revert "move simplify views to merge views"
This reverts commit 1e07dff384.
* k4 in python
* kernel4 written in uops
* k5 support
* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668
HL=2 top matmul ( #11406 )
...
* HL=2 top matmul
* top colored
2025-07-28 12:32:38 -07:00
nimlgen
c7b4ab86e4
fix llvm tc on mi350 ( #11404 )
2025-07-28 21:37:43 +03:00
chenyu
9f7c72ff8f
remove UOp.valid method [pr] ( #11402 )
...
only used in add_buffer_ops
2025-07-28 11:29:08 -04:00
chenyu
b22a34331b
remove const valid in fixup_ast [pr] ( #11401 )
2025-07-28 11:07:59 -04:00
qazal
7737cbb2a0
viz: tabulate runtime stats ( #11400 )
2025-07-28 15:56:39 +03:00
chenyu
ab6a27f627
remove a branch in UOp.r [pr] ( #11398 )
2025-07-27 18:00:01 -04:00
uuuvn
052191eae4
Remote multihost (p2p with infiniband verbs) ( #9746 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-27 14:44:32 -07:00
qazal
a22417cc75
viz: fix bug with wrong program links ( #11396 )
2025-07-28 02:52:06 +08:00
nimlgen
a5371f514b
cpu: copies in profile ( #11392 )
...
* cpu: copies in profile
* fix
* rename to tiny?
2025-07-27 20:56:27 +03:00
George Hotz
8c10085459
assert shape on lowerer store [pr] ( #11395 )
...
* assert shape on lowerer store [pr]
* fix ptx
2025-07-27 10:41:57 -07:00
qazal
6174cfa828
viz: only show match counts greater than 0 ( #11394 )
2025-07-28 00:25:00 +08:00
qazal
3466a220de
viz: disassembly viewer ( #11393 )
...
* test
* CPU=1 disasm works
* METAL=1 disasm works
* fix that
* work
* can unwrap
* work p2
* don't crash
2025-07-27 18:44:28 +03:00
qazal
3bb232eb29
viz: query path in rewrite steps ( #11391 )
2025-07-27 14:51:47 +03:00
b1tg
b7ef73babd
fix wmma ptx ( #11389 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-07-26 23:28:35 -07:00
b1tg
8dfcdb123d
less wmma args ( #11385 )
...
* less wmma args
* scalar
* ops_python
* mypy
* lint
* dedup
* helper wmma_args
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-26 21:24:05 -07:00
George Hotz
dfeee63d30
uop matmul work ( #11388 )
...
* uop matmul work
* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
3923e78061
no_vectorized_acc keeps single DEFINE_REG ( #11387 )
...
* no_vectorized_acc keeps single DEFINE_REG
* fix ptx, skip flaky test
2025-07-26 11:44:09 -07:00
qazal
4866ad57da
viz: add runtime stats ( #11383 )
...
* viz: add runtime stats
* lint
* better
* flat
2025-07-26 20:40:46 +03:00
George Hotz
2c70eaf18c
fix load / barrier ( #11386 )
...
* fix load / barrier
* cleanups
* fix CI
2025-07-26 10:27:37 -07:00
nimlgen
65673e68ca
hcq: do not import during __del__ ( #11384 )
...
* hcq: do not import during __del__
* ignore
2025-07-26 13:58:55 +03:00
George Hotz
466ab5a3f2
store/load not pass through index ( #11381 )
...
* noop
* fix noop
* store cat is NOOP
* store dtype is void
* stores aren't passed through anymore
* meh, skip those for ptx
* correct ptx skip
* hl runs
2025-07-25 21:01:47 -07:00
George Hotz
0a5f37946b
unused permute arg on r ( #11379 )
2025-07-25 19:52:37 -07:00
George Hotz
48562cb2db
full shape simpler ( #11376 )
2025-07-25 18:27:48 -07:00
chenyu
3d68feb67d
minor onnx Gather cleanup ( #11375 )
...
removed a type ignore and one error code skip
2025-07-25 21:08:08 -04:00
chenyu
88c338bfcc
add kernelize to keccak for each data block ( #11370 )
...
* add kernelize to keccak for each data block
test_long works now. this prevents internal uops from growing proportional to data length and eventually too deep
* this?
* hash stuff
* gate test
* mv
2025-07-25 16:07:20 -04:00
chenyu
dab07bcad9
use next instead of full list in UOp._device [pr] ( #11369 )
...
prevents exponential fan out
2025-07-25 10:04:29 -04:00
nimlgen
1bb1f1aee8
hcq: fix race in _at_profile_finalize ( #11368 )
2025-07-25 14:14:02 +03:00
George Hotz
490a93902c
define reg doesn't have init anymore ( #11365 )
...
* define reg doesn't have init anymore
* remove that
* no special logic for dr
* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
9da3f72495
identity store for DEFINE_REG ( #11363 )
...
* identity store for DEFINE_REG
* identity store for DEFINE_REG
* noop continue
2025-07-24 16:41:29 -07:00
chenyu
cc795c6656
simplify keccak pad mask code ( #11362 )
2025-07-24 19:24:10 -04:00
chenyu
c0c4bc9d7c
use int32 for keccak reorder_indexes ( #11360 )
...
it's used for tensor indexing, so int32 instead of uint64 is slightly faster
2025-07-24 15:54:50 -04:00
George Hotz
0602b22086
kernel spec ( #11359 )
...
* kernel spec
* ops.VIEW
* work
2025-07-24 12:45:38 -07:00
qazal
519f1d13cc
viz: generic stuff from gpu counters ui ( #11358 )
...
* viz: generic stuff from gpu counters ui
* move pointer
* pre fetch
* move timeout
2025-07-24 20:29:24 +03:00
nimlgen
3b3de8df61
hcq: graphed copies ( #11302 )
...
* fast copies p2
* upd and fix
* graph supports
* fixes
* fixes
* fixes
* fix
* fix
* fix mockgpu
* fix alignment
* smaller in ci
2025-07-24 17:36:19 +03:00
nimlgen
3046ead6e8
jit: graph reports ei support ( #11356 )
2025-07-24 16:35:10 +03:00
nimlgen
bf12041910
hcq: mapping of cpu to all hcq devices ( #11354 )
...
* hcq: mapping of cpu to all hcq devices
* fix kfd
* nv
* simpler
* cleaner
* correct skip
* fix ifaces
* system fixes
* mypy
2025-07-24 12:52:38 +03:00
chenyu
82e6de7fc6
more keccak reference tests ( #11329 )
2025-07-23 22:06:39 -04:00
George Hotz
b0dc97d1f7
write out kernel 3 in uops ( #11352 )
...
* write out kernel 3 in uops
* matmul is correct
* gemm passes spec
* bugfix to match speed
* cleanups
2025-07-23 17:32:38 -07:00
chenyu
5b570196e4
support DEV= to specify device ( #11351 )
2025-07-23 17:40:55 -04:00
uuuvn
76a2ddbd78
Move remote tests out of onnx ( #11310 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-23 13:25:55 -07:00
George Hotz
7f0a41df4d
move optional out of devectorize [pr] ( #11350 )
...
* move optional out of devectorize [pr]
* fast idiv
2025-07-23 11:26:05 -07:00