chenyu
b22a34331b
remove const valid in fixup_ast [pr] ( #11401 )
2025-07-28 11:07:59 -04:00
qazal
7737cbb2a0
viz: tabulate runtime stats ( #11400 )
2025-07-28 15:56:39 +03:00
chenyu
ab6a27f627
remove a branch in UOp.r [pr] ( #11398 )
2025-07-27 18:00:01 -04:00
uuuvn
052191eae4
Remote multihost (p2p with infiniband verbs) ( #9746 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-27 14:44:32 -07:00
qazal
a22417cc75
viz: fix bug with wrong program links ( #11396 )
2025-07-28 02:52:06 +08:00
nimlgen
a5371f514b
cpu: copies in profile ( #11392 )
...
* cpu: copies in profile
* fix
* rename to tiny?
2025-07-27 20:56:27 +03:00
George Hotz
8c10085459
assert shape on lowerer store [pr] ( #11395 )
...
* assert shape on lowerer store [pr]
* fix ptx
2025-07-27 10:41:57 -07:00
qazal
6174cfa828
viz: only show match counts greater than 0 ( #11394 )
2025-07-28 00:25:00 +08:00
qazal
3466a220de
viz: disassembly viewer ( #11393 )
...
* test
* CPU=1 disasm works
* METAL=1 disasm works
* fix that
* work
* can unwrap
* work p2
* don't crash
2025-07-27 18:44:28 +03:00
qazal
3bb232eb29
viz: query path in rewrite steps ( #11391 )
2025-07-27 14:51:47 +03:00
b1tg
b7ef73babd
fix wmma ptx ( #11389 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-07-26 23:28:35 -07:00
b1tg
8dfcdb123d
less wmma args ( #11385 )
...
* less wmma args
* scalar
* ops_python
* mypy
* lint
* dedup
* helper wmma_args
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-26 21:24:05 -07:00
George Hotz
dfeee63d30
uop matmul work ( #11388 )
...
* uop matmul work
* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
3923e78061
no_vectorized_acc keeps single DEFINE_REG ( #11387 )
...
* no_vectorized_acc keeps single DEFINE_REG
* fix ptx, skip flaky test
2025-07-26 11:44:09 -07:00
qazal
4866ad57da
viz: add runtime stats ( #11383 )
...
* viz: add runtime stats
* lint
* better
* flat
2025-07-26 20:40:46 +03:00
George Hotz
2c70eaf18c
fix load / barrier ( #11386 )
...
* fix load / barrier
* cleanups
* fix CI
2025-07-26 10:27:37 -07:00
nimlgen
65673e68ca
hcq: do not import during __del__ ( #11384 )
...
* hcq: do not import during __del__
* ignore
2025-07-26 13:58:55 +03:00
George Hotz
466ab5a3f2
store/load not pass through index ( #11381 )
...
* noop
* fix noop
* store cat is NOOP
* store dtype is void
* stores aren't passed through anymore
* meh, skip those for ptx
* correct ptx skip
* hl runs
2025-07-25 21:01:47 -07:00
George Hotz
0a5f37946b
unused permute arg on r ( #11379 )
2025-07-25 19:52:37 -07:00
George Hotz
48562cb2db
full shape simpler ( #11376 )
2025-07-25 18:27:48 -07:00
chenyu
3d68feb67d
minor onnx Gather cleanup ( #11375 )
...
removed a type ignore and one error code skip
2025-07-25 21:08:08 -04:00
chenyu
88c338bfcc
add kernelize to keccak for each data block ( #11370 )
...
* add kernelize to keccak for each data block
test_long works now. this prevents internal uops from growing proportional to data length and eventually becoming too deep
* this?
* hash stuff
* gate test
* mv
2025-07-25 16:07:20 -04:00
chenyu
dab07bcad9
use next instead of full list in UOp._device [pr] ( #11369 )
...
prevents exponential fan out
2025-07-25 10:04:29 -04:00
nimlgen
1bb1f1aee8
hcq: fix race in _at_profile_finalize ( #11368 )
2025-07-25 14:14:02 +03:00
George Hotz
490a93902c
define reg doesn't have init anymore ( #11365 )
...
* define reg doesn't have init anymore
* remove that
* no special logic for dr
* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
9da3f72495
identity store for DEFINE_REG ( #11363 )
...
* identity store for DEFINE_REG
* identity store for DEFINE_REG
* noop continue
2025-07-24 16:41:29 -07:00
chenyu
cc795c6656
simplify keccak pad mask code ( #11362 )
2025-07-24 19:24:10 -04:00
chenyu
c0c4bc9d7c
use int32 for keccak reorder_indexes ( #11360 )
...
it's used for tensor indexing, so int32 instead of uint64 is slightly faster
2025-07-24 15:54:50 -04:00
George Hotz
0602b22086
kernel spec ( #11359 )
...
* kernel spec
* ops.VIEW
* work
2025-07-24 12:45:38 -07:00
qazal
519f1d13cc
viz: generic stuff from gpu counters ui ( #11358 )
...
* viz: generic stuff from gpu counters ui
* move pointer
* pre fetch
* move timeout
2025-07-24 20:29:24 +03:00
nimlgen
3b3de8df61
hcq: graphed copies ( #11302 )
...
* fast copies p2
* upd and fix
* graph supports
* fixes
* fixes
* fixes
* fix
* fix
* fix mockgpu
* fix alignment
* smaller in ci
2025-07-24 17:36:19 +03:00
nimlgen
3046ead6e8
jit: graph reports ei support ( #11356 )
2025-07-24 16:35:10 +03:00
nimlgen
bf12041910
hcq: mapping of cpu to all hcq devices ( #11354 )
...
* hcq: mapping of cpu to all hcq devices
* fix kfd
* nv
* simpler
* cleaner
* correct skip
* fix ifaces
* system fixes
* mypy
2025-07-24 12:52:38 +03:00
chenyu
82e6de7fc6
more keccak reference tests ( #11329 )
2025-07-23 22:06:39 -04:00
George Hotz
b0dc97d1f7
write out kernel 3 in uops ( #11352 )
...
* write out kernel 3 in uops
* matmul is correct
* gemm passes spec
* bugfix to match speed
* cleanups
2025-07-23 17:32:38 -07:00
chenyu
5b570196e4
support DEV= to specify device ( #11351 )
2025-07-23 17:40:55 -04:00
uuuvn
76a2ddbd78
Move remote tests out of onnx ( #11310 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-23 13:25:55 -07:00
George Hotz
7f0a41df4d
move optional out of devectorize [pr] ( #11350 )
...
* move optional out of devectorize [pr]
* fast idiv
2025-07-23 11:26:05 -07:00
nimlgen
0f374e10d2
cpu: use mmap for allocations ( #11349 )
...
* cpu: use mmap for allocations
* ops
* fix mypy
2025-07-23 20:30:18 +03:00
George Hotz
ae07a93814
simple block barrier ( #11341 )
...
* simple block barrier
* simple
2025-07-23 10:14:11 -07:00
chenyu
86e7504111
mypy check extra/onnx.py ( #11348 )
...
instead of running tests with 3.10, add onnx to mypy, which would have caught the StrEnum regression. Several type annotations now fail mypy; they do not affect running the code and are skipped for now
2025-07-23 12:42:59 -04:00
chenyu
960da9319d
Remove StrEnum in onnx for python 3.10 ( #11345 )
...
some training tests failed; looks like a parsing error?
2025-07-23 11:52:25 -04:00
qazal
478a355325
gate PRINT_MATCH_STATS behind graph_rewrite tracking ( #11344 )
2025-07-23 16:32:43 +03:00
nimlgen
ca09c180dc
cpu: remove del spam ( #11343 )
...
* cpu: remove del spam
* fix
2025-07-23 12:02:37 +03:00
nimlgen
304eb9cecb
allocate less memory in am tests ( #11342 )
2025-07-23 11:11:26 +03:00
George Hotz
e14b4fefa5
ranges on store ( #11334 )
...
* ranges on store
* fix store spec
* fix that
* fix gates
* fix tests
* fix ptx
2025-07-22 21:00:50 -07:00
George Hotz
c65b5aab62
small things from endrange ( #11339 )
...
* small things from endrange
* store
2025-07-22 19:45:37 -07:00
George Hotz
53339e62f7
no gate store anymore ( #11338 )
...
* no gate store anymore
* fix up spec
2025-07-22 18:41:15 -07:00
chenyu
7a9a5cfd28
isolate test/external/external_test_am.py ( #11335 )
...
seems to be the one crashing, also remove -n=auto for that
2025-07-22 19:02:20 -04:00
George Hotz
fcbd0e4de3
assigns are no longer used [pr] ( #11333 )
2025-07-22 15:35:07 -07:00