Francis Lam
dece9958f8
wmma: clean up to make WMMA arg order consistent (#2014)
also add cache defeat to extra/gemm/simple_matmul.py
2023-10-07 17:45:40 -07:00

Francis Lam
f445e056ed
wmma: add test and tensor core shape (#1925)
2023-09-28 18:04:28 -07:00

George Hotz
e464442adf
WMMA for 7900XTX (#1563)
* go
* hip no LRU
* work
* works
* 16 TFLOPS
* 29 TFLOPS
* 30 TFLOPS
* never mind, it's 60 TFLOPS
* fix metal WMMA
* put hip alloc back
2023-08-19 09:07:23 -07:00

George Hotz
90fff82c8a
Rdna (#776)
* assembler maybe
* custom asm
* rdna3 on quiet
* trigger crashes
* fixed notes
* non-fatal rdna2 crash
* Crash4
* improve rdna sniffer
* comments
* improve sniffer
* asm
* 131 TFLOPS RDNA3
* opt simple matmul
* todos
2023-05-16 05:33:57 -07:00