George Hotz
ba84d415fe
work from benchmarking tinybox red v2 (#13264)
* work from benchmarking tinybox red v2
* gpuburn
2025-11-13 16:38:40 -08:00
George Hotz
267be7fc5e
fp16 acc
2025-11-02 12:53:04 +08:00
George Hotz
25c2da1579
check SPEC=2 in CI (#12945)
* check SPEC=2 in CI
* split SPEC=2
* fast enough
2025-10-27 21:53:57 +08:00
George Hotz
a3c78d47b3
speed docs + upgrades [pr] (#8964)
* add some docs about speed [pr]
* better torch gemm
* enable locals on llvm/clang
* disable locals for beam speed on LLVM/CLANG
* 0x20 alignment in llvm allows ymm use
2025-02-08 17:28:52 +08:00
George Hotz
fe71282ba1
faster RDNA assembly backend (#990)
* fast asm
* torch gemm
2023-06-16 12:06:38 -07:00