chenyu
1d730b8853
remove ACCUM_FP32 in simple_matmul.py (#3045)
* remove ACCUM_FP32 in simple_matmul.py
accumulate for half inputs is always in float
* move test llama compile speed to metal
2024-01-08 17:37:57 -05:00
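A minimal numpy sketch (illustrative only, not tinygrad code) of why half inputs want a float accumulator: a float16 running sum stops growing once its rounding step exceeds the addends, while a float32 accumulator over the same half data stays exact.

```python
import numpy as np

x = np.full(4096, 1.0, dtype=np.float16)   # half-precision inputs

acc16 = np.float16(0.0)                    # accumulate in half
acc32 = np.float32(0.0)                    # accumulate in float
for v in x:
    acc16 = np.float16(acc16 + v)
    acc32 += np.float32(v)

print(acc16)  # 2048.0 -- in float16, 2048 + 1 rounds back down to 2048
print(acc32)  # 4096.0 -- the float accumulator keeps every addend
```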
George Hotz
a280cfe169
move dtypes to dtype.py (#2964)
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops
* missed a few
* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz
7da2325dc7
get_lazyops() -> lazyops (#2884)
* get_lazyops() -> lazyops
* don't compare empty mem
2023-12-20 18:04:49 -08:00
Rory Clear
f409b57854
update metal matmul and matvec for new device style (#2732)
* update for new device style
* create device before compile
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-17 16:15:07 -05:00
Nguyen Nguyen Phuong
07cf45e133
fix cuda matmul (#2725)
2023-12-12 07:59:31 -08:00
George Hotz
b5fd160b39
hotfix: increase rtol on simple_matmul
2023-12-11 10:10:29 -08:00
George Hotz
a73579919f
mlx benchmark, a lil slower than tg
2023-12-05 19:00:43 -08:00
George Hotz
0be5d16950
only 62 gflops (#2629)
2023-12-05 13:28:24 -08:00
Yixiang Gao
fde44aed76
update hip_matmul with new abstraction (#2605)
2023-12-04 13:37:10 -08:00
Jake
5588922884
Update cuda_matmul.py (#2495)
2023-11-28 19:46:01 -08:00
George Hotz
3f137b134a
jax parallel matmul example
2023-11-28 13:48:11 -08:00
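The commit only names the example; as a hedged illustration, one way a JAX parallel matmul can be written is to row-shard the left operand across devices with pmap (the sizes and layout below are assumptions, not taken from the commit):

```python
import jax
import jax.numpy as jnp

n_dev = jax.device_count()
rows_per_dev = 2048 // n_dev

# A is split row-wise, one block per device; B is closed over and broadcast.
a = jnp.ones((n_dev, rows_per_dev, 2048), dtype=jnp.float32)
b = jnp.ones((2048, 2048), dtype=jnp.float32)

parallel_matmul = jax.pmap(lambda a_block: a_block @ b)
c = parallel_matmul(a)        # each device multiplies its own row block
print(c.shape)                # (n_dev, rows_per_dev, 2048)
```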
Davi Silva
186ac77ec3
Update hip_matmul.py (#2480)
2023-11-27 18:36:19 -08:00
George Hotz
9e07824542
move device to device.py (#2466)
* move device to device.py
* pylint test --disable R,C,W,E --enable E0611
* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
0cbf6c1811
move things, clean up extra (#2292)
* move things
* idk why pylint needs that now
* delete unused
2023-11-13 20:18:40 -08:00
Rory Clear
553688f12a
update metal matmul and matvec for compile api (#2238)
2023-11-08 08:08:35 -08:00
George Hotz
2f7aab3d13
move optimize_local_size (#2221)
* move optimize_local_size
* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz
5472a14544
openpilot compile2 (#1977)
* start compile2
* tweak
* why are there two more kernels?
* minor cleanups
* don't break onnx tests
* add __metadata__ support to safetensors
* no early realize in onnx
* cleanups
* bugfix
* clean up image type, add optimize
* opt to match old
* try that
* opt work
* run compile2
* optimizer
* prt more
* prerealize
* imp
* NOLOCALS works
* no locals means no locals
* support fractional globals
* all locals welcome
* int that
* cleanups
* show gemv regression
* clean up diff
* use idx for the cond
* nolocals
---------
Co-authored-by: Comma Device <device@comma.ai>
2023-10-15 20:39:46 -07:00
George Hotz
8db92bd060
fix tvm gemm example
2023-10-08 05:57:41 -07:00
Francis Lam
dece9958f8
wmma: clean up to make WMMA arg order consistent (#2014)
also add cache defeat to extra/gemm/simple_matmul.py
2023-10-07 17:45:40 -07:00
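"Cache defeat" here is a benchmarking hygiene step; one common form of it (an assumption for illustration, not necessarily the exact change made to simple_matmul.py) is to regenerate the operands between timed runs so repeated iterations cannot be served from warm caches or previously computed results:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

def timed_matmul(n=4096):
    # Fresh operands every iteration: nothing from the previous run can be reused.
    a = rng.standard_normal((n, n), dtype=np.float32)
    b = rng.standard_normal((n, n), dtype=np.float32)
    st = time.perf_counter()
    c = a @ b
    dt = time.perf_counter() - st
    assert c.shape == (n, n)
    return dt

print(min(timed_matmul() for _ in range(5)))
```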
Francis Lam
0ba75c4370
optimizer: add matvec optimizations (#1972)
* optimizer: add matvec optimizations
* renderer: fix alignment of shared memory in opencl
2023-10-04 14:16:27 -07:00
George Hotz
717451a244
Revert "optimizer: add matvec optimizations (#1753)" (#1959)
This reverts commit f520323054.
2023-10-03 00:28:42 -07:00
Francis Lam
f520323054
optimizer: add matvec optimizations (#1753)
* optimizer: add matvec optimizations
* Update optimizer.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-03 00:01:59 -07:00
Francis Lam
f445e056ed
wmma: add test and tensor core shape (#1925)
2023-09-28 18:04:28 -07:00
George Hotz
c36d0e3bd8
tvm import hook
2023-09-28 09:24:32 -07:00
qazal
d0e752003d
fixes (#1893)
2023-09-22 07:20:27 +08:00
George Hotz
4613c9e77c
add tvm example, formatting (#1813)
* add tvm example
* no realize
2023-09-07 11:50:41 -07:00
Pavol Rusnak
52a92bf95d
use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():
* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
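The style change itself is mechanical; a before/after illustration:

```python
# before: empty parentheses on the class statement
class Foo():
    pass

# after: the parentheses are redundant when there are no base classes
class Foo:
    pass
```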
George Hotz
a6d842af7a
move device to ops (#1646)
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
e464442adf
WMMA for 7900XTX (#1563)
* go
* hip no LRU
* work
* works
* 16 TFLOPS
* 29 TFLOPS
* 30 TFLOPS
* never mind, it's 60 TFLOPS
* fix metal WMMA
* put hip alloc back
2023-08-19 09:07:23 -07:00
George Hotz
c417cd3c97
fast HIP gemm -> 100 TFLOPS (#1476)
* fast HIP gemm
* wmma
* correct b
* fix spilling
* 60 TFLOPS
* 64 TFLOPS
* 65 TFLOPS
2023-08-09 06:54:15 -07:00
David Hou
3300d0aeaf
syncthreads before wmma (#1389)
(venv) chaos@tiny3:~/tinygrad$ KX=2 KY=2 N=2048 python extra/gemm/hip_matmul.py
4194304 289.60 us, would be 59322.55 GFLOPS matmul, 173.80 GB/s
2023-07-31 17:05:49 -07:00
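The figures in that log line follow from the usual 2·N³ FLOP count for an N×N matmul; a quick arithmetic check:

```python
N = 2048
flops = 2 * N ** 3            # one multiply and one add per inner-product term
elapsed = 289.60e-6           # seconds, from the log line above
print(N * N)                  # 4194304 output elements
print(flops / elapsed / 1e9)  # ~59323 GFLOPS, in line with the 59322.55 reported
                              # above (the printed time is rounded)
```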
George Hotz
37fa7e96fb
Revert "update editorconfig, enforce via CI (#1343)" (#1380)
This reverts commit da2efecbe2.
2023-07-31 10:35:50 -07:00
Pavol Rusnak
da2efecbe2
update editorconfig, enforce via CI (#1343)
* update editorconfig to set unix-style newlines and trim whitespace
* add editorconfig github action to the CI
* fix whitespace
2023-07-30 18:44:30 -07:00
George Hotz
67e34b356a
good stuff from tensor cores branch (#1199)
2023-07-08 16:58:26 -07:00
George Hotz
b8dfbba703
hip_matmul: f16 gemm 2048x2048 gets 36 TFLOPS
2023-07-08 00:35:45 +00:00
George Hotz
e234bf2298
hip matmul: add K support
2023-06-28 19:54:33 +00:00
George Hotz
0e93b9642a
hip matmul
2023-06-28 19:21:01 +00:00
Casey Primozic
805eef10dd
Add tensorflow GEMM benchmark script (#1000)
* Modelled closely after the existing torch benchmark script but just adapted slightly for tensorflow
2023-06-18 10:57:45 -07:00
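The script itself is not reproduced in this log; a minimal sketch of what a TensorFlow GEMM benchmark along those lines can look like (matrix size and dtype are illustrative assumptions):

```python
import time
import tensorflow as tf

N = 4096
a = tf.random.normal((N, N), dtype=tf.float32)
b = tf.random.normal((N, N), dtype=tf.float32)

matmul = tf.function(lambda x, y: tf.matmul(x, y))
matmul(a, b)                    # warm-up run so tracing is not timed

st = time.perf_counter()
_ = matmul(a, b).numpy()        # .numpy() forces the result before stopping the clock
dt = time.perf_counter() - st
print(f"{2 * N**3 / dt / 1e9:.1f} GFLOPS")
```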
George Hotz
fe71282ba1
faster RDNA assembly backend (#990)
* fast asm
* torch gemm
2023-06-16 12:06:38 -07:00
George Hotz
90fff82c8a
Rdna (#776)
* assembler maybe
* custom asm
* rdna3 on quiet
* trigger crashes
* fixed notes
* non-fatal rdna2 crash
* Crash4
* improve rdna sniffer
* comments
* improve sniffer
* asm
* 131 TFLOPS RDNA3
* opt simple matmul
* todos
2023-05-16 05:33:57 -07:00
George Hotz
59d0d168cd
FLOAT16 off works
2023-04-19 15:34:56 -07:00
George Hotz
3d15769a8f
50 TFLOPS cuda matmul
2023-04-19 14:38:24 -07:00
George Hotz
0b5a0b9ba4
winograd comment
2023-04-16 03:36:51 -07:00
George Hotz
8b777af571
metal_conv gets over 10.4 TFLOPS...
2023-04-15 03:31:22 -07:00
George Hotz
d66e682205
metal matmul from tcores branch
2023-04-14 23:29:29 -07:00
George Hotz
68e45fca18
metal_matmul: bw and torch sync
2023-03-23 08:02:04 -07:00
George Hotz
bd6c3c31a9
compare to torch
2023-03-22 23:58:37 -07:00
George Hotz
c3a3db75c7
fix metal matmul example
2023-03-22 23:42:51 -07:00
George Hotz
1a039306d2
good changes from llama branch (#671)
* good changes from llama
* transpose behavior changed
2023-03-09 20:51:22 -08:00