tinygrad/extra/gemm/gemm.py
George Hotz d878065ece Gemm (#416)
* gemm

* off by factor of 5

* 50 GFLOPS

* works

* 91 gflops

* working at 50G

* works

* iy

* 150 GFLOPS

* 150 GFLOPS

* N=2048 is still fast

* threading soon

* multithread

* pinning

* throttling is sad

* Align matrices to cacheline width (#361)

Co-authored-by: cloud <Cloud11665@gmail.com>
2022-11-06 10:07:28 -08:00

29 lines
601 B
Python
Executable File

#!/usr/bin/env python3
import os
#os.environ['OMP_NUM_THREADS'] = '1'
import time
import numpy as np
N = 2048
if __name__ == "__main__":
  # A: N^2 float32 values
  A = np.random.randn(N, N).astype(np.float32)
  # B: N^2 float32 values
  B = np.random.randn(N, N).astype(np.float32)
  # each of the N^2 output cells needs N multiplies and N adds: 2*N^3 FLOP total
  flop = 2*N*N*N
  #print(f"{flop / 1e9:.2f} GFLOP")
  for i in range(4):
    # time one N x N matmul of A against B transposed
    st = time.monotonic()
    C = A @ B.T
    et = time.monotonic()
    s = et-st
    print(f"{flop/s * 1e-9:.2f} GFLOP/S, {s*1e3:.2f} ms")
  # dump A, B, C back to back as raw float32 bytes (no header)
  with open("/tmp/matmul", "wb") as f:
    f.write(A.data)
    f.write(B.data)
    f.write(C.data)
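
The file written to /tmp/matmul has no header: it is A, B, and C back to back as raw row-major float32, so a reader has to know N in advance. A minimal sketch of loading such a dump and checking it, assuming that layout; `load_matmul_dump` is a hypothetical helper name (not part of tinygrad), and the demo uses a small matrix and a temporary file rather than the real /tmp/matmul.

```python
import os
import tempfile
import numpy as np

def load_matmul_dump(path, n):
  """Read three n x n float32 matrices stored back to back in row-major order."""
  data = np.fromfile(path, dtype=np.float32)
  if data.size != 3 * n * n:
    raise ValueError(f"expected {3*n*n} float32 values, found {data.size}")
  A, B, C = data.reshape(3, n, n)
  return A, B, C

# Demo: write a small dump the same way gemm.py does, then read it back.
n = 64  # assumption for the demo; gemm.py itself uses N = 2048
A = np.random.randn(n, n).astype(np.float32)
B = np.random.randn(n, n).astype(np.float32)
C = A @ B.T
path = os.path.join(tempfile.mkdtemp(), "matmul")
with open(path, "wb") as f:
  f.write(A.data)
  f.write(B.data)
  f.write(C.data)

A2, B2, C2 = load_matmul_dump(path, n)
# The stored C should match A @ B.T recomputed from the stored inputs.
print(np.allclose(A2 @ B2.T, C2, atol=1e-3))
```

For the real dump, the call would be `load_matmul_dump("/tmp/matmul", 2048)`; the size check guards against reading the file with the wrong N.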