Files
tinygrad/accel/cherry/INSTRUCTIONS
2021-10-30 16:10:59 -07:00

31 lines
629 B
Plaintext

For SZ = 32
31 vector registers: v0-v31
* 1024 element (19-bit TF32) wide
* 32x32 "matrix"
# Support masked execution of everything with v0
# Helper instructions to load sz into mask?
Movement:
mov v1, v2
mov v1, x0 # load 0
Vector Engine (1024 ALUs):
vfadd.vv vd, vs2, vs1, vm
add sub mul div pow mac
add v3, v1, v2 # vector/vector add
add v3, v1, t0 # vector/scalar add
Matrix Engine:
matmul v3, v1, v2
Load/Store Engine (R-type instructions):
load v1, t0(<address>), t1(<stride_y>)
store t0(<address>), v1, t1(<stride_y>)
Permute Engine:
# interleave - shuffle - unpack lo/hi - table interleave
transpose v1, v0