A100 has 432 tensor cores
8x4x8 FP16 matmul = 256 FMAs per cycle
8x4x4 TF32 matmul = 128 FMAs per cycle
432 * 256 * 1410 MHz * 2 (multiply and add) = 312 TFLOPS
Why aren't the other accelerators 3D like this?
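Sanity check of that peak-throughput arithmetic, as a short Python sketch using only the numbers above:

tensor_cores = 432            # 108 SMs x 4 tensor cores each
fmas_per_cycle = 8 * 4 * 8    # one 8x4x8 FP16 matmul per core per cycle = 256
clock_hz = 1410e6             # 1410 MHz boost clock
ops = tensor_cores * fmas_per_cycle * clock_hz * 2   # x2: multiply and add
print(f"{ops / 1e12:.0f} TFLOPS")                    # -> 312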
--
Tesla FSD chip
96x96 MAC array
32 MiB SRAM
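Same arithmetic for the Tesla array (a sketch; the 2 GHz clock is Tesla's publicly stated FSD figure, not from these notes):

macs = 96 * 96             # 9216 MACs in the array
clock_hz = 2e9             # assumed: ~2 GHz per Tesla's FSD presentation
ops = macs * clock_hz * 2  # x2: multiply and add
print(f"{ops / 1e12:.1f} TOPS")  # -> 36.9 TOPS per NPU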
--
SNPE is using 4x4x4 -> 4x4 matmuls (64 FMAs) in the convs,
accumulating into that 4x4 matrix.
256 ALUs (1 FMA per cycle)
652 MHz
---
319 GFLOPS
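The same arithmetic for the SNPE unit, as a sketch: 256 ALUs at 652 MHz with multiply+add gives ~334 GFLOPS peak, so the 319 GFLOPS figure above would correspond to a clock of about 623 MHz rather than 652.

alus = 256                 # 1 FMA per ALU per cycle
clock_hz = 652e6
ops = alus * clock_hz * 2  # x2: multiply and add
print(f"{ops / 1e9:.0f} GFLOPS")              # -> 334 at 652 MHz
print(f"{319e9 / (alus * 2) / 1e6:.0f} MHz")  # clock implied by 319 GFLOPS -> 623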