A100 has 432 tensor cores
8x4x8 FP16 matmul = 256 FMAs per cycle
8x4x4 TF32 matmul = 128 FMAs per cycle
432 * 256 * 1410 MHz * 2 (multiply and add) = 312 TFLOPS
Why aren't the other accelerators 3D like this?
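Sanity check of that peak-throughput arithmetic, as a short Python sketch using only the numbers above:

tensor_cores = 432            # 108 SMs x 4 tensor cores each
fmas_per_cycle = 8 * 4 * 8    # one 8x4x8 FP16 matmul per core per cycle = 256
clock_hz = 1410e6             # 1410 MHz boost clock
ops = tensor_cores * fmas_per_cycle * clock_hz * 2   # x2: multiply and add
print(f"{ops / 1e12:.0f} TFLOPS")                    # -> 312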
--
Tesla FSD chip
96x96 MAC array
32 MiB SRAM
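Same arithmetic for the Tesla array (a sketch; the 2 GHz clock is Tesla's publicly stated FSD figure, not from these notes):

macs = 96 * 96             # 9216 MACs in the array
clock_hz = 2e9             # assumed: ~2 GHz per Tesla's FSD presentation
ops = macs * clock_hz * 2  # x2: multiply and add
print(f"{ops / 1e12:.1f} TOPS")  # -> 36.9 TOPS per NPU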
--
SNPE is using 4x4x4 -> 4x4 matmuls (64 FMAs) in the convs,
accumulating into that 4x4 matrix.
256 ALUs (1 FMA per cycle)
652 MHz
---
319 GFLOPS
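The same arithmetic for the SNPE unit, as a sketch: 256 ALUs at 652 MHz with multiply+add gives ~334 GFLOPS peak, so the 319 GFLOPS figure above would correspond to a clock of about 623 MHz rather than 652.

alus = 256                 # 1 FMA per ALU per cycle
clock_hz = 652e6
ops = alus * clock_hz * 2  # x2: multiply and add
print(f"{ops / 1e9:.0f} GFLOPS")              # -> 334 at 652 MHz
print(f"{319e9 / (alus * 2) / 1e6:.0f} MHz")  # clock implied by 319 GFLOPS -> 623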