mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-04-29 03:00:14 -04:00
A100 has 432 tensor cores

8x4x8 FP16 matmul = 256 FMAs

8x4x4 TF32 matmul = 128 FMAs

432 tensor cores * 256 FMAs * 1410 MHz * 2 (multiply and add) = 312 TFLOPS
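A quick sanity check of the arithmetic above, using only the figures already in these notes:

```python
# Peak FP16 tensor throughput of the A100, from the figures above:
# 432 tensor cores, 8x4x8 = 256 FMAs per core per cycle, 1410 MHz boost
# clock, 2 ops per FMA (multiply + add).
tensor_cores = 432
fmas_per_core = 8 * 4 * 8          # 256 FMAs per cycle
clock_hz = 1410e6                  # 1410 MHz
ops_per_fma = 2                    # multiply and add
tflops = tensor_cores * fmas_per_core * clock_hz * ops_per_fma / 1e12
print(f"{tflops:.0f} TFLOPS")      # 312 TFLOPS
```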

Why aren't the other accelerators 3D like this?

--
Tesla chip

96x96 array

32 MiB SRAM
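Back-of-envelope on the array size above. The clock is an assumption, not stated in these notes (the Tesla FSD NPU is commonly quoted at 2 GHz):

```python
# A 96x96 MAC array does 9216 MACs per cycle.
macs_per_cycle = 96 * 96            # 9216
clock_hz = 2e9                      # ASSUMED clock, not stated in these notes
ops_per_mac = 2                     # multiply and add
tops = macs_per_cycle * clock_hz * ops_per_mac / 1e12
print(f"{tops:.1f} TOPS")           # 36.9 TOPS at the assumed 2 GHz
```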

--
SNPE is using 4x4x4 -> 4x4 (64 FMAs) in the convs.

Then it's accumulating in that matrix.
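The accumulate-in-the-output-tile pattern above can be sketched with numpy (shapes illustrative, not SNPE's actual API):

```python
# Sketch of the SNPE-style inner loop: each step is a 4x4x4 matmul
# (64 FMAs) producing a 4x4 tile, accumulated across the reduction dim.
import numpy as np

M = N = 4
K_total = 16                        # illustrative reduction length, 4-wide steps

A = np.random.rand(M, K_total).astype(np.float32)
B = np.random.rand(K_total, N).astype(np.float32)

acc = np.zeros((M, N), dtype=np.float32)    # the 4x4 accumulator tile
for k0 in range(0, K_total, 4):
    # one "tensor op": 4x4 @ 4x4 matmul = 64 FMAs, added into acc
    acc += A[:, k0:k0+4] @ B[k0:k0+4, :]

assert np.allclose(acc, A @ B, atol=1e-5)   # same result as one big matmul
```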

256 ALUs (1 FMA per cycle)

652 MHz

---

319 GFLOPS
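Cross-checking the SNPE numbers: 256 ALUs at 652 MHz with 2 ops per FMA gives a theoretical peak of about 334 GFLOPS; the 319 GFLOPS quoted above is roughly 96% of that (the reason for the gap isn't stated in these notes):

```python
# Theoretical peak from the ALU count and clock quoted above.
alus = 256
clock_hz = 652e6                    # 652 MHz
ops_per_fma = 2                     # multiply and add
peak_gflops = alus * clock_hz * ops_per_fma / 1e9
print(f"{peak_gflops:.1f} GFLOPS peak")   # 333.8 GFLOPS peak
```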