mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-01-09 15:08:02 -05:00
only resnet18, it's too slow otherwise
A100 has 432 tensor cores

8x4x8 FP16 matmul = 256 FMAs
8x4x4 TF32 matmul = 128 FMAs

432 * 256 * 1410 MHz * 2 (mul and add) = 312 TOPS
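The peak-throughput arithmetic above can be checked directly (432 tensor cores, 256 FMAs each per cycle, 1410 MHz boost clock, 2 ops per FMA):

```python
# Peak FP16 tensor-core throughput of an A100, from the figures above.
tensor_cores = 432        # 108 SMs * 4 tensor cores each
fmas_per_cycle = 256      # one 8x4x8 FP16 matmul per tensor core per cycle
clock_hz = 1410e6         # boost clock
ops_per_fma = 2           # multiply + add

tops = tensor_cores * fmas_per_cycle * clock_hz * ops_per_fma / 1e12
print(f"{tops:.0f} TOPS")  # -> 312 TOPS
```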
Why aren't the other accelerators 3D like this?
--
Tesla FSD chip NPU

96x96 MAC array
32 MiB SRAM
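The same formula applies to the 96x96 array. The 2 GHz clock below is an assumption taken from Tesla's published FSD-chip figures, not stated in the note above:

```python
# Peak throughput of a 96x96 MAC array, assuming a 2 GHz clock
# (Tesla's published FSD NPU figure; an assumption, not from the note).
macs = 96 * 96            # 9216 MACs firing each cycle
clock_hz = 2e9            # assumed clock
ops_per_mac = 2           # multiply + add

tops = macs * clock_hz * ops_per_mac / 1e12
print(f"{tops:.2f} TOPS")  # -> 36.86 TOPS
```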
--
SNPE is using 4x4x4 -> 4x4 (64 FMAs) in the convs.
Then it's accumulating in that matrix.

256 ALUs (1 FMA per cycle)
652 MHz
---
319 GFLOPS
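Running the same calculation on the SNPE figures above (256 ALUs, 1 FMA/cycle, 652 MHz, 2 ops per FMA) gives a slightly higher peak than the 319 GFLOPS quoted; the quoted figure would correspond to an effective clock closer to 623 MHz:

```python
# Peak FLOPS from the SNPE figures above.
alus = 256                # 1 FMA per ALU per cycle
clock_hz = 652e6
ops_per_fma = 2           # multiply + add

gflops = alus * clock_hz * ops_per_fma / 1e9
print(f"{gflops:.1f} GFLOPS")  # -> 333.8 GFLOPS

# The quoted 319 GFLOPS implies a lower effective clock:
eff_clock_mhz = 319e9 / (alus * ops_per_fma) / 1e6
print(f"{eff_clock_mhz:.0f} MHz")  # -> 623 MHz
```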