Mirror of https://github.com/tinygrad/tinygrad.git (synced 2026-04-29 03:00:14 -04:00)
Python time 45 ms -> 9 ms; it was spending that time scheduling the shard. Also initialize the BERT data on CLANG, since it comes from NumPy, so we don't create the tensor on the default device and then shard it onto the GPUs.
Each model should be a clean single file. They are imported from the top-level `models` directory.

Each model should be capable of loading weights from the reference implementation.

We will focus on these 5 models:

# Resnet50-v1.5 (classic) -- 8.2 GOPS/input
# Retinanet
# 3D UNET (upconvs)
# RNNT
# BERT-large (transformer)

They are used in both the training and inference benchmarks:
https://mlcommons.org/en/training-normal-21/
https://mlcommons.org/en/inference-edge-30/

And we will submit to both.

NOTE: we are in the Edge category since we don't have ECC RAM.