Files
tinygrad/examples/mlperf
chenyu ff05bff221 put bert data shard inside jit (#9160)
python time 45ms -> 9ms, it was spending time to schedule the shard

also init bert data on CLANG since it's from numpy, so we don't create the tensor on default device then shard into GPUS
2025-02-18 10:36:54 -05:00
..
2024-09-10 04:37:28 -04:00
2023-05-10 16:30:49 -07:00

Each model should be a clean single file.
They are imported from the top level `models` directory

It should be capable of loading weights from the reference imp.

We will focus on these 5 models:

# Resnet50-v1.5 (classic) -- 8.2 GOPS/input
# Retinanet
# 3D UNET (upconvs)
# RNNT
# BERT-large (transformer)

They are used in both the training and inference benchmark:
https://mlcommons.org/en/training-normal-21/
https://mlcommons.org/en/inference-edge-30/
And we will submit to both.

NOTE: we are Edge since we don't have ECC RAM