Commit Graph

38 Commits

Author SHA1 Message Date
hooved
5d9035f5a6 Eval for Stable Diffusion mlperf (#12316)
* add diff

* rerun ci

* refactor beam workaround, add test

* fix conflict

* linting
2025-10-02 02:35:38 -04:00
wozeparrot
7e68045fb2 feat: small llama3 training (#11829) 2025-08-31 13:41:47 -07:00
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
chenyu
d3a8d5c128 print postprocess_detections time in retinanet eval (#10005)
`BS=96 BASEDIR="/raid/datasets/openimages" MODEL=retinanet python examples/mlperf/model_eval.py`

```
...
loaded dataset             @  8.64s
loaded initial data        @ 12.57s
******  619.97 ms to enqueue, 46042.13 ms to realize ( 116.22 ms fetching, 45399.58 ms postprocess_detections).     0.09 examples/sec.  0.83 TFLOPS  @ 59.23s
******  147.49 ms to enqueue, 37362.16 ms to realize ( 146.96 ms fetching, 36618.84 ms postprocess_detections).     0.11 examples/sec.  1.03 TFLOPS  @ 96.74s
******  152.85 ms to enqueue, 37244.08 ms to realize ( 120.67 ms fetching, 36235.19 ms postprocess_detections).     0.11 examples/sec.  1.04 TFLOPS  @ 134.14s
******  146.39 ms to enqueue, 37279.85 ms to realize (  65.07 ms fetching, 36233.56 ms postprocess_detections).     0.11 examples/sec.  1.04 TFLOPS  @ 171.56s
******  152.41 ms to enqueue, 37264.04 ms to realize ( 127.08 ms fetching, 36196.10 ms postprocess_detections).     0.11 examples/sec.  1.04 TFLOPS  @ 208.98s
******  151.29 ms to enqueue, 36868.08 ms to realize ( 142.73 ms fetching, 36153.07 ms postprocess_detections).     0.11 examples/sec.  1.05 TFLOPS  @ 246.00s
******  136.41 ms to enqueue, 37325.04 ms to realize (  90.29 ms fetching, 36573.38 ms postprocess_detections).     0.11 examples/sec.  1.04 TFLOPS  @ 283.46s
```
2025-04-23 11:39:56 -04:00
Francis Lata
71b8890dd6 use validation dataloader inside retinanet eval (#9747) 2025-04-05 16:46:55 -04:00
Francis Lata
d3a387be63 [MLPerf] Prepare openimages dataset script (#6747)
* prepare openimages for MLPerf

* cleanup

* fix issue when clearing jit_cache on retinanet eval

* revert pandas specific changes
2024-09-27 11:13:56 -04:00
Anurag Lamsal
568757e087 fix model_eval.py in the mlperf folder searching for bert vocab in the wrong directory (#6649) 2024-09-24 11:20:44 +08:00
Nik
085c0bbf6b add mlperf train subset of openimages (#4841) 2024-06-05 10:10:11 -04:00
chenyu
683b7c605a pad first batch of imagenet dataloader and update eval (#4368)
* pad first batch of imagenet dataloader and update eval

* pad zero instead of empty for training
2024-05-01 00:21:52 -04:00
Francis Lata
bb849a57d1 [MLPerf] UNet3D dataloader (#4343)
* add support for train/val datasets for kits19

* split dataset into train and val sets

* add tests for kits19 dataloader

* add MLPerf dataset tests to CI

* update unet3d model_eval script

* fix linting

* add nibabel

* fix how mock dataset gets created

* update ref implementation with permalink and no edits

* clean up test and update rand_flip implementation

* cleanups
2024-04-28 22:34:18 -04:00
Francis Lata
3644077a42 [MLPerf][UNet3D] Add DICE loss + metrics (#4204)
* add DICE loss and metrics

* update dice to include reference implementation's link

* remove unused imports

* remove unnecessary test file and update pred + label for metrics and losses test

* add tests to CI + add exclusion of mlperf_unet3d

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
chenyu
c71627fee6 move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz
150ea2eb76 create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
David Hou
199f7c4342 MLPerf Resnet (cleaned up) (#3573)
* this is a lot of stuff

TEST_TRAIN env for less data

don't diskcache get_train_files

debug message

no lr_scaler for fp32

comment, typo

type stuff

don't destructure proc

make batchnorm parameters float

make batchnorm parameters float

resnet18, checkpointing

hack up checkpointing to keep the names in there

oops

wandb_resume

lower lr

eval/ckpt use e+1

lars

report top_1_acc

some wandb stuff

split fw and bw steps to save memory

oops

save model when reach target

formatting

make sgd hparams consistent

just always write the cats tag...

pass X and Y into backward_step to trigger input replace

shuffle eval set to fix batchnorm eval

dataset is sorted by class, so the means and variances are all wrong

small cleanup

hack restore only one copy of each tensor

do bufs from lin after cache check (lru should handle it fine)

record epoch in wandb

more digits for topk in eval

more env vars

small cleanup

cleanup hack tricks

cleanup hack tricks

don't save ckpt for testeval

cleanup

diskcache train file glob

clean up a little

device_str

SCE into tensor

small

small

log_softmax out of resnet.py

oops

hack :(

comments

HeNormal, track gradient norm

oops

log SYNCBN to wandb

real truncnorm

less samples for truncated normal

custom init for Linear

log layer stats

small

Revert "small"

This reverts commit 988f4c1cf3.

Revert "log layer stats"

This reverts commit 9d98224585.

rename BNSYNC to SYNCBN to be consistent with cifar

optional TRACK_NORMS

fix label smoothing :/

lars skip list

only weight decay if not in skip list

comment

default 0 TRACK_NORMS

don't allocate beam scratch buffers if in cache

clean up data pipeline, unsplit train/test, put back a hack

remove print

run test_indexing on remu (#3404)

* emulated ops_hip infra

* add int4

* include test_indexing in remu

* Revert "Merge branch 'remu-dev-mac'"

This reverts commit 6870457e57, reversing
changes made to 3c4c8c9e16.

fix bad seeding

UnsyncBatchNorm2d but with synced trainable weights

label downsample batchnorm in Bottleneck

:/

:/

i mean... it runs... its hits the acc... its fast...

new unsyncbatchnorm for resnet

small fix

don't do assign buffer reuse for axis change

* remove changes

* remove changes

* move LARS out of tinygrad/

* rand_truncn rename

* whitespace

* stray whitespace

* no more gnorms

* delete some dataloading stuff

* remove comment

* clean up train script

* small comments

* move checkpointing stuff to mlperf helpers

* if WANDB

* small comments

* remove whitespace change

* new unsynced bn

* clean up prints / loop vars

* whitespace

* undo nn changes

* clean up loops

* rearrange getenvs

* cpu_count()

* PolynomialLR whitespace

* move he_normal out

* cap warmup in polylr

* rearrange wandb log

* realize both x and y in data_get

* use double quotes

* combine prints in ckpts resume

* take UBN from cifar

* running_var

* whitespace

* whitespace

* typo

* if instead of ternary for resnet downsample

* clean up dataloader cleanup a little?

* separate rng for shuffle

* clean up imports in model_train

* clean up imports

* don't realize copyin in data_get

* remove TESTEVAL (train dataloader didn't get freed every loop)

* adjust wandb_config entries a little

* clean up wandb config dict

* reduce lines

* whitespace

* shorter lines

* put shm unlink back, but it doesn't seem to do anything

* don't pass seed per task

* monkeypatch batchnorm

* the reseed was wrong

* add epoch number to desc

* don't unsyncedbatchnorm is syncbn=1

* put back downsample name

* eval every epoch

* Revert "the reseed was wrong"

This reverts commit 3440a07dff3f40e8a8d156ca3f1938558a59249f.

* cast lr in onecycle

* support fp16

* cut off kernel if expand after reduce

* test polynomial lr

* move polynomiallr to examples/mlperf

* working PolynomialDecayWithWarmup + tests.......

add lars_util.py, oops

* keep lars_util.py as intact as possible, simplify our interface

* no more half

* polylr and lars were merged

* undo search change

* override Linear init

* remove half stuff from model_train

* update scheduler init with new args

* don't divide by input mean

* mistake in resnet.py

* restore whitespace in resnet.py

* add test_data_parallel_resnet_train_step

* move initializers out of resnet.py

* unused imports

* log_softmax to model output in test to fix precision flakiness

* log_softmax to model output in test to fix precision flakiness

* oops, don't realize here

* is None

* realize initializations in order for determinism

* BENCHMARK flag for number of steps

* add resnet to bechmark.yml

* return instead of break

* missing return

* cpu_count, rearrange benchmark.yml

* unused variable

* disable tqdm if BENCHMARK

* getenv WARMUP_EPOCHS

* unlink disktensor shm file if exists

* terminate instead of join

* properly shut down queues

* use hip in benchmark for now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-14 00:53:41 -04:00
George Hotz
41efaa848c move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
Francis Lata
86748f4a8c fix bbox format to be a list (#3265) 2024-01-27 17:54:19 -08:00
George Hotz
228f30b96a multitensor jit (#3149)
* initial multitensor jit support and tests

* Added graphs to multitensor jit and updated tests

* update unbind api

* fix set device, add TinyJit to resnet

* update_stats includes device

---------

Co-authored-by: ramenguy99 <ramenguy99@gmail.com>
2024-01-16 09:09:15 -08:00
George Hotz
cec0a7bc37 use shard api to eval resnet fast (#3136)
* use shard api to eval resnet fast

* to supports shard

* test to in multitensor
2024-01-15 16:49:38 -08:00
George Hotz
a464909d79 fast resnet eval (#3135)
* fast resnet eval

* fix HIP multidevice graph

* neater expression for devices

* lines

* add decorator test
2024-01-15 14:15:18 -08:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz
232ed2af3f more test cleanups (#2631)
* more test cleanups

* move test example back
2023-12-05 16:17:57 -08:00
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
4ff35e2b97 better resnet eval (#1943) 2023-09-29 05:40:25 -07:00
Karan Handa
a8aa13dc91 [ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
Umut Zengin
f720682beb np.argmax to Tensor.argmax (#1608)
* to tensor argmax

* removed keepdim

* training update
2023-08-21 15:22:29 -07:00
terafo
aa60feda48 Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
George Hotz
6ec0a24706 imagenet eval in 1 min 28 sec 2023-06-28 04:23:26 +00:00
Kunwar Raj Singh
5d3310ce56 MaskRCNN Inference (#884)
* MaskRCNN weights loading

* backbone maybe works

* backbone works, but resnet body atol 1e-3

* RPN Call, but veryy wrong output

* fixed topk

* RPN maybe works, not sure about nms

* Fix cursed modules

* add back editorconfig

* Full call, wrong output

* Full call works

* fix mask

* use NMS from retinanet

* Removing extra funcs

* refactor

* readable

* Add example to run model

* remove filter

* Fix split, batched inference is worse

* Fix image sizes

* Matching reference

* merge master

* add filter on top detections

* cuda backend fixed

* add model eval and spec

* convert images to rgb

* fix eval

* simplify examples code

* remove extra code

* meshgrid using tinygrad

* removing numpy

* roi align, floor, ceil

* remove numpy from level_mapper

* remove numpy from pooler

* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"

This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.

* roi align gather

* fix master merge

* revert to old floor, ceil as ints present in domain

* use log2 op

* fix indexes

* weird bug with ints and gpu

* weird bug with ints and gpu

* refactors, add env var for gather

* floor with contiguous, where

* refactor topk, sort

* remove staticmethod

* refactor stride

* remove log2 mlop

* realize -> contiguous

* refactor forward

* remove num_classes, stride_in_1x1 from state

* refactor forward

* refactoring

* flake8

* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk

* keep using tinygrad for smaller gathers

* fix empty tensors

* comms

* move from tensor.py

* resnet test passing

* add coco dataset back

* fix spaces

* add test for log2

* no need to create Tensors

* no need to create Tensors

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-06-25 15:37:51 -07:00
Jacky Lee
5d212864b5 Add MLPerf UNet3D model (#775)
* Add ResNet inference test and cannon

* Test with ResNet50

* test_car works with resnet fix

* Add KiTS19 dataset

* KiTS19: Implement iterate

* No batch load for this dataset

* Save results on iterate

* Implement dice score

* Add data prep and eval functions

* Resolve shape issue

* Conversion works but wrong values

* Segfaults when load_from_pretrained is called

* Fix segfault and assign properly

* Final result generated, though very slow

* Store and load final result to save time

* Fix typo in finalize

* Score computes

* More bug fixes, dice score is very low

* Working broken code

* Assign output values to result

* Getting a much higher score now

* Fix dataset preprocessing

* Mean DICE score of 88.5

* Ugh, typo

* Attempt to reimplement model

* Rename layers

* Tiny model works, kinda

* Accuracy? gone

* Implement InstanceNorm and match torch

* Test instance norm 2d and 3d

* Combined input block with downsample block

* Tiny model works, support strided convtranspose

* Commands to download dataset

* Clean up a bit

* unet3d_v2 -> unet3d

* Remove duplicated code

* Oops, put tests back
2023-05-28 20:38:19 -07:00
Sohaib
65d09031f2 add retinanet with resnet backbone (#813)
* add retinanet with resnet backbone

* adds resnext to support loading retinanet pretrained on openimages
* object detection post processing with numpy
* data is downloaded and converted to coco format with fiftyone
* data loading and mAP evaluation with pycocotools

* remove fiftyone dep

* * eval freq

* fix model timing

* del jit for last batch

* faster accumulate
2023-05-28 20:20:16 -07:00
wozeparrot
67de3aa1de Add mlperf bert model (#803)
* feat: add mlperf bert model

* feat: switch to nn.Embedding

* clean+fix: fix formatting

* feat: add simple downloader

* feat: metrics

* feat: don't actually need exact match

* feat: doing a run

* feat: set eps on the layernorms

* clean+fix: cleaner impl + hopefully fixed

* feat: move dataset initialization into iterate

* feat: move tokenizer out of iterate

* clean+fix: cleaner + working

* clean: cleanup

* fix: fix metrics

* feat: need to use original bert gelu + download vocab

* feat: make directory if it doesn't exist yet

* feat: jit go brrr
2023-05-27 14:53:32 -07:00
George Hotz
a968c4c3a4 Cleanup mlperf (#797)
* improve factorization

* cleanups
2023-05-25 11:36:43 -07:00
wozeparrot
01ae45a43c Add mlperf RNN-T model (#782)
* feat: initial rnn-t

* feat: working with BS>1

* feat: add lstm test

* feat: test passing hidden

* clean: cleanup

* feat: specify start

* feat: way faster lstm & model

* fix: default batch size

* feat: optimization

* fix: fix metrics

* fix: fix feature splicing

* feat: cleaner stacktime

* clean: remove unused import

* clean: remove extra prints

* fix: fix tests and happy llvm

* feat: have the librispeech dataset in its own dir

* clean: unused variable

* feat: no longer need numpy for the embedding + slightly more memory efficient lstm

* fix: forgot to remove something that broke tests

* feat: use relative paths

* feat: even faster

* feat: remove pointless transposes in StackTime

* fix: correct forward

* feat: switch to soundfile for loading and fix some leaks

* feat: add comment about initial dataset setup

* feat: jit more things

* feat: default batch size back to 1

larger than 1 is broken again :(
and even in the reference implementation it gives worse results
2023-05-25 00:41:21 -07:00
George Hotz
e0b2035023 fast imagenet eval, gets 76.14% across the set 2023-05-13 21:18:31 -07:00
George Hotz
b705510d5c getting 77% on imagenet eval 2023-05-13 07:46:27 -07:00