* work on minrf example
* more
* jit sample
* t is tensor not const
* fixes
* more convs
* fix dropout
* don't print
* 504
* big patch
* onehot
* touch
* use embeddings
* dumb uses final layer
* act
* non fl
* match
* tp
* 3
* of
* ppsz
* normal
* add adln
* no t
* weird transformer
* weird transformer
* contig
* actual speed fix
* dumb
* cb
* 0
* t is 0
* mort-t
* args
* dumb days are over
* readable
* contig
* no more t mask
* mask_t
* init to zero
* clean
* steps
* work
* tt
* t
* solid
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
`BS=96 BASEDIR="/raid/datasets/openimages" MODEL=retinanet python examples/mlperf/model_eval.py`
```
...
loaded dataset @ 8.64s
loaded initial data @ 12.57s
****** 619.97 ms to enqueue, 46042.13 ms to realize ( 116.22 ms fetching, 45399.58 ms postprocess_detections). 0.09 examples/sec. 0.83 TFLOPS @ 59.23s
****** 147.49 ms to enqueue, 37362.16 ms to realize ( 146.96 ms fetching, 36618.84 ms postprocess_detections). 0.11 examples/sec. 1.03 TFLOPS @ 96.74s
****** 152.85 ms to enqueue, 37244.08 ms to realize ( 120.67 ms fetching, 36235.19 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 134.14s
****** 146.39 ms to enqueue, 37279.85 ms to realize ( 65.07 ms fetching, 36233.56 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 171.56s
****** 152.41 ms to enqueue, 37264.04 ms to realize ( 127.08 ms fetching, 36196.10 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 208.98s
****** 151.29 ms to enqueue, 36868.08 ms to realize ( 142.73 ms fetching, 36153.07 ms postprocess_detections). 0.11 examples/sec. 1.05 TFLOPS @ 246.00s
****** 136.41 ms to enqueue, 37325.04 ms to realize ( 90.29 ms fetching, 36573.38 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 283.46s
```
eval time went from 35 sec to 20 sec. It was spending 13 seconds assembling the output tensor on the CPU backend. GPUS[0] seems to have enough memory; otherwise we can lower EVAL_BS.
* add support for a custom BASEDIR for openimages download
* make export step faster
* add focal loss
* update model_eval with new dataloader
* generate_anchors in tinygrad
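For reference, anchor generation at one feature level typically follows the RetinaNet recipe: three octave scales times three aspect ratios per location. A minimal NumPy sketch of that recipe (function name and default sizes are illustrative, not the repo's API):

```python
import numpy as np

def generate_level_anchors(base_size=32,
                           ratios=(0.5, 1.0, 2.0),
                           scales=(2**0, 2**(1/3), 2**(2/3))):
    """Return (len(scales)*len(ratios), 4) xyxy anchors centered at the origin."""
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2  # each scale keeps a fixed area
        for ratio in ratios:
            w = np.sqrt(area / ratio)    # ratio = h / w
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors, dtype=np.float32)
```

Shifting these by each feature-map cell's stride-scaled center gives the full per-level anchor grid.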
* update initializers for model
* small cleanup
* revert isin enhancements
* recursively go through backbone layers to freeze them
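Recursively freezing a backbone amounts to walking the module tree and flipping `requires_grad` off on every parameter it contains. A stand-alone sketch of that walk; the `Param`/`Block`/`Backbone` classes below are stand-ins for illustration, not tinygrad's classes (tinygrad's `nn.state.get_parameters` does a similar traversal):

```python
class Param:
    """Stand-in for a trainable tensor."""
    def __init__(self): self.requires_grad = True

def freeze(module):
    """Recursively walk an object tree and mark every Param frozen."""
    if isinstance(module, Param):
        module.requires_grad = False
    elif isinstance(module, (list, tuple)):
        for m in module: freeze(m)
    elif hasattr(module, "__dict__"):
        for m in vars(module).values(): freeze(m)

class Block:
    def __init__(self): self.w, self.b = Param(), Param()

class Backbone:
    def __init__(self): self.stem, self.layers = Block(), [Block(), Block()]

bb = Backbone()
freeze(bb)  # every Param under bb, however nested, is now frozen
```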
* add optimizer
* minor cleanup
* start dataloader work with input images
* add first transform for train set
* reuse existing prepare_target
* continue with dataloader implementation
* add dataloader
* separate out KiTS19 dataset test cases
* create mock data samples for test
* add dataloader + test
* cleanup dataloader test and revert shm path
* trim dataloader related code needed from ref
* got dataloader with normalize working
* update image to be float32
* add back normalization and negate it in test
* clean up reference dataset implementation + ruff changes
* add validation set test
* add proper training loop over the training dataset
* add LambdaLR support
* add LR scheduler and the start of training step
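LambdaLR-style scheduling just multiplies a base LR by a user-supplied function of the epoch/step. A minimal sketch mirroring the PyTorch semantics (the `FakeOpt` optimizer is a stand-in, not the repo's optimizer):

```python
class LambdaLR:
    """Set optimizer.lr = base_lr * lr_lambda(epoch) on every step()."""
    def __init__(self, optimizer, lr_lambda):
        self.optimizer, self.lr_lambda = optimizer, lr_lambda
        self.base_lr, self.epoch = optimizer.lr, 0
    def step(self):
        self.epoch += 1
        self.optimizer.lr = self.base_lr * self.lr_lambda(self.epoch)

class FakeOpt:  # stand-in optimizer exposing just an lr attribute
    def __init__(self, lr): self.lr = lr

opt = FakeOpt(0.1)
sched = LambdaLR(opt, lambda e: 1.0 / (1 + e))  # illustrative decay lambda
sched.step()
```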
* get forward call to model work and setup multi-GPU
* already passed device
* return matches from dataloader
* hotfix for dataloader typo causing some hang
* start some work on classification loss
* update focal loss to support masking
* add missing test and cleanup focal loss
* cleanup unit tests
* remove masking support for sigmoid_focal_loss
* make ClassificationHead loss work
* cleanups + fix dataloader tests
* remove sigmoid when computing loss
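Sigmoid focal loss follows the RetinaNet formula FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); taking raw logits and applying the sigmoid inside the loss matches the "remove sigmoid when computing loss" change above. A NumPy sketch with the paper's default alpha/gamma, as a reference formula rather than the repo's implementation:

```python
import numpy as np

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Elementwise focal loss on raw logits; targets are 0/1."""
    p = 1.0 / (1.0 + np.exp(-logits))                       # sigmoid
    ce = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    p_t = targets * p + (1 - targets) * (1 - p)             # prob of true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return alpha_t * (1 - p_t) ** gamma * ce                # down-weights easy examples
```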
* make anchors use Tensors
* simplify anchors batching
* revert anchors to use np
* implement regression loss
* fix regression loss
* cleanup losses
* move BoxCoder to MLPerf helpers
* revert helper changes
* fixes after helper refactor cleanup
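BoxCoder encodes ground-truth boxes as center/size offsets relative to anchors (the standard R-CNN parameterization) and decodes predictions back to boxes. A NumPy roundtrip sketch assuming torchvision-style semantics, with unit weights:

```python
import numpy as np

def encode(gt, anchors):
    """Encode xyxy gt boxes as (dx, dy, dw, dh) relative to xyxy anchors."""
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + aw / 2, anchors[:, 1] + ah / 2
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    gx, gy = gt[:, 0] + gw / 2, gt[:, 1] + gh / 2
    return np.stack([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)], axis=1)

def decode(deltas, anchors):
    """Invert encode(): apply (dx, dy, dw, dh) to anchors, return xyxy."""
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + aw / 2, anchors[:, 1] + ah / 2
    cx, cy = deltas[:, 0] * aw + ax, deltas[:, 1] * ah + ay
    w, h = np.exp(deltas[:, 2]) * aw, np.exp(deltas[:, 3]) * ah
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
```

The regression head learns the encoded deltas; decode() is what eval's postprocess_detections applies before NMS.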
* add tests for l1_loss
* start re-enabling training step
* minor cleanup
* add pycocotools to testing dependencies
* make training work
* adjust regression loss to mask after L1 loss is calculated
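The "mask after L1" ordering means: compute the elementwise |pred - target| for every anchor first, zero out background rows, then normalize by the foreground count. A NumPy sketch of that order of operations (shapes and names illustrative):

```python
import numpy as np

def regression_loss(pred, target, fg_mask):
    """Elementwise L1 first, then mask, then normalize by foreground count."""
    l1 = np.abs(pred - target)               # (num_anchors, 4)
    l1 = l1 * fg_mask[:, None]               # zero out background anchors
    return l1.sum() / max(fg_mask.sum(), 1)  # average over foreground anchors
```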
* reduce img and lbl sizes by half for KiTS19 dataset tests
* Revert "reduce img and lbl sizes by half for KiTS19 dataset tests"
This reverts commit d115b0c664.
* temporarily disable openimages dataset tests to debug CI
* enable openimages dataset test and create samples once
* temporarily disable openimages validation set test
* reenable test and add some debugging to the test
* add boto3 testing dependencies
* add pandas to testing dependencies
* This reverts commit 467704fec6.
* reenable test
* move sample creation to setup
* realize boxcoder's encoding
* add wandb
* fix wandb resuming feature
* move anchors as part of dataloader
* fix dtype for anchor inside dataloader and fix horizontal flip transformation
* add support for BENCHMARK
* set seed
* debug dataset test failure
* Revert "debug dataset test failure"
This reverts commit 1b2f9d7f50.
* fix dataloader script
* do not realize when sharding model weights
* setup openimages samples differently
* create the necessary samples per test case
* enable lr scheduler and fix benchmark timing
* add jit to the training loop
* add checkpointing and training resume capabilities
* refactor on training loop and start the work on val loop
* add debug logging for dataloader test
* debug test
* assert boxes again
* update validation dataloader and more cleanups
* fix validation test case
* add multi device support to retinanet eval
* fix issue with realized on dataloader
* remove optional disk tensors in dataloader
* remove verbose debugging on datasets test
* put back parallel testing and remove img_ids Tensor from dataloader
* cleanup train and validation dataloader
* return validation targets in dataloader
* cleanup boxes and labels in dataloader
* fix img_ids repeating its values
* remove unnecessary targets from validation dataloader
* add validation loop to training script
* adjust LR to be the ratio of the batch size
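"Ratio of the batch size" here is the linear scaling rule: the effective LR is the base LR times BS over a reference batch size. A one-line sketch (the base values are placeholders, not the tuned hyperparameters):

```python
def scale_lr(base_lr, bs, base_bs=32):
    """Linear scaling rule: LR grows proportionally with batch size."""
    return base_lr * bs / base_bs
```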
* minor cleanups
* remove frozen layers from optimizer's params
* hyperparameter adjustments and cleanups
* model init, hyperparam, and data preprocessing updates
* no need to return loaded keys for resnet
* fix train script
* update loss calculation for RegressionHead and some cleanups
* add JIT reset support
* add nan check during training
* Revert "add nan check during training"
This reverts commit ddf1f0d5dd.
* Revert "Revert "add nan check during training""
This reverts commit b7b2943197.
* some typing cleanups
* update seeding on dataloader and the start of training script
* undo changes
* undo more changes
* more typing fixes
* minor cleanups
* update dataloader seed
* hotfix: log metric and move target metric check outside of CKPT
* check for CKPT when target metric is reached before saving
* add TRAIN_BEAM and EVAL_BEAM
* minor cleanup
* update hyperparams and add support for EVAL_BS
* add green coloring to metric reached statement
* initial work to support f16
* update model initializers to be monkeypatched
* update layers to support float32 weight loading + float16 training
* don't return loss that's scaled
* run eval on benchmark beam
* move BEAM to their respective steps
* update layers to be compatible with fp16
* end BENCHMARK after first eval
* cleanups and adjust learning rate for fp16
* remove duplicated files from test
* revert losses changes
* Revert "revert losses changes"
This reverts commit aebccf93ac.
* go back to old LR
* cast batchnorm to float32
* set new loss scaler default value for float16
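Static loss scaling for float16: multiply the loss by a large constant before backward so small gradients don't flush to zero, divide the gradients back down before the optimizer step, and skip the step if anything overflowed. A NumPy-level sketch of the arithmetic (the scaler value is illustrative, not the new default this commit sets):

```python
import numpy as np

LOSS_SCALER = 2.0 ** 11  # illustrative constant; fp16 needs a large scale

def scale_loss(loss, scale=LOSS_SCALER):
    """Backward() on the scaled loss yields gradients multiplied by scale."""
    return loss * scale

def unscale_grads(grads, scale=LOSS_SCALER):
    """Divide gradients back down; return None to signal a skipped step."""
    grads = [g / scale for g in grads]
    if any(not np.isfinite(g).all() for g in grads):
        return None  # overflow: skip this optimizer step
    return grads
```

Not returning the scaled loss to the logger matches the "don't return loss that's scaled" commit above.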
* remove LambdaLRScheduler
* remove runner and use dataloader on eval
* fix retinanet eval with new dataloader
* remove unused import
* revert lr_scheduler updates
* use BS=96 with new learning rate
* rename module initializers
* more cleanups on training loop
* remove contig from optim.step
* simplify sum when computing loss