* Kernel.apply_opts [pr]
updated all `for opt in` loops. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimization
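For context, the shape of the change — a minimal sketch of an `apply_opts` loop (everything beyond the `apply_opt`/`apply_opts` names is hypothetical, not tinygrad's exact API):

```python
# Hypothetical sketch: centralize the scattered `for opt in ...` call sites
# behind a single Kernel.apply_opts method that loops and delegates.
class Kernel:
  def apply_opt(self, opt):
    ...  # apply one scheduling optimization (upcast, local, unroll, ...)

  def apply_opts(self, opts):
    for opt in opts:  # the loop each call site previously wrote by hand
      self.apply_opt(opt)
```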
* not you yet
* Add amax support to Tensor operations
- Implemented amax function in backend.py for tensor max operations.
- Added unit tests for amax in test.py to ensure correct functionality.
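A minimal sketch of what an `amax` reduction does (numpy used for illustration; the actual backend.py implementation and signature may differ):

```python
import numpy as np

def amax(x, axis=None, keepdims=False):
  # maximum of tensor elements over the given axis, torch.amax-style
  return np.max(x, axis=axis, keepdims=keepdims)

# in the spirit of the unit tests added in test.py
assert amax(np.array([[1, 5], [3, 2]]), axis=1).tolist() == [5, 3]
```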
* Fix formatting in amax output function
- Adjusted spacing in the amax output lambda function in backend.py
- Improved code readability for better maintenance
Not-quite-required, but it makes the cloud graph a *lot* cleaner because,
unlike raw compiled programs, `GraphRunner` takes `Buffer`s like other runners.
Otherwise one of the following would be required: adding a new option to not
free on `__del__`, (ab)using `external_ptr` to prevent the free, or making
something like a `FakeBuffer` (sketched below).
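For contrast, a toy sketch of the `FakeBuffer`-style workaround this avoids (hypothetical, not a real tinygrad class):

```python
# Hypothetical FakeBuffer: wraps a borrowed pointer and deliberately skips
# freeing it on __del__, so a graph can hold raw memory the Python object
# doesn't own. Passing real `Buffer`s to GraphRunner makes this unnecessary.
class FakeBuffer:
  def __init__(self, ptr: int, size: int):
    self.ptr, self.size = ptr, size  # borrowed, not owned
  def __del__(self):
    pass  # intentionally no free: the real owner manages the lifetime
```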
* fp8s part 1
* prettier
* fixes
* fixes
* remove stuff that should be in next pr
* revert
* add creation
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
* FastPatternMatcher
* works without that
* fix test pickle
* strict len
* compile match function
* dynamic compile
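The gist of these two commits, as a toy sketch (tinygrad's real UPat codegen is much richer): each pattern is compiled into a specialized match function from generated Python source, instead of being walked by a generic interpreter loop.

```python
# Toy sketch of the compile-don't-interpret idea: generate source
# specialized to one pattern and exec it into a real function.
def compile_match(op_name: str, n_src: int):
  src = (f"def match(u):\n"
         f"  if u.op == {op_name!r} and len(u.src) == {n_src}: return u\n"
         f"  return None\n")
  ns: dict = {}
  exec(src, ns)       # dynamic compile step
  return ns["match"]  # a matcher with the pattern baked in
```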
* fast
* faster
* compile
* track
* a lot faster
* clean up
* dup or
* faster and simpler
* fast match doesn't support store
* plane
* minor refactor
* real speed
* don't imply return None
* upat
* fix test
* heard you wanted more speed
* no generator
* split cf
* early fixup
* fxn fixup
* reconstruct_function
* Revert "reconstruct_function"
This reverts commit 37dac010ab.
* simpler stuff
* too big
* upat compile error
* cleanups
* don't cache that
* cleanups
* 10 -> 15
* add support for a custom BASEDIR for openimages download
* make export step faster
* add focal loss
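For reference, sigmoid focal loss (Lin et al. 2017) is cross-entropy down-weighted on easy examples: FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). A numerically naive sketch (numpy for illustration; this signature may not match the code added here, and a real implementation would compute BCE from logits directly for stability):

```python
import numpy as np

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
  p = 1.0 / (1.0 + np.exp(-logits))                            # sigmoid
  ce = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))  # per-element BCE
  p_t = targets * p + (1 - targets) * (1 - p)                  # prob of true class
  alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
  # (1 - p_t)**gamma -> 0 for well-classified examples, focusing on hard ones
  return alpha_t * (1 - p_t) ** gamma * ce
```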
* update model_eval with new dataloader
* generate_anchors in tinygrad
* update initializers for model
* small cleanup
* revert isin enhancements
* recursively go through backbone layers to freeze them
* add optimizer
* minor cleanup
* start dataloader work with input images
* add first transform for train set
* reuse existing prepare_target
* continue with dataloader implementation
* add dataloader
* separate out KiTS19 dataset test cases
* create mock data samples for test
* add dataloader + test
* cleanup dataloader test and revert shm path
* trim dataloader related code needed from ref
* got dataloader with normalize working
* update image to be float32
* add back normalization and negate it in test
* clean up reference dataset implementation + ruff changes
* add validation set test
* add proper training loop over the training dataset
* add LambdaLR support
* add LR scheduler and the start of training step
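LambdaLR multiplies a base learning rate by a user-supplied function of the epoch. A minimal sketch, assuming an optimizer that exposes a mutable `lr` (tinygrad stores it as a Tensor, so the real code differs):

```python
class LambdaLR:
  def __init__(self, optimizer, lr_lambda):
    self.optimizer, self.lr_lambda = optimizer, lr_lambda
    self.base_lr, self.epoch = optimizer.lr, 0

  def step(self):
    # lr_t = base_lr * lr_lambda(epoch)
    self.epoch += 1
    self.optimizer.lr = self.base_lr * self.lr_lambda(self.epoch)
```

e.g. `LambdaLR(opt, lambda e: min(1.0, e / warmup_epochs))` gives a linear warmup.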
* get forward call to the model working and set up multi-GPU
* already passed device
* return matches from dataloader
* hotfix for a dataloader typo causing a hang
* start some work on classification loss
* update focal loss to support masking
* add missing test and cleanup focal loss
* cleanup unit tests
* remove masking support for sigmoid_focal_loss
* make ClassificationHead loss work
* cleanups + fix dataloader tests
* remove sigmoid when computing loss
* make anchors use Tensors
* simplify anchors batching
* revert anchors to use np
* implement regression loss
* fix regression loss
* cleanup losses
* move BoxCoder to MLPerf helpers
* revert helper changes
* fixes after helper refactor cleanup
* add tests for l1_loss
* start re-enabling training step
* minor cleanup
* add pycocotools to testing dependencies
* make training work
* adjust regression loss to mask after L1 loss is calculated
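The point of this adjustment, sketched with hypothetical names: take the elementwise L1 first, then zero out background anchors and normalize by the positive count.

```python
import numpy as np

def masked_l1_loss(pred, target, fg_mask):
  # pred/target: (N, anchors, 4); fg_mask: (N, anchors), 1 for foreground
  l1 = np.abs(pred - target)         # L1 computed first, unmasked
  l1 = l1 * fg_mask[..., None]       # mask applied *after* the L1
  num_pos = max(fg_mask.sum(), 1.0)  # avoid division by zero
  return l1.sum() / num_pos
```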
* reduce img and lbl sizes by half for KiTS19 dataset tests
* Revert "reduce img and lbl sizes by half for KiTS19 dataset tests"
This reverts commit d115b0c664.
* temporarily disable openimages dataset tests to debug CI
* enable openimages dataset test and create samples once
* temporarily disable openimages validation set test
* reenable test and add some debugging to the test
* add boto3 testing dependencies
* add pandas to testing dependencies
* This reverts commit 467704fec6.
* reenable test
* move sample creation to setup
* realize boxcoder's encoding
* add wandb
* fix wandb resuming feature
* move anchors as part of dataloader
* fix dtype for anchor inside dataloader and fix horizontal flip transformation
* add support for BENCHMARK
* set seed
* debug dataset test failure
* Revert "debug dataset test failure"
This reverts commit 1b2f9d7f50.
* fix dataloader script
* do not realize when sharding model weights
* setup openimages samples differently
* create the necessary samples per test case
* enable lr scheduler and fix benchmark timing
* add jit to the training loop
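For context, wrapping a step in tinygrad's `TinyJit` caches the traced kernels after the first couple of calls and replays them directly. A minimal sketch (import path varies across tinygrad versions):

```python
from tinygrad import Tensor, TinyJit  # older versions: from tinygrad.jit import TinyJit

@TinyJit
def jitted_step(x: Tensor) -> Tensor:
  # after a couple of warm-up calls, the captured kernels replay directly,
  # skipping Python-side graph construction each iteration
  return (x * 2).sum().realize()
```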
* add checkpointing and training resume capabilities
* refactor training loop and start the work on val loop
* add debug logging for dataloader test
* debug test
* assert boxes again
* update validation dataloader and more cleanups
* fix validation test case
* add multi device support to retinanet eval
* fix issue with realized on dataloader
* remove optional disk tensors in dataloader
* remove verbose debugging on datasets test
* put back parallel testing and remove img_ids Tensor from dataloader
* cleanup train and validation dataloader
* return validation targets in dataloader
* cleanup boxes and labels in dataloader
* fix img_ids repeating its values
* remove unnecessary targets from validation dataloader
* add validation loop to training script
* adjust LR to be the ratio of the batch size
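i.e. the standard linear scaling rule. A sketch with illustrative numbers (only BS=96 appears later in this log; the base values are placeholders):

```python
# Linear LR scaling rule: grow the learning rate with the batch size.
BASE_LR, BASE_BS = 1e-4, 32  # placeholder base values
BS = 96
lr = BASE_LR * BS / BASE_BS
```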
* minor cleanups
* remove frozen layers from optimizer's params
* hyperparameter adjustments and cleanups
* model init, hyperparam, and data preprocessing updates
* no need to return loaded keys for resnet
* fix train script
* update loss calculation for RegressionHead and some cleanups
* add JIT reset support
* add nan check during training
* Revert "add nan check during training"
This reverts commit ddf1f0d5dd.
* Revert "Revert "add nan check during training""
This reverts commit b7b2943197.
* some typing cleanups
* update seeding on dataloader and the start of training script
* undo changes
* undo more changes
* more typing fixes
* minor cleanups
* update dataloader seed
* hotfix: log metric and move target metric check outside of CKPT
* check for CKPT when target metric is reached before saving
* add TRAIN_BEAM and EVAL_BEAM
* minor cleanup
* update hyperparams and add support for EVAL_BS
* add green coloring to metric reached statement
* initial work to support f16
* update model initializers to be monkeypatched
* update layers to support float32 weight loading + float16 training
* don't return loss that's scaled
* run eval on benchmark beam
* move BEAM to their respective steps
* update layers to be compatible with fp16
* end BENCHMARK after first eval
* cleanups and adjust learning rate for fp16
* remove duplicated files from test
* revert losses changes
* Revert "revert losses changes"
This reverts commit aebccf93ac.
* go back to old LR
* cast batchnorm to float32
* set new loss scaler default value for float16
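Static loss scaling for float16, sketched with a generic optimizer (the scaler value and API here are illustrative, not the default this commit picked): scale the loss up before backward so small gradients don't underflow in fp16, unscale the gradients before stepping, and report the unscaled loss (per "don't return loss that's scaled" above).

```python
def train_step(model, optimizer, loss_fn, x, y, loss_scaler=2.0 ** 13):
  loss = loss_fn(model(x), y)
  (loss * loss_scaler).backward()  # scale up: avoids fp16 gradient underflow
  for p in optimizer.params:
    p.grad = p.grad / loss_scaler  # unscale before the optimizer step
  optimizer.step()
  return loss                      # return the *unscaled* loss
```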
* remove LambdaLRScheduler
* remove runner and use dataloader on eval
* fix retinanet eval with new dataloader
* remove unused import
* revert lr_scheduler updates
* use BS=96 with new learning rate
* rename module initializers
* more cleanups on training loop
* remove contig from optim.step
* simplify sum when computing loss