* add support for a custom BASEDIR for openimages download
* make export step faster
* add focal loss (sketched after this list)
* update model_eval with new dataloader
* generate_anchors in tinygrad (sketched after this list)
* update initializers for model
* small cleanup
* revert isin enhancements
* recursively go through backbone layers to freeze them (sketched after this list)
* add optimizer
* minor cleanup
* start dataloader work with input images
* add first transform for train set
* reuse existing prepare_target
* continue with dataloader implementation
* add dataloader
* separate out KiTS19 dataset test cases
* create mock data samples for test
* add dataloader + test
* clean up dataloader test and revert shm path
* trim dataloader-related code needed from ref
* got dataloader with normalize working
* update image to be float32
* add back normalization and negate it in test
* clean up reference dataset implementation + ruff changes
* add validation set test
* add proper training loop over the training dataset
* add LambdaLR support
* add LR scheduler and the start of the training step
* get forward call to the model working and set up multi-GPU
* already passed device
* return matches from dataloader
* hotfix for dataloader typo causing a hang
* start some work on classification loss
* update focal loss to support masking
* add missing test and clean up focal loss
* clean up unit tests
* remove masking support for sigmoid_focal_loss
* make ClassificationHead loss work
* cleanups + fix dataloader tests
* remove sigmoid when computing loss
* make anchors use Tensors
* simplify anchors batching
* revert anchors to use np
* implement regression loss
* fix regression loss
* clean up losses
* move BoxCoder to MLPerf helpers (its encode step is sketched after this list)
* revert helper changes
* fixes after helper refactor cleanup
* add tests for l1_loss
* start re-enabling training step
* minor cleanup
* add pycocotools to testing dependencies
* make training work
* adjust regression loss to mask after the L1 loss is calculated
* reduce img and lbl sizes by half for KiTS19 dataset tests
* Revert "reduce img and lbl sizes by half for KiTS19 dataset tests". This reverts commit d115b0c664.
* temporarily disable openimages dataset tests to debug CI
* enable openimages dataset test and create samples once
* temporarily disable openimages validation set test
* reenable test and add some debugging to the test
* add boto3 to testing dependencies
* add pandas to testing dependencies
* This reverts commit 467704fec6.
* reenable test
* move sample creation to setup
* realize BoxCoder's encoding
* add wandb
* fix wandb resuming feature
* move anchors into the dataloader
* fix dtype for anchor inside dataloader and fix horizontal flip transformation
* add support for BENCHMARK
* set seed
* debug dataset test failure
* Revert "debug dataset test failure". This reverts commit 1b2f9d7f50.
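
The focal loss entries above refer to the RetinaNet classification loss (Lin et al., 2017): binary cross-entropy down-weighted by (1 - p_t)^gamma so easy negatives stop dominating the gradient. A minimal sketch against tinygrad's Tensor API, assuming torchvision-style alpha=0.25/gamma=2.0 defaults and sum reduction; the repo's actual signature may differ:

```python
from tinygrad import Tensor

def sigmoid_focal_loss(logits: Tensor, targets: Tensor, alpha: float = 0.25, gamma: float = 2.0) -> Tensor:
  p = logits.sigmoid()
  # plain binary cross-entropy, clipped for numerical safety
  ce = -(targets * p.clip(1e-7, 1.0).log() + (1 - targets) * (1 - p).clip(1e-7, 1.0).log())
  p_t = targets * p + (1 - targets) * (1 - p)   # probability of the true class
  loss = ce * (1 - p_t) ** gamma                # down-weight easy examples
  loss = loss * (targets * alpha + (1 - targets) * (1 - alpha))
  return loss.sum()                             # normalization by #foreground happens outside
```

The later "remove sigmoid when computing loss" entry points at the numerically safer variant that works on raw logits; the clip above is a simpler stand-in for that.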
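
generate_anchors tiles a small set of base boxes (one per scale/aspect-ratio pair) over every cell of a feature map. A numpy sketch, consistent with the later "revert anchors to use np" entry; the stride, scales, and ratios here are illustrative, not the repo's exact configuration:

```python
import numpy as np

def generate_anchors(fm_h: int, fm_w: int, stride: int,
                     scales=(32,), ratios=(0.5, 1.0, 2.0)) -> np.ndarray:
  # base anchors centered at the origin, one per (scale, ratio) pair
  base = np.array([[-s * (1 / r) ** 0.5 / 2, -s * r ** 0.5 / 2,
                     s * (1 / r) ** 0.5 / 2,  s * r ** 0.5 / 2]
                   for s in scales for r in ratios])
  # shift the base set to every cell center of the feature map
  cx, cy = np.meshgrid((np.arange(fm_w) + 0.5) * stride, (np.arange(fm_h) + 0.5) * stride)
  shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
  return (shifts + base).reshape(-1, 4)  # (fm_h * fm_w * len(base), 4) in x1y1x2y2
```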
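
Freezing the backbone recursively, per the entry above, amounts to walking the module tree and switching off gradients; tinygrad's get_parameters does the recursion. `model.backbone` is an assumed attribute name:

```python
from tinygrad.nn.state import get_parameters

# requires_grad=False keeps these tensors out of backward and,
# later, out of the optimizer's parameter list
for p in get_parameters(model.backbone):
  p.requires_grad = False
```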
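
BoxCoder's encode step, referenced above and in "realize BoxCoder's encoding", turns matched ground-truth boxes into regression targets relative to their anchors. A numpy sketch assuming x1y1x2y2 boxes and torchvision's unit weights for RetinaNet; the helper in MLPerf helpers may differ:

```python
import numpy as np

def encode_boxes(gt: np.ndarray, anchors: np.ndarray, weights=(1.0, 1.0, 1.0, 1.0)) -> np.ndarray:
  wx, wy, ww, wh = weights
  aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
  ax, ay = anchors[:, 0] + 0.5 * aw, anchors[:, 1] + 0.5 * ah
  gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
  gx, gy = gt[:, 0] + 0.5 * gw, gt[:, 1] + 0.5 * gh
  # offsets of the gt center/size relative to each anchor
  return np.stack([wx * (gx - ax) / aw, wy * (gy - ay) / ah,
                   ww * np.log(gw / aw), wh * np.log(gh / ah)], axis=1)
```

The regression loss entries then compare these targets against the RegressionHead outputs with an L1 loss, masked to foreground anchors.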
* fix dataloader script
* do not realize when sharding model weights
* set up openimages samples differently
* create the necessary samples per test case
* enable lr scheduler and fix benchmark timing
* add jit to the training loop (sketched after this list)
* add checkpointing and training resume capabilities
* refactor training loop and start work on the validation loop
* add debug logging for dataloader test
* debug test
* assert boxes again
* update validation dataloader and more cleanups
* fix validation test case
* add multi-device support to retinanet eval
* fix issue with realize on dataloader
* remove optional disk tensors in dataloader
* remove verbose debugging on datasets test
* put back parallel testing and remove img_ids Tensor from dataloader
* clean up train and validation dataloader
* return validation targets in dataloader
* clean up boxes and labels in dataloader
* fix img_ids repeating its values
* remove unnecessary targets from validation dataloader
* add validation loop to training script
* adjust LR by the batch size ratio (sketched after this list)
* minor cleanups
* remove frozen layers from optimizer's params (see the LR sketch after this list)
* hyperparameter adjustments and cleanups
* model init, hyperparam, and data preprocessing updates
* no need to return loaded keys for resnet
* fix train script
* update loss calculation for RegressionHead and some cleanups
* add JIT reset support
* add nan check during training
* Revert "add nan check during training". This reverts commit ddf1f0d5dd.
* Revert "Revert "add nan check during training"". This reverts commit b7b2943197.
* some typing cleanups
* update seeding on dataloader and the start of training script
* undo changes
* undo more changes
* more typing fixes
* minor cleanups
* update dataloader seed
* hotfix: log metric and move target metric check outside of CKPT
* check for CKPT when target metric is reached before saving
* add TRAIN_BEAM and EVAL_BEAM
* minor cleanup
* update hyperparams and add support for EVAL_BS
* add green coloring to the metric-reached statement
* initial work to support fp16
* update model initializers to be monkeypatched
* update layers to support float32 weight loading + float16 training
* don't return the scaled loss
* run eval on benchmark beam
* move BEAM settings to their respective steps
* update layers to be compatible with fp16
* end BENCHMARK after first eval
* cleanups and adjust learning rate for fp16
* remove duplicated files from test
* revert losses changes
* Revert "revert losses changes". This reverts commit aebccf93ac.
* go back to old LR
* cast batchnorm to float32
* set new loss scaler default value for float16 (loss scaling sketched after this list)
* remove LambdaLRScheduler
* remove runner and use dataloader on eval
* fix retinanet eval with new dataloader
* remove unused import
* revert lr_scheduler updates
* use BS=96 with new learning rate
* rename module initializers
* more cleanups on training loop
* remove contig from optim.step
* simplify sum when computing loss
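
"add jit to the training loop" refers to tinygrad's TinyJit, which records the kernels a function launches on its early calls and replays them on later ones (input shapes must stay fixed); "add JIT reset support" corresponds to dropping that captured graph, e.g. between training runs. A sketch with `model`, `loss_fn`, and `opt` as stand-ins:

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def train_step(x: Tensor, y: Tensor) -> Tensor:
  opt.zero_grad()
  loss = loss_fn(model(x), y)
  loss.backward()
  opt.step()
  return loss.realize()  # realize inside the JIT so the value is materialized

train_step.reset()  # "JIT reset": discard captured kernels before re-capturing
```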
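
"adjust LR by the batch size ratio" is the usual linear scaling rule: define the LR at a reference batch size and scale it with the actual one. A sketch with an illustrative base LR (the reference BS of 96 comes from the "use BS=96" entry), which also filters frozen layers out of the optimizer per "remove frozen layers from optimizer's params"; AdamW is an assumption, the list only says "add optimizer":

```python
from tinygrad.nn.optim import AdamW
from tinygrad.nn.state import get_parameters

BASE_LR, REF_BS = 1e-4, 96  # BASE_LR is a placeholder value

def scaled_lr(bs: int) -> float:
  return BASE_LR * bs / REF_BS  # LR moves proportionally with the global batch size

# only trainable parameters reach the optimizer; frozen backbone tensors
# (requires_grad=False) are skipped (requires_grad is tri-state in tinygrad)
params = [p for p in get_parameters(model) if p.requires_grad is not False]
opt = AdamW(params, lr=scaled_lr(96))
```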
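
The float16 entries ("set new loss scaler default value for float16", "don't return the scaled loss") describe static loss scaling: multiply the loss before backward so small fp16 gradients don't flush to zero, divide the gradients back before the update, and report the unscaled loss. A minimal sketch assuming a tinygrad-style optimizer that exposes its params list; `compute_loss` is a stand-in and the scale value is illustrative:

```python
from tinygrad import Tensor

LOSS_SCALER = 2.0**13  # illustrative default, not the repo's value

def fp16_train_step(model, opt, x: Tensor, y: Tensor) -> Tensor:
  opt.zero_grad()
  loss = compute_loss(model(x), y)
  (loss * LOSS_SCALER).backward()    # backward on the scaled loss
  for p in opt.params:
    p.grad = p.grad / LOSS_SCALER    # un-scale before the optimizer update
  opt.step()
  return loss.realize()              # return the unscaled loss
```

This pairs with "cast batchnorm to float32": normalization statistics stay in fp32 while the rest of the model trains in fp16.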