George Hotz
871df1436a
more beautiful cifar (#10551)
* enumerate cases of Tensors in the JIT
* optional fused optimizers
* add fused optimizer test
* move that there
* ugh
* work on beautiful_cifar
* speed close to hlb_cifar
* schedule to corealize all
* one line sched step
* less lines
2025-05-28 20:48:20 -07:00
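The "optional fused optimizers", "schedule to corealize all", and "one line sched step" items describe collecting every weight-update kernel from the optimizer and realizing them together in a single call, rather than realizing updates one tensor at a time. A minimal sketch of that pattern, assuming a tinygrad-style optimizer exposing schedule_step() (the model, data, and hyperparameters here are illustrative, not the beautiful_cifar code):

```python
from tinygrad import Tensor, TinyJit, nn

class TinyNet:
  def __init__(self):
    self.l1, self.l2 = nn.Linear(784, 128), nn.Linear(128, 10)
  def __call__(self, x: Tensor) -> Tensor:
    return self.l2(self.l1(x).relu())

model = TinyNet()
opt = nn.optim.SGD(nn.state.get_parameters(model), lr=0.01, momentum=0.9)

@TinyJit
def train_step(x: Tensor, y: Tensor) -> Tensor:
  opt.zero_grad()
  loss = model(x).sparse_categorical_crossentropy(y).backward()
  # "one line sched step": gather every optimizer update and corealize it with the loss
  Tensor.realize(loss, *opt.schedule_step())
  return loss

with Tensor.train():
  x, y = Tensor.randn(32, 784), Tensor.randint(32, high=10)
  print(train_step(x, y).item())
```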
chenyu
5ae252ae83
use at least float32 for optim.lr (#4297)
* use at least float32 for optim.lr
When doing mixed precision training (float32 weights, default_float=half), still use float32 to store lr.
It would have been upcast later in the actual weight update anyway, but would already have lost precision.
This improved ResNet convergence significantly (see the sketch after this entry).
* undo type annotation
2024-04-25 14:42:28 -04:00
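In concrete terms: with default_float=half, a learning rate stored at the default dtype gets rounded, and every multiplicative scheduler update compounds that rounding, while a float32 lr tensor keeps the value exact until the final cast inside the weight update. A toy sketch of the idea (ToyOptim is a made-up name, not tinygrad's optimizer):

```python
from tinygrad import Tensor, dtypes

class ToyOptim:
  def __init__(self, params: list[Tensor], lr: float):
    self.params = params
    # store lr in (at least) float32, independent of the training dtype
    self.lr = Tensor([lr], dtype=dtypes.float32, requires_grad=False)

  def step(self):
    for p in self.params:
      # lr would be upcast into the update math here anyway; storing it
      # in half earlier would already have thrown away the low bits
      p.assign((p.detach() - self.lr * p.grad).cast(p.dtype))

# a scheduler can now decay the rate exactly, e.g. opt.lr.assign(opt.lr * 0.97)
```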
wozeparrot
9a9cac58f9
add lars to nn (#3750)
* feat: add lars
* feat: don't remove this comment
* clean: smaller diff
* clean: shorter line
* feat: remove mlperf lars, switch resnet
* fix: fully remove mlperf lars
* clean: comment
* feat: contiguous
* feat: no weight decay on skip params
* feat: optimizergroup
* feat: classic momentum
* fix: pylint
* clean: move comment
* fix: correct algo
* feat: lrschedulergroup
* feat: skip list tests
* feat: :| forgot that params are a thing
* feat: remove skip_list params from main params
* feat: set moment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-24 11:43:12 -04:00
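The LARS recipe the bullets outline combines classic momentum with a per-layer trust ratio, and leaves weight decay (and typically the adaptation itself) off for skip-list parameters such as biases and batch-norm weights. A simplified sketch of one such update, not the actual nn.optim.LARS implementation (lars_step and its arguments are hypothetical):

```python
from tinygrad import Tensor

def lars_step(params, grads, momenta, lr=0.1, mu=0.9, wd=1e-4, tcoef=0.001, skip=()):
  for i, (p, g, m) in enumerate(zip(params, grads, momenta)):
    if i in skip:
      # skip-list params (biases, batchnorm): plain momentum SGD, no weight decay, no trust ratio
      m.assign(mu * m + lr * g)
    else:
      g = g + wd * p.detach()
      # layer-wise trust ratio ||w|| / ||g||, guarded against zero norms
      w_norm, g_norm = p.detach().square().sum().sqrt(), g.square().sum().sqrt()
      trust = (w_norm > 0).where((g_norm > 0).where(tcoef * w_norm / (g_norm + 1e-12), 1.0), 1.0)
      # classic momentum: the buffer accumulates the trust-scaled gradient
      m.assign(mu * m + lr * trust * g)
    p.assign((p.detach() - m).cast(p.dtype))
```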
Yixiang Gao
8a63f26a0f
make LR scheduler work with multigpu (#3011)
* add a failing test for LR scheduler when using multigpu
* fix calculation order and unnecessary tensor created for float
* min_lr is no longer a tensor
2024-01-04 12:10:56 -08:00
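The "min_lr is no longer a tensor" fix amounts to keeping scalar scheduler constants as plain Python floats, so only the optimizer's lr Tensor lives on the device(s) and no extra tensor has to be created for (or moved across) each GPU. A rough sketch of that shape (ToyDecay is illustrative, not the extra.lr_scheduler code):

```python
class ToyDecay:
  def __init__(self, opt, gamma: float = 0.1, min_lr: float = 1e-5):
    self.opt, self.gamma, self.min_lr = opt, gamma, min_lr  # min_lr stays a plain float

  def step(self):
    # fold the float constants into the computation on the existing lr Tensor
    self.opt.lr.assign((self.opt.lr * self.gamma).maximum(self.min_lr))
```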
George Hotz
cbb8486779
ResNet training changes (update benchmark) (#2390)
* default arg for chunk
* bring back to_
* good changes
* new set
* unused hash
* fix optim
* new torch loader
* fix test lr scheduler
2023-11-22 17:41:12 -08:00
Jacob Pradels
b112edd2c3
Add pylint trailing whitespace rule (#1314)
2023-07-21 13:37:55 -04:00
Yixiang Gao
a8f2c16f8e
add contiguous (#1246)
2023-07-15 08:36:34 -07:00
Kunwar Raj Singh
8391648822
Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval
* 82.27
* 82.64
* 82.79, reproducible
* add lr sched, 85.26
* 87.42
* 87.94
* 87.42
* tta with flip
* training flip aug
* refactor
* using Tensor for LR is faster
* 89.5
* refactor, flip only train set
* 90.01
* 90.64
* eval jit
* refactor
* only JIT model
* fix eval JIT
* fix eval JIT
* 90.82
* STEPS=900 reaches 90.22
* TTA envvar
* TTA default 0
* fully jit training
* refactor optim
* fix sched
* add label smoothing
* param changes
* partial gelu
* OneCycle with pause
* gelu maybe works
* 90.12
* remove pause lr
* maybe fix lr schedulers
* scheduler test passing
* comments
* try mixup
* shuffle!
* add back the missing last eval
* fix shuffle bugs
* add mixup prob
* fix mixup prob
* 90.19
* correct mixup
* correct mixup
* correct mixup
* 90.24
* 90.33
* refactor, add type hints
* add gradient clipping
* maybe fix test
* full JIT
* back to relu for now
* pass mixup prob as param
* add typehints
* maybe CI works
* try erf gelu
* CI, types
* remove useless import
* refactor optim
* refactor optim
* try leakyrelu
* try celu
* gelu
* 90.67
* remove grad clip
* remove grad clip tests
* revert params
* add test for OneCycleLR
* 90.62
* fix eval timing
* fix eval timing again
* so where I calculate mixup_prob matters
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
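Most of the accuracy gains logged above come from standard recipe pieces: an LR schedule (OneCycle), label smoothing, TTA with horizontal flips, and mixup applied with a probability (the "where I calculate mixup_prob matters" item). A compact sketch of the mixup part, pairing each sample with its mirror in the batch for simplicity (hypothetical helper, not the examples/hlb_cifar10.py code):

```python
import random
from tinygrad import Tensor

def maybe_mixup(x: Tensor, y_onehot: Tensor, mixup_prob: float = 0.5, alpha: float = 0.2):
  # decide per batch whether to mix at all; where this draw happens relative
  # to the JIT'd train step is exactly what the last bullet above is about
  if random.random() >= mixup_prob:
    return x, y_onehot
  lam = random.betavariate(alpha, alpha)   # mixing coefficient
  x2, y2 = x.flip(0), y_onehot.flip(0)     # partner: the batch in reverse order
  return lam * x + (1 - lam) * x2, lam * y_onehot + (1 - lam) * y2
```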
Mattis Megevand
606b841d3f
LR Schedulers (#755)
* lr schedulers + test
* lr scheduler test moved + integration test
* integration test for all lr scheduler
* lr scheduler test now deterministic
* changed optimizer + parameters for lr sched test
2023-05-27 07:47:49 -07:00
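Mechanically, a scheduler in this setup recomputes a value each step and writes it into the optimizer's lr (a Tensor in tinygrad) in place. A minimal cosine-decay sketch along those lines (ToyCosineLR is illustrative, not one of the extra/lr_scheduler.py classes):

```python
import math
from tinygrad import Tensor

class ToyCosineLR:
  def __init__(self, opt, base_lr: float, total_steps: int):
    self.opt, self.base_lr, self.total_steps, self.t = opt, base_lr, total_steps, 0

  def step(self):
    self.t += 1
    new_lr = 0.5 * self.base_lr * (1 + math.cos(math.pi * self.t / self.total_steps))
    # write the new value into the optimizer's existing lr Tensor, keeping its dtype/device
    self.opt.lr.assign(Tensor([new_lr], dtype=self.opt.lr.dtype, device=self.opt.lr.device))
```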