Commit Graph

9 Commits

Author SHA1 Message Date
hooved
969a1b35ca LR scheduler for Stable Diffusion mlperf training (#12201)
* add lr scheduler for stable diffusion training

* add lr scheduler test

* rerun ci

* rerun CI

* use np for testing

* move test to CI path

* remove unneeded copy
2025-09-30 21:21:08 -04:00
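A minimal sketch of what a scheduler like this computes, assuming the usual mlperf Stable Diffusion shape of a linear warmup into a constant learning rate; the commit's actual scheduler may differ in detail, and the function and parameter names here are illustrative, not tinygrad's API. Per the "use np for testing" bullet, the test compares scheduler output against a plain numpy/Python reference like this one.

    # Hypothetical sketch: linear warmup to base_lr, then hold constant.
    def warmup_then_constant_lr(step: int, base_lr: float, warmup_steps: int) -> float:
      if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear ramp up to base_lr
      return base_lr  # constant after warmup

    # usage: lrs = [warmup_then_constant_lr(s, base_lr=1e-4, warmup_steps=1000) for s in range(2000)]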
chenyu
8751d47985 CosineAnnealingLRWithWarmup (#10981) 2025-06-25 17:45:21 -04:00
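Cosine annealing with warmup is a standard schedule: ramp linearly to the base LR, then decay along a half cosine toward a floor. A self-contained sketch of the math follows; the parameter names (base_lr, end_lr, warmup_steps, total_steps) are illustrative and not necessarily tinygrad's exact signature.

    import math

    def cosine_annealing_with_warmup(step, base_lr, end_lr, warmup_steps, total_steps):
      if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps          # linear warmup
      progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
      return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * progress))

At step == warmup_steps this returns base_lr exactly, and at step == total_steps it returns end_lr, which is the invariant a test would check.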
chenyu
b886d250fb improve test_dropout_on_shard (#4912)
tested some basic property, also minor formatting for a few Tensor.training setups
2024-06-11 11:36:02 -04:00
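The "basic property" such a dropout test checks is statistical: with drop probability p, about p of the outputs are zeroed and the survivors are scaled by 1/(1-p), so the expected value is preserved. (In tinygrad, dropout is also a no-op unless Tensor.training is set, hence the commit's Tensor.training cleanups.) A hypothetical numpy sketch of the property, independent of sharding:

    import numpy as np

    def dropout(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
      mask = rng.random(x.shape) >= p
      return np.where(mask, x / (1 - p), 0.0)  # survivors rescaled by 1/(1-p)

    rng = np.random.default_rng(0)
    out = dropout(np.ones(100_000), p=0.5, rng=rng)
    assert abs((out == 0).mean() - 0.5) < 0.01  # about half the entries dropped
    assert abs(out.mean() - 1.0) < 0.01         # mean preserved in expectation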
George Hotz
17faae091b optimizer shouldn't be run without training (#4460)
* optimizer shouldn't be run without training

* set training in relevant tests

* fix multitensor

* that too
2024-05-06 15:34:12 -07:00
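The change is a guard: stepping the optimizer outside training mode is almost always a bug, since gradients are only tracked while training. A minimal standalone sketch of the idea (not tinygrad's actual code; the classes here are stand-ins):

    class Tensor:
      training = False  # stand-in for tinygrad's global Tensor.training flag

    class SGD:
      def __init__(self, params, lr=0.01): self.params, self.lr = params, lr
      def step(self):
        # the guard this commit adds: refuse to update weights outside training
        assert Tensor.training, "Tensor.training must be True to run the optimizer"
        for p in self.params: p.data -= self.lr * p.grad  # plain SGD update

This is why the commit also sets training in the relevant tests: each one must opt in explicitly, e.g. Tensor.training = True (tinygrad also provides a Tensor.train() context manager).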
wozeparrot
9a9cac58f9 add lars to nn (#3750)
* feat: add lars

* feat: don't remove this comment

* clean: smaller diff

* clean: shorter line

* feat: remove mlperf lars, switch resnet

* fix: fully remove mlperf lars

* clean: comment

* feat: contiguous

* feat: no weight decay on skip params

* feat: optimizergroup

* feat: classic momentum

* fix: pylint

* clean: move comment

* fix: correct algo

* feat: lrschedulergroup

* feat: skip list tests

* feat: :| forgot that params are a thing

* feat: remove skip_list params from main params

* feat: set moment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-24 11:43:12 -04:00
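LARS (You et al., 2017) scales each layer's update by a trust ratio derived from the weight and gradient norms, which is what makes very large batch sizes trainable. A hypothetical numpy sketch of one step, reflecting two bullets above: "classic momentum" (non-Nesterov) and "no weight decay on skip params" (modeled by the skip flag); tc is the trust coefficient, and all names are illustrative rather than tinygrad's signature.

    import numpy as np

    def lars_step(w, g, m, lr=1.0, momentum=0.9, wd=1e-4, tc=0.001, skip=False):
      if skip: wd = 0.0  # skip-list params (e.g. biases, norm weights) get no decay
      w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
      # layer-wise trust ratio: scale the update by |w| / (|g| + wd*|w|)
      ratio = tc * w_norm / (g_norm + wd * w_norm) if w_norm > 0 and g_norm > 0 else 1.0
      m[:] = momentum * m + lr * ratio * (g + wd * w)  # classic momentum accumulator
      w -= m
      return w, m

The "optimizergroup" bullet is the structural counterpart: decayed and skip-list parameters go to two optimizer instances that are stepped together as one group.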
David Hou
9f66dcf718 PolynomialDecayWithWarmup + tests (#3649)
* working PolynomialDecayWithWarmup + tests.......

add lars_util.py, oops

* keep lars_util.py as intact as possible, simplify our interface

* whitespace

* clean up

* clean up

* asserts

* test polylr for full resnet training run

* add comment

* rename

* fix do_optim

* don't cast lr

* info

* calculate from train_files

* skip it
2024-03-07 18:53:36 -05:00
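A sketch of the schedule shape: linear warmup followed by polynomial decay to a floor, as used for mlperf ResNet (the reference lars_util.py uses power=2). Parameter names here are illustrative, not the commit's exact interface.

    def poly_decay_with_warmup(step, base_lr, end_lr, warmup_steps, total_steps, power=2.0):
      if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warmup
      frac = 1 - (step - warmup_steps) / (total_steps - warmup_steps)
      return (base_lr - end_lr) * frac ** power + end_lr  # polynomial decay to end_lr

The "calculate from train_files" bullet points at how total_steps is derived in practice: from the dataset size and batch size rather than hardcoded.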
David Hou
0afaf70d57 lars optimizer + tests (#3631)
* lars optimizer + tests

* fix skip list!

* use id to compare in skip list

* go back to using set

* Tensor(bool) * Tensor(bool) is and

* don't lint external/mlperf_resnet

* whitespace

* add external_test_optim to opencl tests

* give mlperf task a name

* mlperf under onnx

* remove track_gnorm

* contiguous instead of realize

* assert momentum and weight decay positive

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-06 18:11:01 -05:00
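The "use id to compare in skip list" bullet is worth unpacking: Tensor.__eq__ is overloaded to build an elementwise comparison graph rather than return a Python bool, so naive membership tests against a list of Tensors misbehave; comparing by object identity with id() sidesteps the operator. A hypothetical sketch of the pattern:

    def split_params(params, skip_list):
      skip_ids = {id(t) for t in skip_list}  # identity, not overloaded __eq__
      decayed = [p for p in params if id(p) not in skip_ids]  # weight decay applied
      skipped = [p for p in params if id(p) in skip_ids]      # no weight decay
      return decayed, skipped

The "Tensor(bool) * Tensor(bool) is and" bullet records a related quirk used in the implementation: multiplying two boolean tensors acts as an elementwise logical AND.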
Pavol Rusnak
52a92bf95d use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
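The style change, illustrated: empty parentheses on a class with no explicit bases are redundant in Python 3.

    class Foo():  # before
      pass

    class Foo:    # after: the same class, with less noise
      pass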
wozeparrot
7460bd9b02 Add LAMB optimizer (#821)
* feat: initial lamb optimizer

* feat: correctly match tf impl and add test
2023-05-28 15:09:05 -07:00
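LAMB (You et al., 2019) combines Adam-style moment estimates with a LARS-like per-layer trust ratio, which is what the TF implementation the test matches computes. A hypothetical numpy sketch of one step (t is the 1-based step count; names are illustrative):

    import numpy as np

    def lamb_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.0):
      m[:] = b1 * m + (1 - b1) * g        # first moment (Adam-style)
      v[:] = b2 * v + (1 - b2) * g * g    # second moment
      m_hat = m / (1 - b1 ** t)           # bias correction
      v_hat = v / (1 - b2 ** t)
      update = m_hat / (np.sqrt(v_hat) + eps) + wd * w
      w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
      ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0  # trust ratio
      w -= lr * ratio * update
      return w, m, v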