Commit Graph

43 Commits

Szymon Ożóg
6c36264790 Improve type hints for optimizer (#3583)
* Improve type hints for optimizer

* lint fix
2024-03-02 07:35:44 -08:00
George Hotz
e0ecab3797 touchups from multibuffer branch (#2958) 2024-01-01 11:33:41 -08:00
George Hotz
e1861ab65e remove realize from optimizer (#2880)
* remove realize from optimizer

* one still needed

* opt realize
2023-12-20 16:42:41 -08:00
George Hotz
cbb8486779 ResNet training changes (update benchmark) (#2390)
* default arg for chunk

* bring back to_

* good changes

* new set

* unused hash

* fix optim

* new torch loader

* fix test lr scheduler
2023-11-22 17:41:12 -08:00
George Hotz
de5d603ec1 corealize + remove realize from lazybuffer (#1968)
* corealize + remove realize from lazybuffer

* fix multigpu

* fix graph
2023-10-04 10:59:31 -07:00
Yixiang Gao
a8f2c16f8e add contiguous (#1246) 2023-07-15 08:36:34 -07:00
Kunwar Raj Singh
8391648822 Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducible

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* partial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where I calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
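
For context on the mixup bullets above: mixup blends each training sample (and its label) with a randomly chosen partner from the same batch. A minimal NumPy sketch of the general technique, not the actual hlb_cifar10 code, with `prob` standing in for the mixup-probability knob mentioned in the bullets:

```python
import numpy as np

def mixup(x: np.ndarray, y: np.ndarray, prob: float = 0.5, alpha: float = 1.0):
  # Apply mixup with probability `prob`; otherwise return the batch untouched.
  if np.random.rand() > prob:
    return x, y
  lam = np.random.beta(alpha, alpha)      # blend coefficient drawn from Beta(alpha, alpha)
  perm = np.random.permutation(len(x))    # pair each sample with a random partner
  x_mix = lam * x + (1.0 - lam) * x[perm]
  y_mix = lam * y + (1.0 - lam) * y[perm] # y is assumed one-hot, so labels blend too
  return x_mix, y_mix
```
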
Reza Rezvan
8ae9a054ae Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
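
For context: this refactor moved parameter collection out of the optimizer module. In current tinygrad the helpers live under tinygrad.nn.state; a rough usage sketch assuming that layout (the TinyNet model here is made up for illustration):

```python
from tinygrad import Tensor
from tinygrad.nn import Linear
from tinygrad.nn.state import get_parameters, get_state_dict
from tinygrad.nn.optim import SGD

class TinyNet:
  def __init__(self):
    self.l1, self.l2 = Linear(784, 128), Linear(128, 10)
  def __call__(self, x: Tensor) -> Tensor:
    return self.l2(self.l1(x).relu())

model = TinyNet()
opt = SGD(get_parameters(model), lr=0.01)  # instead of the old optim.get_parameters()
print(list(get_state_dict(model)))         # e.g. ['l1.weight', 'l1.bias', 'l2.weight', 'l2.bias']
```
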
Casey Primozic
52b7105f87 Dedup params in Optimizer (#1047)
* Dedup params in optimizer

 * Passing the same tensor multiple times in the set of learnable params passed to optimizers can result in models completely failing to learn, but no errors are produced.  This dedups tensors to avoid the problem.

* Fix types

* Use new variable to satisfy linter

* Use `helpers.dedup` instead of `set()` to dedup params

* Add test for duped params in optimizers
2023-06-26 00:49:23 -07:00
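
For context: silently applying the same update to a tensor twice is what made duplicated params harmful. An order-preserving, identity-based dedup along the lines of what the commit describes (a sketch, not tinygrad's actual helpers.dedup):

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def dedup_params(params: Iterable[T]) -> List[T]:
  # Keep the first occurrence of each object, comparing by identity so that
  # unhashable objects (like Tensors) are handled and order is preserved.
  seen, out = set(), []
  for p in params:
    if id(p) not in seen:
      seen.add(id(p))
      out.append(p)
  return out
```
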
George Hotz
ed1963b899 Fast DiskTensor to other Tensor (#916)
* make disktensors fast

* loading

* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
wozeparrot
bfea5215e9 Add weight decay to SGD (#883)
* feat: add weight decay to sgd

* fix: fix tests
2023-06-01 13:13:18 -07:00
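
For context, the usual way to add weight decay to plain SGD is to fold an L2 term into the gradient before the step; a NumPy sketch of that rule (whether the commit uses this coupled form or a decoupled one is not stated here):

```python
import numpy as np

def sgd_step(w: np.ndarray, grad: np.ndarray, lr: float = 0.01, weight_decay: float = 1e-4) -> np.ndarray:
  grad = grad + weight_decay * w   # coupled weight decay == L2 penalty on the weights
  return w - lr * grad             # plain SGD step (no momentum)
```
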
wozeparrot
8c6085a715 Rewrite Adam/W as functions of LAMB (#839)
* feat: rewrite adam/w as functions of lamb

* feat: use adam style adam update + comment

* fix: nvm need to use the lamb adam update
2023-05-29 09:21:35 -07:00
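
For context: Adam and AdamW share LAMB's moment updates and differ only in skipping the layer-wise trust ratio, which is presumably what "functions of LAMB" means here. A NumPy sketch of that relationship, with made-up function names rather than tinygrad's actual signatures:

```python
import numpy as np

def lamb_update(w, g, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8, wd=0.0, adam=False):
  # Adam-style first/second moments with bias correction.
  m = b1 * m + (1 - b1) * g
  v = b2 * v + (1 - b2) * g * g
  up = (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps) + wd * w
  if adam:
    r = 1.0                                     # Adam/AdamW: no layer-wise scaling
  else:
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(up)
    r = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0  # LAMB trust ratio
  return w - lr * r * up, m, v

def adamw_update(w, g, m, v, t, lr=1e-3, wd=1e-2):
  # AdamW is the same update with the trust ratio pinned to 1.
  return lamb_update(w, g, m, v, t, lr, wd=wd, adam=True)
```
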
wozeparrot
7460bd9b02 Add LAMB optimizer (#821)
* feat: initial lamb optimizer

* feat: correctly match tf impl and add test
2023-05-28 15:09:05 -07:00
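
For reference, the published LAMB update that the commit aims to match against the TF reference implementation is, per layer:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,\\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad
r_t = \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} + \lambda\, w_{t-1},\\
w_t &= w_{t-1} - \eta\, \frac{\phi(\lVert w_{t-1}\rVert)}{\lVert r_t\rVert}\, r_t,
\end{aligned}
$$

where $\phi$ is a clipping (or identity) function on the weight norm and the trust ratio falls back to 1 when either norm is zero.
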
George Hotz
9fb3f9ace3 Revert "move t.grad realize on SGD"
This reverts commit ccdc0290d6.
2023-04-18 17:50:08 -07:00
George Hotz
e93e04ed6e Revert "huh...this is faster"
This reverts commit aedd4685fa.
2023-04-18 17:50:07 -07:00
George Hotz
aedd4685fa huh...this is faster 2023-04-18 17:36:31 -07:00
George Hotz
ccdc0290d6 move t.grad realize on SGD 2023-04-18 16:47:51 -07:00
George Hotz
30b795874a remove RMSprop, nobody uses it anymore 2023-03-20 12:31:34 -07:00
Cyril Roumégous
b629fd4cd8 add AdamW optimizer (#716)
* add AdamW optimizer

* one liner Adam optimizer
2023-03-19 12:51:06 -07:00
George Hotz
58d3824cbe better get_state_dict 2023-03-12 00:10:48 -08:00
George Hotz
046b3952c3 get_state_dict 2023-03-11 23:46:53 -08:00
George Hotz
305b9f2d21 multistep optim tests passing 2023-03-11 17:49:53 -08:00
Cyril Roumégous
3f08613a2a apply flake8 E203 rule (#684) 2023-03-11 11:35:16 -08:00
Cyril Roumégous
c10131ddf5 reduce number of lines (#645) 2023-03-05 15:42:32 -08:00
George Hotz
643e8b0388 fix tests, test bn evaluate too 2023-02-27 10:39:47 -08:00
George Hotz
2f17d151b3 fix batchnorm not realizing 2023-02-27 10:19:54 -08:00
George Hotz
9152bb5b4a momentum support in SGD 2023-02-11 10:22:37 -08:00
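
For reference, classical (heavy-ball) momentum is the usual form of "momentum support in SGD"; whether this or the Nesterov variant is what landed here is not stated:

$$
v_t = \mu\, v_{t-1} + g_t, \qquad w_t = w_{t-1} - \eta\, v_t
$$
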
George Hotz
51037815b9 add comment so we don't remove self.t tensor again 2023-02-10 23:07:07 -06:00
George Hotz
c0ea538ba0 Revert "revert t as tensor, constant folding should be done better"
This reverts commit 1d800a94ad.
2023-02-10 23:06:00 -06:00
George Hotz
1d800a94ad revert t as tensor, constant folding should be done better 2023-02-10 22:58:39 -06:00
George Hotz
a5a55ac19e GlobalCounters cache + assign in optim 2023-02-08 17:10:55 -06:00
George Hotz
2e1bdc889a write out all the functions, no auto binding (#543)
* write out all the functions, no auto binding

* cleanups, more types

* Slice is for internal calls only

* improve typing

* ugh, put slice back
2023-02-08 12:41:39 -06:00
George Hotz
d854337f0d nn/optim.py compiles now 2023-02-08 11:25:18 -06:00
Andrey
4977d6f225 using tuples in isinstance (#534) 2023-02-06 14:40:26 -06:00
George Hotz
f7291f6ca3 fixes big KOPT, breaks opencl (#505)
* fixes big KOPT, breaks opencl

* fix optimizer

* KernelCache

* oops, broke batchnorm

* hack to fix it

* fix llvm, less hacky gpu

* disable the cache

* cache just breaks things
2023-02-05 10:46:17 -08:00
George Hotz
ff11c4316b move get_parameters to optim.py 2022-09-25 13:16:58 -04:00
George Hotz
acae9a20c1 clipnorm support 2022-09-24 13:26:38 -04:00
George Hotz
271446e3eb set requires_grad to None (#387)
* set requires_grad to None

* some things need gradients

* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
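
For context, this change makes requires_grad effectively tri-state (None = undecided, True = tracked, False = frozen). A small sketch of how a consumer such as an optimizer might promote the undecided case, assuming the current Tensor API; the variable names are illustrative:

```python
from tinygrad import Tensor

x = Tensor.randn(3, 3)                             # requires_grad defaults to None: undecided
frozen = Tensor.randn(3, 3, requires_grad=False)   # explicitly excluded from training

for t in (x, frozen):
  if t.requires_grad is None:
    t.requires_grad = True   # only undecided tensors are promoted; `frozen` stays frozen
```
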
George Hotz
f215534a64 1100 lines, but sane linter rules 2022-09-06 13:47:45 -07:00
George Hotz
2ed3bb6223 clip model is running 2022-09-05 11:26:32 -07:00
George Hotz
7f15779942 t.assign in optim 2022-08-20 14:04:33 -07:00
George Hotz
b132de677d tinygrad.nn (#367)
* tinygrad.nn

* flake8

* working on pylint

* more pylint

* more pylint

* pylint passes

* networkx

* mypy can't infer that type

* junk
2022-08-18 07:41:00 -07:00