Szymon Ożóg
6c36264790
Improve type hints for optimizer ( #3583 )
* Improve type hints for optimizer
* lint fix
2024-03-02 07:35:44 -08:00
George Hotz
e0ecab3797
touchups from multibuffer branch ( #2958 )
2024-01-01 11:33:41 -08:00
George Hotz
e1861ab65e
remove realize from optimizer ( #2880 )
* remove realize from optimizer
* one still needed
* opt realize
2023-12-20 16:42:41 -08:00
George Hotz
cbb8486779
ResNet training changes (update benchmark) ( #2390 )
* default arg for chunk
* bring back to_
* good changes
* new set
* unused hash
* fix optim
* new torch loader
* fix test lr scheduler
2023-11-22 17:41:12 -08:00
George Hotz
de5d603ec1
corealize + remove realize from lazybuffer ( #1968 )
* corealize + remove realize from lazybuffer
* fix multigpu
* fix graph
2023-10-04 10:59:31 -07:00
Yixiang Gao
a8f2c16f8e
add contiguous ( #1246 )
2023-07-15 08:36:34 -07:00
Kunwar Raj Singh
8391648822
Over 90% on CIFAR with examples/hlb_cifar10.py ( #1073 )
* fix eval, lr decay, best eval
* 82.27
* 82.64
* 82.79, reproducible
* add lr sched, 85.26
* 87.42
* 87.94
* 87.42
* tta with flip
* training flip aug
* refactor
* using Tensor for LR is faster
* 89.5
* refactor, flip only train set
* 90.01
* 90.64
* eval jit
* refactor
* only JIT model
* fix eval JIT
* fix eval JIT
* 90.82
* STEPS=900 reaches 90.22
* TTA envvar
* TTA default 0
* fully jit training
* refactor optim
* fix sched
* add label smoothing
* param changes
* partial gelu
* OneCycle with pause
* gelu maybe works
* 90.12
* remove pause lr
* maybe fix lr schedulers
* scheduler test passing
* comments
* try mixup
* shuffle!
* add back the missing last eval
* fix shuffle bugs
* add mixup prob
* fix mixup prob
* 90.19
* correct mixup
* correct mixup
* correct mixup
* 90.24
* 90.33
* refactor, add type hints
* add gradient clipping
* maybe fix test
* full JIT
* back to relu for now
* pass mixup prob as param
* add typehints
* maybe CI works
* try erf gelu
* CI, types
* remove useless import
* refactor optim
* refactor optim
* try leakyrelu
* try celu
* gelu
* 90.67
* remove grad clip
* remove grad clip tests
* revert params
* add test for OneCycleLR
* 90.62
* fix eval timing
* fix eval timing again
* so where I calculate mixup_prob matters
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
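Several of the bullets above track adding mixup and tuning a mixup probability. As a quick reminder of what that technique does, here is a minimal NumPy sketch (purely illustrative; the function and parameter names are assumptions, not the examples/hlb_cifar10.py code):

```python
# Minimal mixup sketch: with some probability, blend pairs of examples and their
# one-hot labels using a Beta-sampled coefficient. Illustrative only.
import numpy as np

def mixup(x, y, alpha=0.2, prob=0.5, rng=np.random.default_rng(0)):
    if rng.random() > prob:
        return x, y                       # skip mixup for this batch
    lam = rng.beta(alpha, alpha)          # blending coefficient
    perm = rng.permutation(len(x))        # random partner for each example
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

x = np.arange(8, dtype=np.float32).reshape(4, 2)  # four toy "images"
y = np.eye(4, dtype=np.float32)                   # one-hot labels
xm, ym = mixup(x, y)
print(xm.shape, ym.shape)  # (4, 2) (4, 4)
```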
Reza Rezvan
8ae9a054ae
Refactor nn.optim ( #1091 )
* Refactor: nn.optim.py
* Refactor: nn.optim.py; Fix all tests
* Refactor: Replace all optim.get_parameters()
* Refactor: Revert list comp.
* Refactor: Replace optim.get_state_dict
* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
Casey Primozic
52b7105f87
Dedup params in Optimizer ( #1047 )
* Dedup params in optimizer
* Passing the same tensor multiple times in the set of learnable params passed to optimizers can result in models completely failing to learn, but no errors are produced. This dedups tensors to avoid the problem.
* Fix types
* Use new variable to satisfy linter
* Use `helpers.dedup` instead of `set()` to dedup params
* Add test for duped params in optimizers
2023-06-26 00:49:23 -07:00
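The commit description above spells out the failure mode: a tensor passed to the optimizer more than once can be silently updated more than once per step. A minimal sketch of identity-based deduplication (NumPy stand-ins, not tinygrad's `helpers.dedup`):

```python
# Dedup parameters by object identity so a tensor passed twice is only stepped once.
# Illustrative sketch, not tinygrad's Optimizer.
import numpy as np

def dedup(params):
    seen, out = set(), []
    for p in params:
        if id(p) not in seen:
            seen.add(id(p))
            out.append(p)
    return out

w = np.ones(3)
params = [w, w]            # the same tensor accidentally passed twice
print(len(dedup(params)))  # 1, so the update is applied exactly once
```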
George Hotz
ed1963b899
Fast DiskTensor to other Tensor ( #916 )
* make disktensors fast
* loading
* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz
791530045d
Refactor LoadOps ( #910 )
* test
* work
* upd test
* loadops
* cleanups
* real ones
* remove LazyNumpyArray
* fix assign test
* remove range
* np.require
* llama uses arange kernels
* no caching consts
* fix enet
* torch load support
* tests cleanup
* fix shufflenet
* fix image
* fix torch_load test
2023-06-03 09:40:43 -07:00
wozeparrot
bfea5215e9
Add weight decay to SGD ( #883 )
* feat: add weight decay to sgd
* fix: fix tests
2023-06-01 13:13:18 -07:00
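For context on what the feature above adds: one common formulation folds the weight-decay term into the gradient before the SGD step. A minimal one-step sketch (NumPy, illustrative; hyperparameter names are assumptions and may not match tinygrad's SGD):

```python
# One SGD step with (coupled) weight decay: the L2 penalty enters via the gradient.
import numpy as np

def sgd_step(w, grad, lr=0.01, weight_decay=1e-4):
    grad = grad + weight_decay * w   # add the decay term to the gradient
    return w - lr * grad             # plain gradient descent step

w = np.array([1.0, -2.0, 3.0])
g = np.array([0.1, 0.1, 0.1])
print(sgd_step(w, g))
```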
wozeparrot
8c6085a715
Rewrite Adam/W as functions of LAMB ( #839 )
* feat: rewrite adam/w as functions of lamb
* feat: use adam style adam update + comment
* fix: nvm need to use the lamb adam update
2023-05-29 09:21:35 -07:00
wozeparrot
7460bd9b02
Add LAMB optimizer ( #821 )
* feat: initial lamb optimizer
* feat: correctly match tf impl and add test
2023-05-28 15:09:05 -07:00
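The two entries above tie LAMB and Adam/W together: LAMB applies a per-tensor trust ratio (weight norm over update norm) on top of an Adam-style update, so fixing that ratio to 1 gives an AdamW-style step, and dropping weight decay as well gives Adam. A hedged NumPy sketch of that relationship (not tinygrad's implementation; names and defaults are assumptions):

```python
# LAMB-style step with an `adam` flag that skips the trust ratio, roughly how
# AdamW/Adam can be expressed as special cases of a LAMB base. Illustrative only.
import numpy as np

def lamb_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.0, adam=False):
    m = b1 * m + (1 - b1) * g                   # first moment
    v = b2 * v + (1 - b2) * g * g               # second moment
    m_hat = m / (1 - b1 ** t)                   # bias correction
    v_hat = v / (1 - b2 ** t)
    update = m_hat / (np.sqrt(v_hat) + eps) + wd * w
    if adam:
        r = 1.0                                 # Adam/AdamW: no trust ratio
    else:
        w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
        r = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return w - lr * r * update, m, v

w, g = np.array([1.0, -2.0]), np.array([0.1, 0.3])
m, v = np.zeros_like(w), np.zeros_like(w)
w_adamw, _, _ = lamb_step(w, g, m, v, t=1, wd=1e-2, adam=True)   # AdamW-like step
w_lamb, _, _ = lamb_step(w, g, m, v, t=1, wd=1e-2, adam=False)   # LAMB step
print(w_adamw, w_lamb)
```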
George Hotz
9fb3f9ace3
Revert "move t.grad realize on SGD"
This reverts commit ccdc0290d6.
2023-04-18 17:50:08 -07:00
George Hotz
e93e04ed6e
Revert "huh...this is faster"
This reverts commit aedd4685fa.
2023-04-18 17:50:07 -07:00
George Hotz
aedd4685fa
huh...this is faster
2023-04-18 17:36:31 -07:00
George Hotz
ccdc0290d6
move t.grad realize on SGD
2023-04-18 16:47:51 -07:00
George Hotz
30b795874a
remove RMSprop, nobody uses it anymore
2023-03-20 12:31:34 -07:00
Cyril Roumégous
b629fd4cd8
add AdamW optimizer ( #716 )
* add AdamW optimizer
* one liner Adam optimizer
2023-03-19 12:51:06 -07:00
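For context, the practical difference between Adam with L2 regularization and AdamW is where the decay enters: coupled decay adds `wd * w` to the gradient before the adaptive scaling, while AdamW applies the decay directly to the weight after it. A toy one-step comparison (illustrative; momentum and bias correction omitted for brevity):

```python
# Toy comparison of coupled (Adam + L2) vs. decoupled (AdamW) weight decay on one
# scalar step; not tinygrad's implementation.
import numpy as np

def adaptive_scale(g, v_hat, eps=1e-8):
    return g / (np.sqrt(v_hat) + eps)           # Adam-style per-parameter scaling

w, g, v_hat, lr, wd = 1.0, 0.5, 0.25, 0.1, 0.01
adam_l2 = w - lr * adaptive_scale(g + wd * w, v_hat)    # decay passes through the scaling
adamw = w - lr * (adaptive_scale(g, v_hat) + wd * w)    # decay applied to the weight directly
print(adam_l2, adamw)
```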
George Hotz
58d3824cbe
better get_state_dict
2023-03-12 00:10:48 -08:00
George Hotz
046b3952c3
get_state_dict
2023-03-11 23:46:53 -08:00
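As a rough illustration of what a get_state_dict-style helper does, the sketch below recursively walks an object's attributes and collects tensor-like leaves under dotted names (NumPy arrays stand in for Tensors; this is not the tinygrad implementation):

```python
# Recursive state-dict sketch: walk attributes (and lists/tuples) and collect
# array leaves under dotted names. Illustrative only.
import numpy as np

def get_state_dict(obj, prefix=""):
    if isinstance(obj, np.ndarray):
        return {prefix.rstrip("."): obj}
    if isinstance(obj, (list, tuple)):
        items = enumerate(obj)
    elif hasattr(obj, "__dict__"):
        items = vars(obj).items()
    else:
        return {}
    state = {}
    for name, child in items:
        state.update(get_state_dict(child, f"{prefix}{name}."))
    return state

class Linear:
    def __init__(self, n, m): self.weight, self.bias = np.zeros((n, m)), np.zeros(m)

class Model:
    def __init__(self): self.l1, self.l2 = Linear(4, 8), Linear(8, 2)

print(list(get_state_dict(Model())))  # ['l1.weight', 'l1.bias', 'l2.weight', 'l2.bias']
```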
George Hotz
305b9f2d21
multistep optim tests passing
2023-03-11 17:49:53 -08:00
Cyril Roumégous
3f08613a2a
apply flake8 E203 rule ( #684 )
2023-03-11 11:35:16 -08:00
Cyril Roumégous
c10131ddf5
reduce number of lines ( #645 )
2023-03-05 15:42:32 -08:00
George Hotz
643e8b0388
fix tests, test bn evaluate too
2023-02-27 10:39:47 -08:00
George Hotz
2f17d151b3
fix batchnorm not realizing
2023-02-27 10:19:54 -08:00
George Hotz
9152bb5b4a
momentum support in SGD
2023-02-11 10:22:37 -08:00
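A minimal sketch of SGD with classical momentum, the behavior the commit above adds (illustrative only; tinygrad's buffer handling may differ):

```python
# SGD with classical momentum: a velocity buffer accumulates a decayed running sum
# of gradients and the weight steps along it. Illustrative sketch.
import numpy as np

def sgd_momentum_step(w, g, buf, lr=0.01, momentum=0.9):
    buf = momentum * buf + g      # velocity update
    return w - lr * buf, buf      # step along the velocity

w, buf = np.array([1.0, -1.0]), np.zeros(2)
for _ in range(3):
    g = 2 * w                     # gradient of sum(w**2)
    w, buf = sgd_momentum_step(w, g, buf)
print(w)
```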
George Hotz
51037815b9
add comment so we don't remove self.t tensor again
2023-02-10 23:07:07 -06:00
George Hotz
c0ea538ba0
Revert "revert t as tensor, constant folding should be done better"
This reverts commit 1d800a94ad.
2023-02-10 23:06:00 -06:00
George Hotz
1d800a94ad
revert t as tensor, constant folding should be done better
2023-02-10 22:58:39 -06:00
George Hotz
a5a55ac19e
GlobalCounters cache + assign in optim
2023-02-08 17:10:55 -06:00
George Hotz
2e1bdc889a
write out all the functions, no auto binding ( #543 )
* write out all the functions, no auto binding
* cleanups, more types
* Slice is for internal calls only
* improve typing
* ugh, put slice back
2023-02-08 12:41:39 -06:00
George Hotz
d854337f0d
nn/optim.py compiles now
2023-02-08 11:25:18 -06:00
Andrey
4977d6f225
using tuples in isinstance ( #534 )
2023-02-06 14:40:26 -06:00
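The change above is the standard trick of passing a tuple of types to isinstance instead of chaining separate checks, for example:

```python
# isinstance accepts a tuple of types, replacing chained `or` checks.
x = 3.0
print(isinstance(x, (int, float)))  # True: x matches one of the types in the tuple
```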
George Hotz
f7291f6ca3
fixes big KOPT, breaks opencl ( #505 )
* fixes big KOPT, breaks opencl
* fix optimizer
* KernelCache
* oops, broke batchnorm
* hack to fix it
* fix llvm, less hacky gpu
* disable the cache
* cache just breaks things
2023-02-05 10:46:17 -08:00
George Hotz
ff11c4316b
move get_parameters to optim.py
2022-09-25 13:16:58 -04:00
George Hotz
acae9a20c1
clipnorm support
2022-09-24 13:26:38 -04:00
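One common form of clipnorm rescales any gradient whose L2 norm exceeds a threshold back down to that threshold. A minimal sketch (illustrative; the exact semantics behind the commit above may differ):

```python
# Per-tensor gradient norm clipping: rescale the gradient if its norm is too large.
import numpy as np

def clipnorm(g, max_norm=1.0):
    n = np.linalg.norm(g)
    return g * (max_norm / n) if n > max_norm else g

g = np.array([3.0, 4.0])            # norm 5.0
print(np.linalg.norm(clipnorm(g)))  # ~1.0 after clipping
```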
George Hotz
271446e3eb
set requires_grad to None ( #387 )
* set requires_grad to None
* some things need gradients
* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
George Hotz
f215534a64
1100 lines, but sane linter rules
2022-09-06 13:47:45 -07:00
George Hotz
2ed3bb6223
clip model is running
2022-09-05 11:26:32 -07:00
George Hotz
7f15779942
t.assign in optim
2022-08-20 14:04:33 -07:00
George Hotz
b132de677d
tinygrad.nn ( #367 )
* tinygrad.nn
* flake8
* working on pylint
* more pylint
* more pylint
* pylint passes
* networkx
* mypy can't infer that type
* junk
2022-08-18 07:41:00 -07:00