George Hotz
b132de677d
tinygrad.nn ( #367 )
* tinygrad.nn
* flake8
* working on pylint
* more pylint
* more pylint
* pylint passes
* networkx
* mypy can't infer that type
* junk
2022-08-18 07:41:00 -07:00
George Hotz
99b6051467
add ff_dim to transformer
2021-11-29 12:40:52 -05:00
George Hotz
d3f169b267
move good models to models, add a training step test
2021-06-19 11:24:15 -07:00
Marcel Bischoff
42b4761025
transformer >99.98% test accuracy in ~30s ( #230 )
* transformer
* BS might divide len(Y_test)
* output when accuracy is high
* more readable
* fixed loss in serious_mnist for new API
2021-01-02 07:45:09 -08:00
George Hotz
f9170505b3
if you like your transformers twice as slow, use the GPU
2020-12-29 17:14:23 -05:00
George Hotz
3f8e137b6f
extra/transformer
2020-12-29 14:14:00 -05:00
George Hotz
bcb3ceeca3
set training in functions
2020-12-28 22:45:46 -05:00
George Hotz
51bf164b72
dropout, training
2020-12-28 22:12:23 -05:00
George Hotz
7b8fee038d
it works! forgot the sqrt
2020-12-28 16:23:52 -05:00
George Hotz
1faf05ef67
ahh, it's better if i don't train the embedding
2020-12-28 16:07:02 -05:00
George Hotz
c3832e1bde
hmm, fix layernorm to not be batchnorm and it breaks
2020-12-28 13:06:21 -05:00
George Hotz
2e89e75dcb
layernorm fixes transformer instability
2020-12-28 12:58:15 -05:00
George Hotz
593233b668
log and exp are first class ops
2020-12-28 10:00:30 -05:00
Marcel Bischoff
ffff98db78
Evaluation in Transformers ( #218 )
* 2serious
* load/save
* fixing GPU
* added DEBUG
* needs BatchNorm or doesn't learn anything
* old file not needed
* added conv biases
* added extra/training.py and checkpoint
* assert in test only
* save
* padding
* num_classes
* checkpoint
* checkpoints for padding
* training was broken
* merge
* rotation augmentation
* more aug
* needs testing
* streamline augment, augment is fast thus bicubic
* tidying up
* transformer eval
2020-12-28 09:24:51 -05:00
George Hotz
65b07d2f4f
fix onehot embed
2020-12-27 18:50:38 -05:00
George Hotz
d864e1c71a
transformer is training
2020-12-27 18:46:32 -05:00
George Hotz
a361ef6861
fixup training loop
2020-12-27 18:35:56 -05:00
George Hotz
f15bec6dbc
make multidot work on CPU
2020-12-27 17:25:37 -05:00
George Hotz
131e04c90c
cpu only decorator
2020-12-27 17:18:55 -05:00
George Hotz
2f1b2c0a3b
add transpose, start on transformer
2020-12-27 16:59:12 -05:00