Yixiang Gao
|
8d6662a741
|
.cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()
* restore ops_torch
* restore test_speed_v_torch
|
2023-08-21 09:53:29 -07:00 |
|
Jacky Lee
|
2e85fce068
|
Transformer: use Tensor.scaled_dot_product_attention (#1520)
|
2023-08-11 09:00:37 -07:00 |
|
JudeDavis1
|
f3168ee69b
|
default transformer dropout to 0 (#828)
* default mha dropout to 0
* simplify assert
* reform
* default to 0.1
|
2023-05-29 08:06:16 -07:00 |
|
George Hotz
|
1a039306d2
|
good changes from llama branch (#671)
* good changes from llama
* transpose behavior changed
|
2023-03-09 20:51:22 -08:00 |
|
George Hotz
|
2e56a4793e
|
rename log_softmax, support dim, fix onnx Softmax
|
2023-02-24 10:11:24 -08:00 |
|
Jacky Lee
|
cb679cd051
|
Fix weight initialization (#566)
* Fix weight initialization
* Use scaled_uniform in serious_mnist
|
2023-02-19 11:25:29 -08:00 |
|
Kirill
|
7944cfdadc
|
Remove Tensor.data (#565)
|
2023-02-18 16:36:12 -08:00 |
|
George Hotz
|
c8b569a8c7
|
cleaner comments
|
2022-05-14 21:28:39 -07:00 |
|
George Hotz
|
d31ef0ae48
|
make vit names match pytorch
|
2021-11-30 11:34:14 -05:00 |
|
George Hotz
|
4b7c31b5b7
|
break vit into its own file
|
2021-11-30 11:19:22 -05:00 |
|
George Hotz
|
46bbbcf7f0
|
model touchups
|
2021-11-30 11:13:34 -05:00 |
|
George Hotz
|
835869974c
|
clean up vit code
|
2021-11-30 10:58:03 -05:00 |
|
George Hotz
|
c39824bc62
|
oops, forgot some stars
|
2021-11-30 00:46:14 -05:00 |
|
George Hotz
|
bd21304e3c
|
linear takes in weight and bias
|
2021-11-30 00:38:47 -05:00 |
|
George Hotz
|
535f02cc64
|
use sequential
|
2021-11-30 00:25:39 -05:00 |
|
George Hotz
|
de938c2d9d
|
vit is now tested
|
2021-11-30 00:23:06 -05:00 |
|
George Hotz
|
aff810e722
|
unify transformer block
|
2021-11-29 18:58:15 -05:00 |
|
George Hotz
|
58ed46963e
|
fix broadcastdot
|
2021-11-29 18:54:57 -05:00 |
|
George Hotz
|
dca076dbf1
|
remove dumb nn ops
|
2021-11-29 18:05:31 -05:00 |
|
George Hotz
|
8097b8f7d6
|
vit works
|
2021-11-29 16:28:14 -05:00 |
|
George Hotz
|
f909ab194f
|
gelu with broken test
|
2021-11-29 15:00:50 -05:00 |
|
George Hotz
|
1eafa5580e
|
layernorm with learnable parameters
|
2021-11-29 13:03:57 -05:00 |
|
George Hotz
|
c7f795ca1e
|
added dot affine
|
2021-11-29 12:55:56 -05:00 |
|
George Hotz
|
30eb3afbe1
|
add bias term to transformer
|
2021-11-29 12:45:27 -05:00 |
|
George Hotz
|
99b6051467
|
add ff_dim to transformer
|
2021-11-29 12:40:52 -05:00 |
|
George Hotz
|
d3f169b267
|
move good models to models, add a training step test
|
2021-06-19 11:24:15 -07:00 |
|