Commit Graph

99 Commits

Author SHA1 Message Date
Francis Lata
99efa2cfde Merge branch 'master' into retinanet_mlperf 2024-11-18 04:42:57 -08:00
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
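For context, the two APIs took different padding conventions: flat torch-style pads ordered from the last dimension (`pad2d`) versus per-dimension `(before, after)` pairs (`pad`). A minimal sketch of normalizing both to one form, assuming those conventions; the names here are hypothetical, not tinygrad's internals:

```python
# Hypothetical helper: normalize both padding conventions to per-dim
# (before, after) pairs. Flat pads follow the torch/pad2d style, ordered
# from the last dimension: (left, right, top, bottom, ...).
def normalize_pads(shape, pads):
    if all(isinstance(p, (tuple, list)) for p in pads):
        return list(pads)  # already one (before, after) pair per dim
    assert len(pads) % 2 == 0 and len(pads) // 2 <= len(shape)
    pairs = [(pads[i], pads[i + 1]) for i in range(0, len(pads), 2)][::-1]
    return [(0, 0)] * (len(shape) - len(pairs)) + pairs

print(normalize_pads((2, 3, 4, 4), (1, 1, 2, 2)))
# -> [(0, 0), (0, 0), (2, 2), (1, 1)]
```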
Francis Lata
0aad640465 Merge branch 'master' into retinanet_mlperf 2024-11-12 02:45:23 -08:00
Ahmed Harmouche
9c63c3d8ab These casts should only happen if these are supported (#7644) 2024-11-12 07:56:50 +08:00
Francis Lata
bb6f27d2f3 Merge branch 'master' into retinanet_mlperf 2024-11-04 19:19:22 -08:00
chenyu
fb694a63eb Tensor.erf (#7419)
the same one used in onnx and in bert.
2024-10-30 18:12:28 -04:00
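The Abramowitz & Stegun 7.1.26 polynomial is one standard way to implement erf from exp alone (max absolute error about 1.5e-7); whether Tensor.erf lowers to exactly this form is an assumption here:

```python
import math

# A&S 7.1.26: erf(x) ~ sign(x) * (1 - poly(t) * exp(-x^2)), t = 1/(1 + p|x|).
# One standard erf approximation; exact correspondence to Tensor.erf is assumed.
def erf_approx(x: float) -> float:
    a1, a2, a3, a4, a5 = 0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429
    p = 0.3275911
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + p * x)
    y = 1.0 - ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t * math.exp(-x * x)
    return sign * y

assert abs(erf_approx(1.0) - math.erf(1.0)) < 1e-6
```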
eliotgolding
e920f1d663 Llama 3.2 1B load from GGUF (#7295)
* gguf 1b-instruct

* not needed
2024-10-27 09:29:02 +08:00
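A minimal sketch of the first step any GGUF loader performs, reading the fixed file header, assuming the documented little-endian GGUF v2+ layout; this is not the PR's actual code:

```python
import struct

# Sketch: read the fixed GGUF header, assuming the little-endian v2+ layout
# (4-byte magic, u32 version, u64 tensor_count, u64 metadata_kv_count).
def read_gguf_header(path: str):
    with open(path, "rb") as f:
        magic = f.read(4)
        assert magic == b"GGUF", f"not a GGUF file: {magic!r}"
        version, = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    return version, n_tensors, n_kv
```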
Francis Lata
498141c579 Merge branch 'master' into retinanet_mlperf 2024-10-16 10:14:39 -04:00
George Hotz
3169cb386d remove graph [pr] (#7085) 2024-10-16 11:40:07 +08:00
Francis Lata
1295a3020f Merge branch 'master' into retinanet_mlperf 2024-10-11 23:08:17 -04:00
Francis Lata
b802f74cee add dataloader 2024-10-11 23:04:21 -04:00
Tobias Fischer
f9e32f2bb2 clip device fix (#6924) 2024-10-07 00:47:32 +08:00
chenyu
01a2d7316d dtype=float in bert log_softmax for loss and accuracy (#6916) 2024-10-06 11:15:56 -04:00
Francis Lata
12c4259b16 Merge branch 'master' into retinanet_mlperf 2024-10-05 16:22:58 -07:00
George Hotz
8ca506ee37 remove the magic methods for moving between devices [pr] (#6881)
* remove the magic methods for moving between devices [pr]

* remove unneeded clang
2024-10-04 20:27:52 +08:00
George Hotz
f4ec39fe58 switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu
c3c93f332a symbolic bool raise ValueError when not sure [pr] (#6853) 2024-10-02 09:10:58 -04:00
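The idea, as a toy illustration (not tinygrad's actual UOp machinery): a symbolic comparison coerces to bool only when its truth value is certain over the variable's whole range, and raises ValueError instead of guessing otherwise:

```python
# Toy illustration: SymCmp represents Var(lo, hi) < rhs. __bool__ answers only
# when the comparison is definitely true or false over the whole range.
class SymCmp:
    def __init__(self, lo: int, hi: int, rhs: int):
        self.lo, self.hi, self.rhs = lo, hi, rhs
    def __bool__(self) -> bool:
        if self.hi < self.rhs: return True    # always true
        if self.lo >= self.rhs: return False  # always false
        raise ValueError("symbolic comparison is not definitely true or false")

assert bool(SymCmp(0, 3, 4))   # 0..3 < 4 always holds
try: bool(SymCmp(0, 8, 4))     # sometimes true, sometimes false
except ValueError: pass
```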
Francis Lata
b8e24b4f4d recursively go through backbone layers to freeze them 2024-10-02 04:21:44 -07:00
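One way to express this with tinygrad's state helpers, since get_parameters already walks the module tree recursively; `model.backbone` is an assumed attribute name here, not a confirmed one:

```python
from tinygrad.nn.state import get_parameters

# Sketch: get_parameters recurses through nested layers, so freezing every
# backbone tensor is a single loop over its parameters.
def freeze(module):
    for p in get_parameters(module):
        p.requires_grad = False

# freeze(model.backbone)  # assumed structure of the RetinaNet model
```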
Francis Lata
7432cd7857 Merge branch 'master' into retinanet_mlperf 2024-10-01 19:26:35 -07:00
Tobias Fischer
33f7599158 Compute FID Score (#6802)
* compute fid score code

* cleaner s1 and m1 loading
2024-10-01 19:47:58 -04:00
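FID has a closed form over the two Gaussians fitted to Inception activations; a sketch of the standard formula, not necessarily this PR's exact code:

```python
import numpy as np
from scipy.linalg import sqrtm

# FID between Gaussians (mu1, sigma1) and (mu2, sigma2) fitted to Inception
# features: ||mu1 - mu2||^2 + Tr(s1 + s2 - 2 * sqrt(s1 @ s2)).
def fid(mu1, sigma1, mu2, sigma2):
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```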
Francis Lata
a879d19c82 update initializers for model 2024-09-27 13:25:28 -07:00
Francis Lata
ea05de325c Merge branch 'master' into retinanet_mlperf 2024-09-26 02:20:28 -07:00
chenyu
396c96357b update mlperf bert scripts (#6755)
removed DISABLE_DROPOUT=1.
updated BS to 54, which works on tinyboxes with dropout enabled.
used bert's sparse_categorical_crossentropy, which takes a Tensor ignore_index, in the accuracy method
2024-09-25 23:55:05 -04:00
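A sketch of the ignore_index handling mentioned above: masked positions contribute nothing to the loss, and the mean runs over the remaining positions. Illustrative numpy, not bert's exact tinygrad code:

```python
import numpy as np

# Sparse categorical cross-entropy with ignore_index: compute a stable
# log-softmax, gather the target log-probs, zero out ignored positions,
# and average over the rest.
def sparse_cce(logits, labels, ignore_index=-1):
    m = logits.max(-1, keepdims=True)
    logp = logits - m - np.log(np.exp(logits - m).sum(-1, keepdims=True))
    mask = labels != ignore_index
    nll = -logp[np.arange(len(labels)), np.where(mask, labels, 0)]
    return (nll * mask).sum() / mask.sum()
```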
Francis Lata
b7a8de1a4e Merge branch 'master' into retinanet_mlperf 2024-09-25 10:57:32 -07:00
Francis Lata
48e1caf8d2 generate_anchors in tinygrad 2024-09-24 02:37:19 -07:00
samm393
19c11792fd Flux.1 (#6334)
* initial commit

* whitespace

* get rid of torch import

* indentation

* less hardcoding

* add flux.1-dev

* jit

* no double

* t5 tidy up

* validation image

* reuse sdxl autoencoder

* typing changes

* empty lines

* remove unneeded comments

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
Tobias Fischer
c1bbd15bd9 Sharded SDXL Inference (#6328)
* initial sharding fixes

* sigma device fix

* emptyline space fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-21 01:26:43 -04:00
George Hotz
8f6d0485e7 hotfix: resnet to obj.device 2024-09-06 13:06:02 +08:00
George Hotz
9d72119a0c minor resnet cleanups (#6382)
* minor resnet cleanups

* that should have been long

* jit

* meh
2024-09-06 12:50:21 +08:00
Tobias Fischer
3517aa89d9 sdxl batched inference fixes (#6293) 2024-08-28 07:44:58 -04:00
Tobias Fischer
211bfb6d8a fixed batched clip computation (#6292) 2024-08-26 20:48:15 -04:00
Tobias Fischer
331b0f5477 new clip gather (#6277) 2024-08-25 19:27:24 -04:00
chenyu
e6c7c3e499 update pylint path to check indent/space for all (#6022)
also fixed many errors: it was not checking nested dirs before. autogen is excluded for now.

can we use ruff for this?
2024-08-10 14:41:09 -04:00
wozeparrot
d269bc95fa faster tinychat (#5993) 2024-08-08 19:16:26 -07:00
George Hotz
bf8ec23b00 hotfix: contiguous on precompute_freqs_cis 2024-08-07 14:40:56 -07:00
David Hou
9a485f36e4 shard kvcache (#5830) 2024-07-30 20:29:54 -07:00
George Hotz
4e89d45513 hotfix: put contiguous back in llama 2024-07-30 18:43:48 -07:00
George Hotz
21c5e8e1b7 extreme llama speed, 57.34 tok/s (#5827)
* extreme llama speed

* mergable
2024-07-30 18:32:09 -07:00
Tobias Fischer
72da3fe7e6 added clip vision model (#5595)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-19 18:35:51 -04:00
Tobias Fischer
85d4ca7caa FID Inception Model (#5516)
* added model impl

* minor cleanups

* extracted weights loading into from_pretrained

* reorganized model for better weight loading

* removed lru cache for state dict loading
2024-07-16 23:12:03 -04:00
wozeparrot
fa873df9c1 bring tinychat more inline with tinyos' version (#5358) 2024-07-10 13:13:52 -07:00
Tobias Fischer
0c3a35e5c2 Stable Diffusion v2 Inference (#5283)
* model implementation

* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
b2c3a28a5e nn.RMSNorm (#5272)
the norm itself does not add significant value as a Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
Tobias Fischer
8c9c1cf62f Pulled CLIP and UNet into Separate Files (#5253)
* pulled clip and unet into separate files

* reference cleanup, lru cache fix

* better pool indexing
2024-07-01 22:33:01 -04:00
George Hotz
14980f79dd hotfix: unbreak llama 2024-06-30 15:27:54 -07:00
George Hotz
3df47bc21e OpenELM + repeat_interleave (#5234)
* start writing openelm

* progress...hit bug

* repeat_interleave support

* gqa

* add rotary embedding

* spp

* i think it runs correctly

* broken

* output is good now

* cleanups

* no io_uring on android
2024-06-30 15:18:39 -07:00
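repeat_interleave (used here to duplicate KV heads for GQA) can be built from broadcast plus reshape; a numpy sketch of that movement-op formulation:

```python
import numpy as np

# repeat_interleave along an axis via expand (broadcast) + reshape: insert a
# size-1 axis after the target dim, broadcast it to `repeats`, then fold it in.
def repeat_interleave(x, repeats, axis):
    x = np.expand_dims(x, axis + 1)  # (..., n, 1, ...)
    x = np.broadcast_to(x, x.shape[:axis + 1] + (repeats,) + x.shape[axis + 2:])
    return x.reshape(x.shape[:axis] + (x.shape[axis] * repeats,) + x.shape[axis + 2:])

a = np.array([[1, 2], [3, 4]])
assert (repeat_interleave(a, 2, 1) == np.array([[1, 1, 2, 2], [3, 3, 4, 4]])).all()
```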
reddyn12
f1c7944c44 Fix batchnorm shapes for resnet.load_pretrained (#5167)
* Fix batchnorm shapes

* make it general reshape
2024-06-26 18:44:10 -04:00
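A sketch of the "general reshape" idea from the second bullet: when a checkpoint tensor and a model tensor hold the same number of elements but differ in shape (e.g. `(C,)` batchnorm stats vs `(1, C, 1, 1)`), reshape to the target instead of failing. Illustrative, not the actual loader:

```python
import numpy as np

# Copy a checkpoint tensor into a model parameter, reshaping when shapes
# differ but element counts match (the common batchnorm stats case).
def assign_param(model_param, ckpt_tensor):
    if model_param.shape != ckpt_tensor.shape:
        assert model_param.size == ckpt_tensor.size, "element counts must match"
        ckpt_tensor = ckpt_tensor.reshape(model_param.shape)
    model_param[...] = ckpt_tensor
```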
chenyu
e468601226 update llama attention casting (#5096)
* update llama attention casting

updated the scaled_dot_product_attention middle cast and removed the hard-coded half in llama attention.

* fix that
2024-06-22 10:57:17 -04:00
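A sketch of the "middle cast" pattern this commit adjusts: run the softmax in float32 for stability, then cast back to the activations' dtype (rather than a hard-coded half) before multiplying by v. Numpy illustration, not the tinygrad code:

```python
import numpy as np

# Scaled dot-product attention with the softmax done in float32, then cast
# back to the inputs' dtype before the value matmul.
def attention(q, k, v):
    scores = (q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])).astype(np.float32)
    scores = scores - scores.max(-1, keepdims=True)  # stable softmax
    w = np.exp(scores)
    w = (w / w.sum(-1, keepdims=True)).astype(q.dtype)  # cast back, not hard-coded half
    return w @ v
```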
chenyu
8bd6cb9511 update llama model RMSNorm casting (#5095)
following the original implementation, cast back to the input dtype before multiplying by the weight. slightly faster
https://github.com/meta-llama/llama/blob/main/llama/model.py
2024-06-21 23:02:04 -04:00
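Mirroring the referenced llama implementation: normalize in float32, cast back to the input dtype, and only then multiply by the weight. A numpy sketch:

```python
import numpy as np

# RMSNorm with the llama cast order: normalize in float32, cast back to the
# input dtype *before* multiplying by the learned weight.
def rms_norm(x, weight, eps=1e-6):
    xf = x.astype(np.float32)
    normed = xf / np.sqrt((xf * xf).mean(-1, keepdims=True) + eps)
    return normed.astype(x.dtype) * weight
```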
chenyu
e2c5054bdd update resnet.load_from_pretrained (#5040) 2024-06-18 16:29:22 -04:00