1207 Commits

Author SHA1 Message Date
George Hotz
14b613e281 add STEPS to beautiful_mnist 2024-08-10 15:23:44 -07:00
wozeparrot
d269bc95fa faster tinychat (#5993) 2024-08-08 19:16:26 -07:00
Elias Wahl
c9b4602854 no load in INITMLPERF (#5957) 2024-08-08 11:28:24 -04:00
Elias Wahl
c9862e17d4 MLPERF BERT submission scripts (#5931)
* green

* red

* fix benchmark

* log

* count train samples

* oops. 4.0 -> 4.1

* note to todo

* no pillow
2024-08-06 18:09:18 -04:00
chenyu
1dab75ae37 clean up mlperf dataloader import (#5940)
use tinygrad tqdm for dataset, and PIL Image is only needed for resnet
2024-08-06 17:10:08 -04:00
George Hotz
e077bc7baf move memory planner to realize (#5937) 2024-08-06 10:41:29 -07:00
Elias Wahl
937bf5fe12 better hparam (#5891) 2024-08-03 12:38:53 -04:00
Elias Wahl
4a114756f6 New BERT dataloader (#5881)
* One file == One topic

* update test

* new dataloader

* update train script

* get index is faster
2024-08-02 15:12:23 -04:00
David Hou
9a485f36e4 shard kvcache (#5830) 2024-07-30 20:29:54 -07:00
George Hotz
21c5e8e1b7 extreme llama speed, 57.34 tok/s (#5827)
* extreme llama speed

* mergable
2024-07-30 18:32:09 -07:00
Francis Lata
a0baff7a3d update dataloader script example (#5818) 2024-07-30 15:18:29 -04:00
wozeparrot
eebb1b9922 feat: temperature 0 llama3 benchmark (#5806) 2024-07-30 12:05:36 -07:00
wozeparrot
639af3f823 llama3 temperature flag (#5803) 2024-07-29 16:33:51 -07:00
George Hotz
8b34ee2f52 remove global_size and local_size from Kernel class [run_process_replay] (#5720)
* remove global_size and local_size from Kernel class [run_process_replay]

* sizes from the prg
2024-07-25 13:55:08 -07:00
George Hotz
7f5282b2f5 tests if the linearizer is generating dumb code (#5611)
* tests if the linearizer is generating dumb code

* push consts to the end

* sort adds

* sorted add and mul

* this better

* simple expand/contract

* no math contract/expand
2024-07-20 20:36:32 -07:00
George Hotz
b399ccd6ef BEAM bugfix, kernels dedup now (#5617)
* BEAM bugfix, kernels dedup now

* getenv is default
2024-07-20 19:43:50 -07:00
chenyu
d71308ed68 copy mlperf 4.0 to mlperf 4.1 (#5614) 2024-07-20 16:12:00 -04:00
George Hotz
1113e47f96 print best in MCTS + light up the winner in hcopt 2024-07-20 09:39:36 -07:00
George Hotz
06e336bccb mcts search (#5598)
* mcts search

* mcts cleanups

* mcts cleanup

* random shuffle children order

* mcts in handcode_opt

* src and remove_node

* debug 3 to print ast

* print the type

* mcts in extra
2024-07-19 21:38:39 -07:00
George Hotz
0ad87021e2 move acc to end (#5568)
* move acc to end

* confirmed pictures are the same

* relax that

* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz
2de82b8a5d remove get_lazyop_info (#5570)
* don't use get_lazyop_info more

* keep that min

* no ptx for that test
2024-07-19 03:05:33 -07:00
kormann
2c4add6844 pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
George Hotz
fa7e734b49 MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
chenyu
4193095f67 fix handcode_opt.py with DEBUG=2 (#5530)
only one ast per kernel now
2024-07-17 14:50:47 -04:00
George Hotz
a9f5a764dc make BatchNorm work for 2D and 3D (#5477)
* make BatchNorm work for 2D and 3D

* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
George Hotz
aade18d20c beautiful_mnist in torch 2024-07-14 11:09:58 -07:00
George Hotz
cdf63e41bf mnist mlx example uses compile to be fair to tinyjit 2024-07-13 18:14:45 -07:00
George Hotz
8940530290 add mlx beautiful_mnist example 2024-07-13 17:55:47 -07:00
chenyu
28972418c4 s/get_linearizer/get_kernel [run_process_replay] (#5467) 2024-07-13 20:32:22 -04:00
Francis Lata
0345577032 UNet3D dataloader shared memory fix (#5465)
* create separate SharedMemory between inputs and labels

* update path check for shared mem

* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
chenyu
4df63da190 clean up rest of the loadop [run_process_replay] (#5440)
to metaop and filter_sink
2024-07-12 23:38:51 -04:00
George Hotz
03c2dc8bd7 lowerer is kernel [run_process_replay] (#5437) 2024-07-12 18:50:55 -07:00
chenyu
9a187e6102 fix handcode_opt script (#5435)
* fix handcode_opt script

* run in ci

* real run in ci

* HALF=0
2024-07-12 20:52:28 -04:00
George Hotz
870dc8c350 s/Linearizer/Lowerer [run_process_replay] (#5428) 2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0 scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]

* fix tests

* fix op + fuzzers

* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
f6ef283e6a s/loadops/metaops [run_process_replay] (#5421) 2024-07-12 13:26:50 -07:00
wozeparrot
d1cbd6bb95 unity handcode_resnet_opt and handcode_bert_opt (#5418) 2024-07-12 12:05:01 -07:00
wozeparrot
b7cc75a9df usage summary in handcode opt (#5414) 2024-07-12 11:21:18 -07:00
George Hotz
8390feb7b9 optim.OptimizerGroup in hlb_cifar (#5401) 2024-07-11 20:14:36 -07:00
wozeparrot
c24d495ef9 metadata in handcode_opt (#5400) 2024-07-11 17:45:34 -07:00
George Hotz
5232e405ce hotfix: add BS to beautiful_mnist 2024-07-11 10:55:05 -07:00
wozeparrot
c9b3ae6bbf fix llama.py chat mode assert (#5366) 2024-07-10 18:06:14 -07:00
wozeparrot
fa873df9c1 bring tinychat more inline with tinyos' version (#5358) 2024-07-10 13:13:52 -07:00
chenyu
322c37e621 use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples

replaced getenv("JIT"), effectively made gpt2 default jit

* fix test_gpt2
2024-07-09 15:04:43 -04:00
Elias Wahl
73bddc44f6 Fix fake dataloader (#5326) 2024-07-08 09:07:44 -04:00
chenyu
43c3f73fbc handcode_bert_opt.py (#5295)
similar to handcode_resnet50_opt.py, one file to check bert kernels without dataset.
2024-07-05 11:01:20 -04:00
Tobias Fischer
0c3a35e5c2 Stable Diffusion v2 Inference (#5283)
* model implementation

* clip fix, more qol options
2024-07-03 22:47:10 -04:00
reddyn12
d3e244d8b7 prev speed improvements (#5252)
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-07-03 09:06:01 -07:00
chenyu
191463a919 add timing to SDXL (#5273) 2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e nn.RMSNorm (#5272)
the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00