George Hotz
14b613e281
add STEPS to beautiful_mnist
2024-08-10 15:23:44 -07:00
wozeparrot
d269bc95fa
faster tinychat (#5993)
2024-08-08 19:16:26 -07:00
Elias Wahl
c9b4602854
no load in INITMLPERF (#5957)
2024-08-08 11:28:24 -04:00
Elias Wahl
c9862e17d4
MLPERF BERT submission scripts (#5931)
* green
* red
* fix benchmark
* log
* count train samples
* oops. 4.0 -> 4.1
* note to todo
* no pillow
2024-08-06 18:09:18 -04:00
chenyu
1dab75ae37
clean up mlperf dataloader import (#5940)
use tinygrad tqdm for dataset, and PIL Image is only needed for resnet
2024-08-06 17:10:08 -04:00
George Hotz
e077bc7baf
move memory planner to realize (#5937)
2024-08-06 10:41:29 -07:00
Elias Wahl
937bf5fe12
better hparam (#5891)
2024-08-03 12:38:53 -04:00
Elias Wahl
4a114756f6
New BERT dataloader (#5881)
* One file == One topic
* update test
* new dataloader
* update train script
* get index is faster
2024-08-02 15:12:23 -04:00
David Hou
9a485f36e4
shard kvcache (#5830)
2024-07-30 20:29:54 -07:00
George Hotz
21c5e8e1b7
extreme llama speed, 57.34 tok/s (#5827)
* extreme llama speed
* mergable
2024-07-30 18:32:09 -07:00
Francis Lata
a0baff7a3d
update dataloader script example (#5818)
2024-07-30 15:18:29 -04:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark (#5806)
2024-07-30 12:05:36 -07:00
wozeparrot
639af3f823
llama3 temperature flag (#5803)
2024-07-29 16:33:51 -07:00
George Hotz
8b34ee2f52
remove global_size and local_size from Kernel class [run_process_replay] (#5720)
* remove global_size and local_size from Kernel class [run_process_replay]
* sizes from the prg
2024-07-25 13:55:08 -07:00
George Hotz
7f5282b2f5
tests if the linearizer is generating dumb code (#5611)
* tests if the linearizer is generating dumb code
* push consts to the end
* sort adds
* sorted add and mul
* this better
* simple expand/contract
* no math contract/expand
2024-07-20 20:36:32 -07:00
George Hotz
b399ccd6ef
BEAM bugfix, kernels dedup now (#5617)
* BEAM bugfix, kernels dedup now
* getenv is default
2024-07-20 19:43:50 -07:00
chenyu
d71308ed68
copy mlperf 4.0 to mlperf 4.1 (#5614)
2024-07-20 16:12:00 -04:00
George Hotz
1113e47f96
print best in MCTS + light up the winner in hcopt
2024-07-20 09:39:36 -07:00
George Hotz
06e336bccb
mcts search (#5598)
* mcts search
* mcts cleanups
* mcts cleanup
* random shuffle children order
* mcts in handcode_opt
* src and remove_node
* debug 3 to print ast
* print the type
* mcts in extra
2024-07-19 21:38:39 -07:00
George Hotz
0ad87021e2
move acc to end (#5568)
* move acc to end
* confirmed pictures are the same
* relax that
* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz
2de82b8a5d
remove get_lazyop_info (#5570)
* don't use get_lazyop_info more
* keep that min
* no ptx for that test
2024-07-19 03:05:33 -07:00
kormann
2c4add6844
pretty print lazy op per default (#5505)
* pretty lop
* min diff
* walrus
* fix
* min diff
* simplify
* pretty helper function
* ws
* pretty uop upat
* tests
* stricter tests
* test passes
* ws
* stronger upat test
* delete print_tree
* min diff
* stricter exp test
* fix merge
* stronger uops eval test
* +readable and deep upat test
* +readable and deep upat test
* sort inv fix
* fix
* revert allowed_len
2024-07-18 09:34:08 -07:00
George Hotz
fa7e734b49
MetaOps.KERNEL (#5543)
2024-07-17 19:41:23 -07:00
chenyu
4193095f67
fix handcode_opt.py with DEBUG=2 (#5530)
only one ast per kernel now
2024-07-17 14:50:47 -04:00
George Hotz
a9f5a764dc
make BatchNorm work for 2D and 3D (#5477)
* make BatchNorm work for 2D and 3D
* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
George Hotz
aade18d20c
beautiful_mnist in torch
2024-07-14 11:09:58 -07:00
George Hotz
cdf63e41bf
mnist mlx example uses compile to be fair to tinyjit
2024-07-13 18:14:45 -07:00
George Hotz
8940530290
add mlx beautiful_mnist example
2024-07-13 17:55:47 -07:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] (#5467)
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix (#5465)
* create separate SharedMemory between inputs and labels
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
chenyu
4df63da190
clean up rest of the loadop [run_process_replay] (#5440)
to metaop and filter_sink
2024-07-12 23:38:51 -04:00
George Hotz
03c2dc8bd7
lowerer is kernel [run_process_replay] (#5437)
2024-07-12 18:50:55 -07:00
chenyu
9a187e6102
fix handcode_opt script (#5435)
* fix handcode_opt script
* run in ci
* real run in ci
* HALF=0
2024-07-12 20:52:28 -04:00
George Hotz
870dc8c350
s/Linearizer/Lowerer [run_process_replay] (#5428)
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] (#5421)
2024-07-12 13:26:50 -07:00
wozeparrot
d1cbd6bb95
unity handcode_resnet_opt and handcode_bert_opt (#5418)
2024-07-12 12:05:01 -07:00
wozeparrot
b7cc75a9df
usage summary in handcode opt (#5414)
2024-07-12 11:21:18 -07:00
George Hotz
8390feb7b9
optim.OptimizerGroup in hlb_cifar (#5401)
2024-07-11 20:14:36 -07:00
wozeparrot
c24d495ef9
metadata in handcode_opt (#5400)
2024-07-11 17:45:34 -07:00
George Hotz
5232e405ce
hotfix: add BS to beautiful_mnist
2024-07-11 10:55:05 -07:00
wozeparrot
c9b3ae6bbf
fix llama.py chat mode assert (#5366)
2024-07-10 18:06:14 -07:00
wozeparrot
fa873df9c1
bring tinychat more inline with tinyos' version (#5358)
2024-07-10 13:13:52 -07:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively made gpt2 default jit
* fix test_gpt2
2024-07-09 15:04:43 -04:00
Elias Wahl
73bddc44f6
Fix fake dataloader (#5326)
2024-07-08 09:07:44 -04:00
chenyu
43c3f73fbc
handcode_bert_opt.py (#5295)
similar to handcode_resnet50_opt.py, one file to check bert kernels without dataset.
2024-07-05 11:01:20 -04:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference (#5283)
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
reddyn12
d3e244d8b7
prev speed improvements (#5252)
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-07-03 09:06:01 -07:00
chenyu
191463a919
add timing to SDXL (#5273)
2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e
nn.RMSNorm (#5272)
the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00