Commit Graph

547 Commits

Author SHA1 Message Date
George Hotz
c80884884e event driven hip (#3160)
* event driven hip

* simpler, src makes copy

* pass mypy
2024-01-18 14:35:18 -08:00
chenyu
e52a609240 make WINO a context var, and LATEWINO in hlb_cifar (#3161) 2024-01-17 20:21:26 -05:00
George Hotz
9cc2577a08 use hip events (#3157)
* use hip events

* cleanup
2024-01-17 10:39:57 -08:00
George Hotz
a72b1b6d65 sharding for llama (#3151)
* shard llama

* sharding works

* simpler

* simpler

* consume option

* disable that test

* save a line

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2024-01-16 19:28:00 -08:00
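A rough sketch of what the shard API enables for multi-device runs like this (the device names, shapes, and the exact shard signature are illustrative assumptions, not the llama example itself):

    from tinygrad import Tensor

    GPUS = tuple(f"GPU:{i}" for i in range(2))     # illustrative device names
    x = Tensor.rand(4, 512).shard(GPUS, axis=0)    # split the batch dimension across devices
    w = Tensor.rand(512, 512).shard(GPUS)          # no axis: replicate the weights on every device
    out = (x @ w).realize()                        # each device computes its slice of the batch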
chenyu
589c16756f hlb_cifar multi gpu training (#3150)
* cifar train with multi gpu

* GPUS=1 is noop
2024-01-16 14:38:45 -05:00
George Hotz
228f30b96a multitensor jit (#3149)
* initial multitensor jit support and tests

* Added graphs to multitensor jit and updated tests

* update unbind api

* fix set device, add TinyJit to resnet

* update_stats includes device

---------

Co-authored-by: ramenguy99 <ramenguy99@gmail.com>
2024-01-16 09:09:15 -08:00
chenyu
b9d470577c gelu -> quick_gelu in hlb_cifar (#3147)
89 -> 86 seconds, same eval acc
2024-01-16 02:03:37 -05:00
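For reference, a minimal standalone sketch of the two activations being swapped (standard definitions, not the tinygrad kernels themselves):

    import math

    def gelu(x: float) -> float:
        # exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
        return 0.5 * x * (1 + math.erf(x / math.sqrt(2)))

    def quick_gelu(x: float) -> float:
        # sigmoid approximation: x * sigmoid(1.702 * x), cheaper and nearly identical
        return x * (1 / (1 + math.exp(-1.702 * x)))

    print(gelu(1.0), quick_gelu(1.0))  # ~0.8413 vs ~0.8458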
chenyu
ec5a212b0a modernize hlb_cifar (#3146)
* modernize hlb_cifar

do more things in Tensor space instead of numpy, clean up dtypes and use more Tensor methods.

* eigens are float64
2024-01-16 01:35:11 -05:00
chenyu
22920a7e55 add LATEBEAM to hlb_cifar (#3142)
still too slow to search on tinybox though
2024-01-15 23:26:03 -05:00
George Hotz
cec0a7bc37 use shard api to eval resnet fast (#3136)
* use shard api to eval resnet fast

* to supports shard

* test to in multitensor
2024-01-15 16:49:38 -08:00
George Hotz
a464909d79 fast resnet eval (#3135)
* fast resnet eval

* fix HIP multidevice graph

* neater expression for devices

* lines

* add decorator test
2024-01-15 14:15:18 -08:00
chenyu
79f4627fbc fix conversation: llama now generates a token, not probabilities (#3120) 2024-01-14 13:10:01 -05:00

chenyu
fb3f8f7597 move sample inside jit for beautiful_mnist (#3115)
also removed .realize() for jit functions since jit does it automatically now. a little more beautiful
2024-01-14 01:36:30 -05:00
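A minimal sketch of the pattern described above, assuming the current TinyJit import path (the function and shapes are hypothetical, not the beautiful_mnist code):

    from tinygrad import Tensor, TinyJit

    @TinyJit
    def sample(logits: Tensor) -> Tensor:
        # sampling (here a plain argmax) now runs inside the jitted graph; the JIT
        # realizes returned Tensors itself, so callers no longer need .realize()
        return logits.argmax(axis=-1)

    for _ in range(4):  # the JIT captures kernels on the first calls, then replays them
        print(sample(Tensor.rand(8, 10)).numpy())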
chenyu
c3c35f9142 flag to profile mixtral - 1.7 tok/s now (#3104) 2024-01-12 18:54:27 -05:00
chenyu
f96fc6e9d4 fix gpt2 with empty prompt take 2 (#3102)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when another axis is 0
2024-01-12 14:46:36 -05:00
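A small numpy illustration of the failure mode described above (the vocab size 50257 is just for illustration):

    import numpy as np

    logits = np.zeros((0, 50257))       # empty prompt -> zero rows of logits
    try:
        logits.reshape(0, -1)           # -1 can't be inferred when the array has 0 elements
    except ValueError as e:
        print(e)                        # cannot reshape array of size 0 into shape (0,newaxis)
    logits = np.ones((1, 50257))        # replacing with ones gives something valid to sample from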
chenyu
ca46d3541b Revert "fix gpt2 with empty prompt" (#3101) 2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d fix gpt2 with empty prompt (#3100)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when another axis is 0
2024-01-12 14:18:17 -05:00
chenyu
507e0afba0 fix onehot and jit in examples/transformer (#3073)
trained to 0.999 in < 6 seconds on M1 Max consistently
2024-01-10 02:22:41 -05:00
George Hotz
ae83733431 hotfix: examples/transformer.py 2024-01-09 19:28:09 -08:00
chenyu
f0d7ad8aaa fix gpt2 attention with start_pos = 0 (#3061)
* fix gpt2 attention with start_pos size 1

test cases taken from ll_transformer branch

* fix interpreted
2024-01-09 16:14:55 -05:00
George Hotz
655c6f61d3 St real size (#3046)
* track the size in the lazybuffer

* shapetracker real size

* lint
2024-01-08 14:44:53 -08:00
George Hotz
c003be7309 Revert "track size in shapetracker" (#3043)
* Revert "track size in shapetracker (#3026)"

This reverts commit a8ba1ac08f.

* st.size
2024-01-08 13:13:39 -08:00
George Hotz
c5a941d466 webgl backend in extra (#3041)
* WebGL WIP

* 84% of ops passing test

* tests passing 100%

* Cleanup, refactor

* Shave off some lines

* Work on dtypes

* TestOps at 100% again

* Efficient net shaders compile in browser webgl2

* Compile all efficientnet shaders in browser

* Create empty textures for tensor buffers

* Run program. Up next weight loading

* Exported WebGL model working

* Add tests, refactor

* Explicit cast alu for GLSL

* Fix CI tests

* WebGL efficientnet demo

* Compile and run yolov8 in browser

* Fix imports

* Simplify yolo compile

* Fix bool*bool and cast cmplt to float

* More tests

* Do std tests pass on CI?

* Skip std tests on CI

* Remove explicit_cast_alu hack, and solve it in code_for_op

* Move to new dtype-less alloc api

* Remove local size hack: optimize local_size only if device has local

* Remove glsl.py, and move content to cstyle

* dont_use_locals in opts

* Fix dtype tests

* type_map in CStyleLanguage

* Make core changes smaller, cleaner, refactor export_model and demo

* Skip pad_slice

* Simplify: render_const, render_conditional

* solve bool alu for other binops, cleaner ops_webgl

* Fix noopt hack

* Remove some skipIfs

* WebGL image hack

* type_names is a better name

* global_max

* Fix dtype import

* Fix type_names -> type_map

* Fix lint

* Remove webgpu, back to 5k lines (#3040)

* remove webgpu

* max 5000 lines

* revert those to master

* retain that cstyle

---------

Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
2024-01-08 09:29:13 -08:00
George Hotz
cf2eea961c more beautiful_cartpole with exposed hparams 2024-01-07 17:41:09 -08:00
chenyu
fa707c81e5 move beautiful cartpole action sampling inside jit (#3028)
tested by getting 3 full scores in a row
2024-01-06 00:39:55 -05:00
George Hotz
ebb81e8f11 hotfix: st.size() -> st.size in llama 2024-01-05 20:18:52 -08:00
George Hotz
f432ec9c33 Bitcast hip fix + fix mixtral (#3022)
* fix bitcast in hip

* wrong dtype for precast, double COPY
2024-01-05 14:51:25 -08:00
chenyu
7c80b78be9 cleanup gpt2 build function (#3018) 2024-01-04 23:14:53 -05:00
chenyu
f88506e630 move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call

* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
chenyu
8524493748 minor gpt2 cleanup (#3012) 2024-01-04 13:53:18 -05:00
Yixiang Gao
8e1fd6ae9d test works 2024-01-03 07:22:01 -08:00
Yixiang Gao
4f89f8b73a make sure the old hyp breaks the test 2024-01-03 07:13:54 -08:00
Yixiang Gao
b753d280f7 move hyp out of the train so it can be imported 2024-01-02 15:56:17 -08:00
Yixiang Gao
2e4d9ad936 adjust div factor to avoid underflow 2024-01-02 13:47:13 -08:00
chenyu
58d3d5030b vars_from_ast -> LazyOp.vars (#2965) 2024-01-01 18:12:38 -05:00
George Hotz
980f421442 hotfix: remove cast from beautiful_cartpole 2024-01-01 15:02:03 -08:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
61e255d197 use max for gpt2 and llama (#2949)
not using argmax yet because there's a multinomial outside of the function.
2023-12-28 23:26:00 -05:00
chenyu
2f67f1e580 remove obsolete TODO in beautiful_mnist (#2946)
the compiler error was due to `error: call to 'max' is ambiguous` when we have max(int, float) in the kernel.
it was first fixed in 4380ccb1 (the non-fp32 math PR), and further solidified with the dtype refactor
2023-12-28 17:09:23 -05:00
chenyu
50927defad s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
chenyu
7dc3352877 increase stable diffusion validation threshold 1e-4 -> 3e-4 (#2897)
saw a flaky CI failure with 1.1e-4, and 3e-4 is a good number
2023-12-21 11:45:25 -05:00
George Hotz
64dded27f0 pad ops broke coder (#2881)
* pad ops broke coder

* that contiguous fixes it

* Update lazy.py
2023-12-20 17:03:41 -08:00
George Hotz
1765849937 new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu
857c35d256 make gpt2 decode output just once at the end (#2869)
also renamed the function from greedy_until to generate, as it's neither greedy nor until-based
2023-12-20 12:14:55 -05:00
chenyu
6d7e9e0a56 hotfix: convert Y_train to int before passing into index (#2850) 2023-12-19 11:40:56 -05:00
chenyu
0723f26c80 dtypes.default_float and dtypes.default_int (#2824) 2023-12-18 12:21:44 -05:00
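A minimal sketch of how these defaults can be used (assuming they are plain settable attributes on dtypes, as in recent tinygrad):

    from tinygrad import Tensor, dtypes

    dtypes.default_float = dtypes.float16   # newly created float Tensors now default to half
    dtypes.default_int = dtypes.int32
    print(Tensor([1.0, 2.0]).dtype)         # float16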
George Hotz
c6eb618013 tests from new lazy branch (#2774)
* tests from new lazy branch

* fix lin 11

* that was needed

* doesn't fail

* mark

* meant that

* llvm passes
2023-12-14 23:06:39 -08:00
chenyu
a044125c39 validate stable diffusion for seed 0 (#2773)
* validate stable diffusion for seed 0

the closest false positive I can get is with the same setup and one less step: dist = 0.0036.
the same setup with fp16 has dist = 5e-6,
so setting the validation threshold to 1e-4 should be good

* run with --seed 0
2023-12-15 00:07:09 -05:00
chenyu
9afa8009c1 hotfix: explicitly set arange dtype to float (#2772) 2023-12-14 23:14:38 -05:00