George Hotz
2e60012bcf
move create schedule and delete old API ( #3377 )
* move create schedule and delete old API
* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz
41efaa848c
move graph.py and jit.py into features ( #3376 )
* move graph.py into features
* move jit into features
* fix quickstart
2024-02-12 17:34:34 +01:00
chenyu
d8ad9e5660
verify eval acc for hlb_cifar training ( #3344 )
set to 93% to reduce flakiness for now
2024-02-07 19:19:59 -05:00
chenyu
18e854cdbf
shrink MLB on sharded axis ( #3255 )
* shrink MLB on sharded axis
use a onehot structure to store the real partition. the goal is an unsynced batchnorm2d that can run on multiple GPUs for training (see the sketch below).
draft version in https://github.com/chenyuxyz/tinygrad/pull/109
* SYNCBN flag
* test unclean shrinks
* UnsyncedBatchNorm reuses BatchNorm
* more robust pad arg check
* better types
* more tests!
* 6 gpus in benchmark
* disable slow GPUS=6 benchmark
2024-01-31 21:48:25 -05:00
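A minimal sketch of what shrinking along a sharded axis looks like from the user side, assuming tinygrad's multi-device API (Tensor.shard); the device count and tensor shapes are illustrative, not taken from the commit.

```python
from tinygrad import Tensor, Device

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

# batch axis is split across the two devices
x = Tensor.rand(8, 3, 32, 32).shard(GPUS, axis=0)

# shrinking on the sharded (batch) axis only keeps the shards covering that range
first_half = x.shrink(((0, 4), None, None, None))
print(first_half.shape)  # (4, 3, 32, 32)
```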
chenyu
77251336d5
fix handcode_resnet50_opt.py ( #3289 )
linearizer_opts has moved; also update the logging to print after the total_tm update
2024-01-31 19:01:08 -05:00
chenyu
b0a755288f
cifar EVAL_BS set default value to BS ( #3274 )
less compile time for eval due to the cache. 500 was also a slow, uneven number for 6 GPUs. eval time 5.9s -> 3.4s (see the sketch below)
2024-01-29 17:37:12 -05:00
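A minimal sketch of the flag handling described above, assuming tinygrad's getenv helper; the training batch size default here is illustrative.

```python
from tinygrad.helpers import getenv

BS = getenv("BS", 512)
# default EVAL_BS to BS so eval reuses the kernels already compiled for training
EVAL_BS = getenv("EVAL_BS", BS)
```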
Francis Lata
86748f4a8c
fix bbox format to be a list ( #3265 )
2024-01-27 17:54:19 -08:00
chenyu
9e5409be6c
cifar move GlobalCounters.reset() before shard ( #3217 )
* cifar move GlobalCounters.reset() before shard
also shard the mini batch in place (see the sketch below)
* don't eval with DISABLE_BACKWARD
2024-01-23 16:07:43 -05:00
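A hedged sketch of the ordering change, assuming tinygrad's GlobalCounters and an in-place Tensor.shard_; the device count and batch shape are illustrative.

```python
from tinygrad import Tensor, Device
from tinygrad.helpers import GlobalCounters

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))
X = Tensor.rand(512, 3, 32, 32)

GlobalCounters.reset()   # reset first, so the shard copies are attributed to this step
X.shard_(GPUS, axis=0)   # shard the mini batch in place
```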
chenyu
3c179cc27c
cifar only shuffle data at epoch start ( #3216 )
saves 1ms of CPU time per batch; also only shuffles the training set (see the sketch below)
2024-01-23 14:41:22 -05:00
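A minimal sketch of shuffling once per epoch instead of per batch; the dataset names and batch size are illustrative, not taken from hlb_cifar.

```python
import numpy as np

def iterate(X_train, Y_train, bs):
  order = np.random.permutation(len(X_train))   # shuffle once, at epoch start
  for i in range(0, len(order) - bs + 1, bs):
    idx = order[i:i+bs]
    yield X_train[idx], Y_train[idx]            # each batch just indexes the fixed order
```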
chenyu
8465938d29
minor hlb_cifar cleanups ( #3208 )
mostly cosmetic. LATEBEAM=4 on a single 7900 XTX: 59.2 seconds
2024-01-22 12:38:39 -05:00
chenyu
827b7a3c64
cleanup pad_reflect and make_square_mask in hlb_cifar ( #3206 )
removed some complicated-looking code; no wall time difference
2024-01-22 11:30:46 -05:00
chenyu
99884f4c98
cifar flags for RANDOM_CROP, RANDOM_FLIP, and CUTMIX ( #3204 )
experimenting with different setups; would also like to jit the data augmentation next
2024-01-22 01:12:51 -05:00
chenyu
53afec2841
add HALF to handcode_resnet50_opt.py ( #3202 )
use this to study tensor cores on HIP
2024-01-21 23:03:59 -05:00
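A hedged sketch of what a HALF flag typically toggles, assuming tinygrad's dtypes and getenv; whether the script sets a default dtype or casts weights explicitly is an assumption here.

```python
from tinygrad import dtypes
from tinygrad.helpers import getenv

if getenv("HALF"):
  # run the model in fp16 so the generated kernels can exercise the HIP tensor cores
  dtypes.default_float = dtypes.half
```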
chenyu
836883fedc
comment out cutmix in hlb_cifar ( #3201 )
it's a no-op with multi GPU and fewer STEPS. also, the patch was selected from the whole dataset, not from the same batch
2024-01-21 22:24:53 -05:00
George Hotz
c80884884e
event driven hip ( #3160 )
* event driven hip
* simpler, src makes copy
* pass mypy
2024-01-18 14:35:18 -08:00
chenyu
e52a609240
make WINO a context var, and LATEWINO in hlb_cifar ( #3161 )
2024-01-17 20:21:26 -05:00
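A minimal sketch of using WINO as a context variable, assuming tinygrad's Context manager from tinygrad.helpers; the conv shapes are illustrative. LATEWINO would wrap only the later training steps in a block like this.

```python
from tinygrad import Tensor
from tinygrad.helpers import Context

x = Tensor.rand(4, 3, 32, 32)
w = Tensor.rand(16, 3, 3, 3)

with Context(WINO=1):   # winograd conv enabled only inside this block
  out = x.conv2d(w, padding=1).realize()
```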
George Hotz
9cc2577a08
use hip events ( #3157 )
* use hip events
* cleanup
2024-01-17 10:39:57 -08:00
George Hotz
a72b1b6d65
sharding for llama ( #3151 )
* shard llama
* sharding works
* simpler
* simpler
* consume option
* disable that test
* save a line
---------
Co-authored-by: George Hotz <george@tinygrad.org>
2024-01-16 19:28:00 -08:00
chenyu
589c16756f
hlb_cifar multi gpu training ( #3150 )
* cifar train with multi gpu
* GPUS=1 is noop
2024-01-16 14:38:45 -05:00
George Hotz
228f30b96a
multitensor jit ( #3149 )
* initial multitensor jit support and tests
* Added graphs to multitensor jit and updated tests
* update unbind api
* fix set device, add TinyJit to resnet
* update_stats includes device
---------
Co-authored-by: ramenguy99 <ramenguy99@gmail.com>
2024-01-16 09:09:15 -08:00
chenyu
b9d470577c
gelu -> quick_gelu in hlb_cifar ( #3147 )
89 -> 86 seconds, same eval acc
2024-01-16 02:03:37 -05:00
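Both activations exist as Tensor methods; a small sketch of the swap and why it is cheaper (quick_gelu replaces the tanh-based gelu with a single sigmoid). The comparison here is illustrative.

```python
from tinygrad import Tensor

x = Tensor.randn(1024)
exact  = x.gelu()         # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))
approx = x.quick_gelu()   # x * sigmoid(1.702 * x)
print((exact - approx).abs().max().numpy())
```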
chenyu
ec5a212b0a
modernize hlb_cifar ( #3146 )
* modernize hlb_cifar
do more things in Tensor space instead of numpy, clean up dtypes and use more Tensor methods.
* eigens are float64
2024-01-16 01:35:11 -05:00
chenyu
22920a7e55
add LATEBEAM to hlb_cifar ( #3142 )
still too slow to search on tinybox though
2024-01-15 23:26:03 -05:00
George Hotz
cec0a7bc37
use shard api to eval resnet fast ( #3136 )
* use shard api to eval resnet fast
* to supports shard
* test to in multitensor
2024-01-15 16:49:38 -08:00
George Hotz
a464909d79
fast resnet eval ( #3135 )
* fast resnet eval
* fix HIP multidevice graph
* neater expression for devices
* lines
* add decorator test
2024-01-15 14:15:18 -08:00
chenyu
79f4627fbc
fix conversation: llama generates token not prob now ( #3120 )
2024-01-14 13:10:01 -05:00
chenyu
fb3f8f7597
move sample inside jit for beautiful_mnist ( #3115 )
also removed .realize() for jit functions since the jit does it automatically now. a little more beautiful (see the sketch below)
2024-01-14 01:36:30 -05:00
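A hedged sketch of sampling inside the jitted step, assuming tinygrad's TinyJit and Tensor.randint; the model, loss, and dataset size are stubbed out and purely illustrative.

```python
from tinygrad import Tensor, TinyJit

X_train = Tensor.rand(60000, 1, 28, 28)

@TinyJit
def train_step() -> Tensor:
  samples = Tensor.randint(512, high=X_train.shape[0])  # batch sampling happens inside the jit
  batch = X_train[samples]
  # ... forward, backward, and optimizer step would go here ...
  return batch.mean()   # no explicit .realize(): the jit realizes returned tensors

for _ in range(4):
  train_step()
```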
chenyu
c3c35f9142
flag to profile mixtral - 1.7 tok/s now ( #3104 )
2024-01-12 18:54:27 -05:00
chenyu
f96fc6e9d4
fix gpt2 with empty prompt take 2 ( #3102 )
the logits would be empty, so they need to be replaced with ones before sampling; also, a reshape with -1 fails when another axis is 0 (see the sketch below)
2024-01-12 14:46:36 -05:00
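A hedged sketch of the guard described above; it assumes logits shaped (batch, seq, vocab) with seq == 0 for an empty prompt, and the function name is illustrative.

```python
from tinygrad import Tensor

def logits_for_sampling(logits: Tensor) -> Tensor:
  if logits.shape[1] == 0:
    # empty prompt: no positions to condition on, fall back to uniform scores
    return Tensor.ones(logits.shape[0], logits.shape[2])
  # note: reshape(-1, vocab) would fail here when the seq axis is 0
  return logits[:, -1, :]
```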
chenyu
ca46d3541b
Revert "fix gpt2 with empty prompt" ( #3101 )
2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d
fix gpt2 with empty prompt ( #3100 )
the logits would be empty, so they need to be replaced with ones before sampling; also, a reshape with -1 fails when another axis is 0
2024-01-12 14:18:17 -05:00
chenyu
507e0afba0
fix onehot and jit in examples/transformer ( #3073 )
trained to 0.999 in < 6 seconds on M1 Max consistently
2024-01-10 02:22:41 -05:00
George Hotz
ae83733431
hotfix: examples/transformer.py
2024-01-09 19:28:09 -08:00
chenyu
f0d7ad8aaa
fix gpt2 attention with start_pos = 0 ( #3061 )
* fix gpt2 attention with start_pos size 1
test cases taken from ll_transformer branch
* fix interpreted
2024-01-09 16:14:55 -05:00
George Hotz
655c6f61d3
St real size ( #3046 )
* track the size in the lazybuffer
* shapetracker real size
* lint
2024-01-08 14:44:53 -08:00
George Hotz
c003be7309
Revert "track size in shapetracker" ( #3043 )
* Revert "track size in shapetracker (#3026)"
This reverts commit a8ba1ac08f.
* st.size
2024-01-08 13:13:39 -08:00
George Hotz
c5a941d466
webgl backend in extra ( #3041 )
* WebGL WIP
* 84% of ops passing test
* tests passing 100%
* Cleanup, refactor
* Shave off some lines
* Work on dtypes
* TestOps at 100% again
* Efficient net shaders compile in browser webgl2
* Compile all efficientnet shaders in browser
* Create empty textures for tensor buffers
* Run program. Up next weight loading
* Exported WebGL model working
* Add tests, refactor
* Explicit cast alu for GLSL
* Fix CI tests
* WebGL efficientnet demo
* Compile and run yolov8 in browser
* Fix imports
* Simplify yolo compile
* Fix bool*bool and cast cmplt to float
* More tests
* Do std tests pass on CI?
* Skip std tests on CI
* Remove explicit_cast_alu hack, and solve it in code_for_op
* Move to new dtype-less alloc api
* Remove local size hack: optimize local_size only if device has local
* Remove glsl.py, and move content to cstyle
* dont_use_locals in opts
* Fix dtype tests
* type_map in CStyleLanguage
* Make core changes smaller, cleaner, refactor export_model and demo
* Skip pad_slice
* Simplify: render_const, render_conditional
* solve bool alu for other binops, cleaner ops_webgl
* Fix noopt hack
* Remove some skipIfs
* WebGL image hack
* type_names is a better name
* global_max
* Fix dtype import
* Fix type_names -> type_map
* Fix lint
* Remove webgpu, back to 5k lines (#3040 )
* remove webgpu
* max 5000 lines
* revert those to master
* retain that cstyle
---------
Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
2024-01-08 09:29:13 -08:00
George Hotz
cf2eea961c
more beautiful_cartpole with exposed hparams
2024-01-07 17:41:09 -08:00
chenyu
fa707c81e5
move beautiful cartpole action sampling inside jit ( #3028 )
tested by getting 3 full scores in a row
2024-01-06 00:39:55 -05:00
George Hotz
ebb81e8f11
hotfix: st.size() -> st.size in llama
2024-01-05 20:18:52 -08:00
George Hotz
f432ec9c33
Bitcast hip fix + fix mixtral ( #3022 )
* fix bitcast in hip
* wrong dtype for precast, double COPY
2024-01-05 14:51:25 -08:00
chenyu
7c80b78be9
cleanup gpt2 build function ( #3018 )
2024-01-04 23:14:53 -05:00
chenyu
f88506e630
move gpt2/llama sampling inside the model call ( #3013 )
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
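A hedged sketch of the sampling that moved inside the model call, assuming Tensor.argmax, softmax, and multinomial; the temperature cutoff and function shape are illustrative, not the exact gpt2/llama code.

```python
from tinygrad import Tensor

def sample(logits: Tensor, temperature: float) -> Tensor:
  if temperature < 1e-6:
    return logits.argmax(-1)                      # greedy path: the extra argmax kernel
  return (logits / temperature).softmax().multinomial()
```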
chenyu
8524493748
minor gpt2 cleanup ( #3012 )
2024-01-04 13:53:18 -05:00
Yixiang Gao
8e1fd6ae9d
test works
2024-01-03 07:22:01 -08:00
Yixiang Gao
4f89f8b73a
make sure the old hyp breaks the test
2024-01-03 07:13:54 -08:00
Yixiang Gao
b753d280f7
move hyp out of the train so it can be imported
2024-01-02 15:56:17 -08:00
Yixiang Gao
2e4d9ad936
adjust div factor to avoid underflow
2024-01-02 13:47:13 -08:00
chenyu
58d3d5030b
vars_from_ast -> LazyOp.vars ( #2965 )
2024-01-01 18:12:38 -05:00
George Hotz
980f421442
hotfix: remove cast from beautiful_cartpole
2024-01-01 15:02:03 -08:00