prepared bfloat16 change. added float() and cast(default_float) in whitening, explicitly set dtype in various places that convert between numpy and Tensor
* examples/stable_diffusion: support model checkpoints without the alphas_cumprod key (which most models on civitai lack)
* fix indent
---------
Co-authored-by: a <a@a.aa>
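A minimal sketch of how a missing alphas_cumprod could be reconstructed from a beta schedule when a checkpoint omits the key; the schedule parameters and the function name below are illustrative assumptions, not necessarily what examples/stable_diffusion does.

```python
import numpy as np

def fallback_alphas_cumprod(num_timesteps: int = 1000,
                            beta_start: float = 0.00085,
                            beta_end: float = 0.0120) -> np.ndarray:
  # Assumed scaled-linear ("sqrt linear") schedule as commonly used by Stable Diffusion;
  # betas -> alphas -> cumulative product recovers alphas_cumprod.
  betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_timesteps, dtype=np.float32) ** 2
  alphas = 1.0 - betas
  return np.cumprod(alphas, axis=0)
```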
* working PolynomialDecayWithWarmup + tests
add lars_util.py, oops
* keep lars_util.py as intact as possible, simplify our interface
* whitespace
* clean up
* clean up
* asserts
* test polylr for full resnet training run
* add comment
* rename
* fix do_optim
* don't cast lr
* info
* calculate from train_files
* skip it
* lars optimizer + tests
* fix skip list!
* use id to compare in skip list
* go back to using set
* Tensor(bool) * Tensor(bool) is and
* don't lint external/mlperf_resnet
* whitespace
* add external_test_optim to opencl tests
* give mlperf task a name
* mlperf under onnx
* remove track_gnorm
* contiguous instead of realize
* assert momentum and weight decay positive
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
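As a rough illustration of the PolynomialDecayWithWarmup schedule added in the commits above: linear warmup to the base LR, then polynomial decay toward an end LR. The function name, signature, and default power are assumptions for illustration, not the interface in lars_util.py.

```python
def poly_lr_with_warmup(step: int, base_lr: float, warmup_steps: int,
                        total_steps: int, end_lr: float = 0.0, power: float = 2.0) -> float:
  # Linear warmup from ~0 up to base_lr, then polynomial decay down to end_lr.
  if step < warmup_steps:
    return base_lr * (step + 1) / warmup_steps
  decay_steps = max(total_steps - warmup_steps, 1)
  progress = min(step - warmup_steps, decay_steps) / decay_steps
  return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr
```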
* allow LB <- MLB assign, but don't reuse buffer
* update test
* update test
* assign assert axes are the same
* update tests to manually shard running stats
* unused import
* UnsyncedBatchNorm with synced trainable weights for hlb cifar
* multitensor reshape tests
* test mlb assign change axis
* E501
* argfix axis
* don't import batchnorm from hlb_cifar in test_multitensor
* pass num_devices to UnsyncedBatchNorm in test, allow UnsyncedBatchNorm to be used with LB
* add backprop test for UnsyncedBatchNorm
* break out MLB assign and reshape changes
* manually shard running mean and running var
* don't shard unless syncbn=0
* replace nn.BatchNorm2d with UnsyncedBatchNorm
* don't increment num_batches_tracked if not tracking running stats
* update tests
* oops
* Revert "oops"
This reverts commit 5e8a67a535.
* Revert "update tests"
This reverts commit 7ebf65d89a.
* Revert "don't increment num_batches_tracked if not tracking running stats"
This reverts commit 78de0ea9ee.
* Revert "replace nn.BatchNorm2d with UnsyncedBatchNorm"
This reverts commit d03da53da7.
* don't increment num_batches_tracked if not tracking running stats
* oops
* test_batchnorm_axis
* compare against torch
* types
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
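A hedged sketch of the UnsyncedBatchNorm idea from the commits above: gamma/beta stay shared (synced) and trainable, while each device keeps its own running mean/var for its shard of the batch. The class and method names are illustrative; the real implementation shards tinygrad Tensors rather than holding a per-device numpy row.

```python
import numpy as np

class UnsyncedStatsSketch:
  def __init__(self, num_features: int, num_devices: int, momentum: float = 0.1):
    self.gamma = np.ones(num_features, dtype=np.float32)    # shared, trainable
    self.beta = np.zeros(num_features, dtype=np.float32)    # shared, trainable
    self.running_mean = np.zeros((num_devices, num_features), dtype=np.float32)  # per device
    self.running_var = np.ones((num_devices, num_features), dtype=np.float32)    # per device
    self.momentum = momentum

  def update_stats(self, device_idx: int, x: np.ndarray) -> None:
    # x is the (N, C, H, W) slice of the batch that lives on one device;
    # only that device's running statistics are updated.
    mean, var = x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3))
    m = self.momentum
    self.running_mean[device_idx] = (1 - m) * self.running_mean[device_idx] + m * mean
    self.running_var[device_idx] = (1 - m) * self.running_var[device_idx] + m * var
```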
* shrink MLB on sharded axis
use a onehot structure to store the real partition. goal is an unsynced BatchNorm2d that can run on multiple GPUs for training.
draft version in https://github.com/chenyuxyz/tinygrad/pull/109
* SYNCBN flag
* test unclean shrinks
* UnsyncedBatchNorm reuses BatchNorm
* more robust pad arg check
* better types
* more tests!
* 6 gpus in benchmark
* disable slow GPUS=6 benchmark
* shard llama
* sharding works
* simpler
* simpler
* consume option
* disable that test
* save a line
---------
Co-authored-by: George Hotz <george@tinygrad.org>
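A loose sketch of the "onehot structure" idea behind shrinking on the sharded axis: each device keeps a same-shaped piece but masks out positions outside the requested range, so the real partition is encoded by the mask rather than by resizing shards. The names and the 1-D simplification are assumptions; the actual MultiLazyBuffer logic is more involved.

```python
import numpy as np

def shrink_on_sharded_axis(shards: list[np.ndarray], bounds: list[tuple[int, int]],
                           want: tuple[int, int]) -> list[np.ndarray]:
  # shards[i] covers the half-open range bounds[i] of the sharded axis (1-D here for clarity).
  # Each shard is multiplied by a one-hot style mask that is 1 inside `want` and 0 outside,
  # so shapes stay aligned across devices while only the real partition carries data.
  lo_w, hi_w = want
  out = []
  for shard, (lo, hi) in zip(shards, bounds):
    idx = np.arange(lo, hi)
    mask = ((idx >= lo_w) & (idx < hi_w)).astype(shard.dtype)
    out.append(shard * mask)
  return out
```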
* initial multitensor jit support and tests
* Added graphs to multitensor jit and updated tests
* update unbind api
* fix set device, add TinyJit to resnet
* update_stats includes device
---------
Co-authored-by: ramenguy99 <ramenguy99@gmail.com>
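From the user side, the multitensor JIT support added above roughly means a TinyJit-wrapped step can take sharded tensors. The snippet below is a sketch under assumptions: the device names are placeholders, and import paths and sharding details may differ across tinygrad versions.

```python
from tinygrad import Tensor, TinyJit  # import paths may differ by tinygrad version

devices = ("CLANG:0", "CLANG:1")                # placeholder device names
W = Tensor.ones(8, 8).shard(devices)            # replicated weight

@TinyJit
def step(x: Tensor) -> Tensor:
  return (x @ W).relu().realize()

for _ in range(3):                              # first calls capture, later calls replay the kernels
  out = step(Tensor.rand(4, 8).shard(devices, axis=0))  # input sharded on the batch axis
```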
* WebGL WIP
* 84% of ops tests passing
* tests passing 100%
* Cleanup, refactor
* Shave off some lines
* Work on dtypes
* TestOps at 100% again
* EfficientNet shaders compile in browser WebGL2
* Compile all efficientnet shaders in browser
* Create empty textures for tensor buffers
* Run program. Up next: weight loading
* Exported WebGL model working
* Add tests, refactor
* Explicit cast alu for GLSL
* Fix CI tests
* WebGL efficientnet demo
* Compile and run yolov8 in browser
* Fix imports
* Simplify yolo compile
* Fix bool*bool and cast cmplt to float
* More tests
* Do std tests pass on CI?
* Skip std tests on CI
* Remove explicit_cast_alu hack, and solve it in code_for_op
* Move to new dtype-less alloc api
* Remove local size hack: optimize local_size only if device has local
* Remove glsl.py, and move content to cstyle
* dont_use_locals in opts
* Fix dtype tests
* type_map in CStyleLanguage
* Make core changes smaller, cleaner, refactor export_model and demo
* Skip pad_slice
* Simplify: render_const, render_conditional
* solve bool alu for other binops, cleaner ops_webgl
* Fix noopt hack
* Remove some skipIfs
* WebGL image hack
* type_names is a better name
* global_max
* Fix dtype import
* Fix type_names -> type_map
* Fix lint
* Remove webgpu, back to 5k lines (#3040)
* remove webgpu
* max 5000 lines
* revert those to master
* retain that cstyle
---------
Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
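One recurring issue in the WebGL work above was boolean arithmetic: GLSL will not multiply bools, so a renderer has to cast them first. The sketch below shows the general shape of such a fix in a cstyle-like code_for_op table; the function and names are illustrative, not the actual tinygrad renderer code.

```python
def render_bool_safe_mul(a: str, b: str, a_is_bool: bool, b_is_bool: bool) -> str:
  # GLSL has no implicit bool arithmetic, so bool operands get an explicit float() cast
  # before the multiply; bool*bool then behaves like a logical AND (1.0 only if both are true).
  cast = lambda x, is_bool: f"float({x})" if is_bool else x
  return f"({cast(a, a_is_bool)}*{cast(b, b_is_bool)})"

assert render_bool_safe_mul("x", "y", True, True) == "(float(x)*float(y))"
```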