tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 07:28:15 -05:00

Author	SHA1	Message	Date
George Hotz	c80884884e	event driven hip (#3160 ) * event driven hip * simpler, src makes copy * pass mypy	2024-01-18 14:35:18 -08:00
chenyu	e52a609240	make WINO a context var, and LATEWINO in hlb_cifar (#3161 )	2024-01-17 20:21:26 -05:00
George Hotz	9cc2577a08	use hip events (#3157 ) * use hip events * cleanup	2024-01-17 10:39:57 -08:00
George Hotz	a72b1b6d65	sharding for llama (#3151 ) * shard llama * sharding works * simpler * simpler * consume option * disable that test * save a line --------- Co-authored-by: George Hotz <george@tinygrad.org>	2024-01-16 19:28:00 -08:00
chenyu	589c16756f	hlb_cifar multi gpu training (#3150 ) * cifar train with multi gpu * GPUS=1 is noop	2024-01-16 14:38:45 -05:00
George Hotz	228f30b96a	multitensor jit (#3149 ) * initial multitensor jit support and tests * Added graphs to multitensor jit and updated tests * update unbind api * fix set device, add TinyJit to resnet * update_stats includes device --------- Co-authored-by: ramenguy99 <ramenguy99@gmail.com>	2024-01-16 09:09:15 -08:00
chenyu	b9d470577c	gelu -> quick_gelu in hlb_cifar (#3147 ) 89 -> 86 seconds, same eval acc	2024-01-16 02:03:37 -05:00
chenyu	ec5a212b0a	modernize hlb_cifar (#3146 ) * modernize hlb_cifar do more things in Tensor space instead of numpy, clean up dtypes and use more Tensor methods. * eigens are float64	2024-01-16 01:35:11 -05:00
chenyu	22920a7e55	add LATEBEAM to hlb_cifar (#3142 ) still too slow to search on tinybox though	2024-01-15 23:26:03 -05:00
George Hotz	cec0a7bc37	use shard api to eval resnet fast (#3136 ) * use shard api to eval resnet fast * to supports shard * test to in multitensor	2024-01-15 16:49:38 -08:00
George Hotz	a464909d79	fast resnet eval (#3135 ) * fast resnet eval * fix HIP multidevice graph * neater expression for devices * lines * add decorator test	2024-01-15 14:15:18 -08:00
chenyu	79f4627fbc	fix conversation: llama generates token not prob now (#3120 )	2024-01-14 13:10:01 -05:00
chenyu	fb3f8f7597	move sample inside jit for beautiful_mnist (#3115 ) also removed .realize() for jit functions since jit does it automatically now. a little more beautiful	2024-01-14 01:36:30 -05:00
chenyu	c3c35f9142	flag to profile mixtral - 1.7 tok/s now (#3104 )	2024-01-12 18:54:27 -05:00
chenyu	f96fc6e9d4	fix gpt2 with empty prompt take 2 (#3102 ) logits would be empty so need to replace that with ones before sampling, also cannot reshape with -1 when there's 0 in other axes	2024-01-12 14:46:36 -05:00
chenyu	ca46d3541b	Revert "fix gpt2 with empty prompt" (#3101 )	2024-01-12 14:27:41 -05:00
chenyu	1d7f01bc6d	fix gpt2 with empty prompt (#3100 ) logits would be empty so need to replace that with ones before sampling, also cannot reshape with -1 when there's 0 in other axes	2024-01-12 14:18:17 -05:00
chenyu	507e0afba0	fix onehot and jit in examples/transformer (#3073 ) trained to 0.999 in < 6 seconds on M1 Max consistently	2024-01-10 02:22:41 -05:00
George Hotz	ae83733431	hotfix: examples/transformer.py	2024-01-09 19:28:09 -08:00
chenyu	f0d7ad8aaa	fix gpt2 attention with start_pos = 0 (#3061 ) * fix gpt2 attention with start_pos size 1 test cases taken from ll_transformer branch * fix interpreted	2024-01-09 16:14:55 -05:00
George Hotz	655c6f61d3	St real size (#3046 ) * track the size in the lazybuffer * shapetracker real size * lint	2024-01-08 14:44:53 -08:00
George Hotz	c003be7309	Revert "track size in shapetracker" (#3043 ) * Revert "track size in shapetracker (#3026)" This reverts commit `a8ba1ac08f`. * st.size	2024-01-08 13:13:39 -08:00
George Hotz	c5a941d466	webgl backend in extra (#3041 ) * WebGL WIP * 84% of ops passing test * tests passing 100% * Cleanup, refactor * Shave off some lines * Work on dtypes * TestOps at 100% again * Efficient net shaders compile in browser webgl2 * Compile all efficientnet shaders in browser * Create empty textures for tensor buffers * Run program. Up next weight loading * Exported WebGL model working * Add tests, refactor * Explicit cast alu for GLSL * Fix CI tests * WebGL efficientnet demo * Compile and run yolov8 in browser * Fix imports * Simplify yolo compile * Fix boolbool and cast cmplt to float More tests * Do std tests pass on CI? * Skip std tests on CI * Remove explicit_cast_alu hack, and solve it in code_for_op * Move to new dtype-less alloc api * Remove local size hack: optimize local_size only if device has local * Remove glsl.py, and move content to cstyle * dont_use_locals in opts * Fix dtype tests * type_map in CStyleLanguage * Make core changes smaller, cleaner, refactor export_model and demo * Skip pad_slice * Simplify: render_const, render_conditional * solve bool alu for other binops, cleaner ops_webgl * Fix noopt hack * Remove some skipIfs * WebGL image hack * type_names is a better name * global_max * Fix dtype import * Fix type_names -> type_map * Fix lint * Remove webgpu, back to 5k lines (#3040) * remove webgpu * max 5000 lines * revert those to master * retain that cstyle --------- Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>	2024-01-08 09:29:13 -08:00
George Hotz	cf2eea961c	more beautiful_cartpole with exposed hparams	2024-01-07 17:41:09 -08:00
chenyu	fa707c81e5	move beautiful cartpole action sampling inside jit (#3028 ) tested by getting 3 full scores in a row	2024-01-06 00:39:55 -05:00
George Hotz	ebb81e8f11	hotfix: st.size() -> st.size in llama	2024-01-05 20:18:52 -08:00
George Hotz	f432ec9c33	Bitcast hip fix + fix mixtral (#3022 ) * fix bitcast in hip * wrong dtype for precast, double COPY	2024-01-05 14:51:25 -08:00
chenyu	7c80b78be9	cleanup gpt2 build function (#3018 )	2024-01-04 23:14:53 -05:00
chenyu	f88506e630	move gpt2/llama sampling inside the model call (#3013 ) * move gpt2/llama sampling inside the model call * argmax uses one more kernel	2024-01-04 17:01:50 -05:00
chenyu	8524493748	minor gpt2 cleanup (#3012 )	2024-01-04 13:53:18 -05:00
Yixiang Gao	8e1fd6ae9d	test works	2024-01-03 07:22:01 -08:00
Yixiang Gao	4f89f8b73a	make sure the old hyp breaks the test	2024-01-03 07:13:54 -08:00
Yixiang Gao	b753d280f7	move hyp out of the train so it can be imported	2024-01-02 15:56:17 -08:00
Yixiang Gao	2e4d9ad936	adjsut div factor to avoid underflow	2024-01-02 13:47:13 -08:00
chenyu	58d3d5030b	vars_from_ast -> LazyOp.vars (#2965 )	2024-01-01 18:12:38 -05:00
George Hotz	980f421442	hotfix: remove cast from beautiful_cartpole	2024-01-01 15:02:03 -08:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964 ) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
George Hotz	c81ce9643d	move globalcounters to ops (#2960 ) * move globalcounters to ops * missed a few * sick of that failing	2024-01-01 14:21:02 -08:00
chenyu	61e255d197	use max for gpt2 and llama (#2949 ) not using argmax yet because there's a multinomial outside of function.	2023-12-28 23:26:00 -05:00
chenyu	2f67f1e580	remove obsolete TODO in beautiful_mnist (#2946 ) the compiler error was due to `error: call to 'max' is ambiguous` when we have max(int, float) in kernel. it was first fixed in `4380ccb1` the non fp32 math PR, and further solidified with dtype refactor	2023-12-28 17:09:23 -05:00
chenyu	50927defad	s/lazydata.realized/lazydata.base.realized/g (#2914 ) * s/lazydata.realized/lazydata.base.realized/g * not that	2023-12-22 14:45:13 -05:00
chenyu	7dc3352877	increase stable diffusion validation threshold 1e-4 -> 3e-4 (#2897 ) saw a flaky CI failure with 1.1e-4, and 3e-4 is a good number	2023-12-21 11:45:25 -05:00
George Hotz	64dded27f0	pad ops broke coder (#2881 ) * pad ops broke coder * that contiguous fixes it * Update lazy.py	2023-12-20 17:03:41 -08:00
George Hotz	1765849937	new lazy, benchmark (#2878 ) * lazy rewrite, try 2 * min fix tests * pass contig test * put broken pads back * move that to realize * no contig child fixes array packing * so wrong * now that's correct * base children * fix bind issues * disable to_image_idx * fix tests * that failure shouldn't break other tests * more fixes * fix torch * skip failing tests in CI * 1e-7 * half is broken * 1e-6 margin of error	2023-12-20 14:33:21 -08:00
chenyu	857c35d256	make gpt2 decode output just once at the end (#2869 ) also updated function name from greedy_until to generate, as it's not greedy nor until	2023-12-20 12:14:55 -05:00
chenyu	6d7e9e0a56	hotfix convert Y_train to int before passing into index (#2850 )	2023-12-19 11:40:56 -05:00
chenyu	0723f26c80	dtypes.default_float and dtypes.default_int (#2824 )	2023-12-18 12:21:44 -05:00
George Hotz	c6eb618013	tests from new lazy branch (#2774 ) * tests from new lazy branch * fix lin 11 * that was needed * doesn't fail * mark * meant that * llvm passes	2023-12-14 23:06:39 -08:00
chenyu	a044125c39	validate stable diffusion for seed 0 (#2773 ) * validate stable diffusion for seed 0 the closest false positive i can get is with the setup and one less step. dist = 0.0036 same setup with fp16 has dist=5e-6. so setting validation threshold to 1e-4 should be good * run with --seed 0	2023-12-15 00:07:09 -05:00
chenyu	9afa8009c1	hot fix explicitly set arange dtype to float (#2772 )	2023-12-14 23:14:38 -05:00

1 2 3 4 5 ...

547 Commits