Commit Graph

George Hotz
655c6f61d3 St real size (#3046)
* track the size in the lazybuffer

* shapetracker real size

* lint
2024-01-08 14:44:53 -08:00
George Hotz
c003be7309 Revert "track size in shapetracker" (#3043)
* Revert "track size in shapetracker (#3026)"

This reverts commit a8ba1ac08f.

* st.size
2024-01-08 13:13:39 -08:00
George Hotz
c5a941d466 webgl backend in extra (#3041)
* WebGL WIP

* 84% of ops passing test

* tests passing 100%

* Cleanup, refactor

* Shave off some lines

* Work on dtypes

* TestOps at 100% again

* Efficient net shaders compile in browser webgl2

* Compile all efficientnet shaders in browser

* Create empty textures for tensor buffers

* Run program. Up next weight loading

* Exported WebGL model working

* Add tests, refactor

* Explicit cast alu for GLSL

* Fix CI tests

* WebGL efficientnet demo

* Compile and run yolov8 in browser

* Fix imports

* Simplify yolo compile

* Fix bool*bool and cast cmplt to float

* More tests

* Do std tests pass on CI?

* Skip std tests on CI

* Remove explicit_cast_alu hack, and solve it in code_for_op

* Move to new dtype-less alloc api

* Remove local size hack: optimize local_size only if device has local

* Remove glsl.py, and move content to cstyle

* dont_use_locals in opts

* Fix dtype tests

* type_map in CStyleLanguage

* Make core changes smaller, cleaner, refactor export_model and demo

* Skip pad_slice

* Simplify: render_const, render_conditional

* solve bool alu for other binops, cleaner ops_webgl

* Fix noopt hack

* Remove some skipIfs

* WebGL image hack

* type_names is a better name

* global_max

* Fix dtype import

* Fix type_names -> type_map

* Fix lint

* Remove webgpu, back to 5k lines (#3040)

* remove webgpu

* max 5000 lines

* revert those to master

* retain that cstyle

---------

Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
2024-01-08 09:29:13 -08:00
George Hotz
cf2eea961c more beautiful_cartpole with exposed hparams 2024-01-07 17:41:09 -08:00
chenyu
fa707c81e5 move beautiful cartpole action sampling inside jit (#3028)
tested by getting 3 full scores in a row
2024-01-06 00:39:55 -05:00
George Hotz
ebb81e8f11 hotfix: st.size() -> st.size in llama 2024-01-05 20:18:52 -08:00
George Hotz
f432ec9c33 Bitcast hip fix + fix mixtral (#3022)
* fix bitcast in hip

* wrong dtype for precast, double COPY
2024-01-05 14:51:25 -08:00
chenyu
7c80b78be9 cleanup gpt2 build function (#3018) 2024-01-04 23:14:53 -05:00
chenyu
f88506e630 move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call

* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
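
The change in #3013 moves sampling from the caller into the model call itself (at the cost of one extra kernel for the argmax), so the sampling ops are captured by the JIT along with the model. A minimal numpy sketch of the idea, with hypothetical names and a toy model standing in for gpt2/llama:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(tokens):
    # Stand-in for the real transformer: returns fake logits (batch, seq, vocab).
    return rng.standard_normal((tokens.shape[0], tokens.shape[1], 50257))

# Before: the model call returns logits and the caller does the sampling.
def step_sampling_outside(tokens):
    logits = model(tokens)[:, -1]
    return np.argmax(logits, axis=-1)

# After (the idea in #3013): sampling lives inside the model call, so a JIT
# around this function would capture the sampling kernels too.
def model_call_with_sampling(tokens, temperature=0.0):
    logits = model(tokens)[:, -1]
    if temperature == 0.0:
        return np.argmax(logits, axis=-1)          # greedy path
    probs = np.exp(logits / temperature)
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.array([rng.choice(p.shape[-1], p=p) for p in probs])
```
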
chenyu
8524493748 minor gpt2 cleanup (#3012) 2024-01-04 13:53:18 -05:00
Yixiang Gao
8e1fd6ae9d test works 2024-01-03 07:22:01 -08:00
Yixiang Gao
4f89f8b73a make sure the old hyp breaks the test 2024-01-03 07:13:54 -08:00
Yixiang Gao
b753d280f7 move hyp out of the train script so it can be imported 2024-01-02 15:56:17 -08:00
Yixiang Gao
2e4d9ad936 adjust div factor to avoid underflow 2024-01-02 13:47:13 -08:00
chenyu
58d3d5030b vars_from_ast -> LazyOp.vars (#2965) 2024-01-01 18:12:38 -05:00
George Hotz
980f421442 hotfix: remove cast from beautiful_cartpole 2024-01-01 15:02:03 -08:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
61e255d197 use max for gpt2 and llama (#2949)
not using argmax yet because there's a multinomial outside of the function.
2023-12-28 23:26:00 -05:00
chenyu
2f67f1e580 remove obsolete TODO in beautiful_mnist (#2946)
the compiler error was due to `error: call to 'max' is ambiguous` when we have max(int, float) in a kernel.
it was first fixed in 4380ccb1 (the non-fp32 math PR), and further solidified with the dtype refactor
2023-12-28 17:09:23 -05:00
chenyu
50927defad s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
chenyu
7dc3352877 increase stable diffusion validation threshold 1e-4 -> 3e-4 (#2897)
saw a flaky CI failure with 1.1e-4, and 3e-4 is a good number
2023-12-21 11:45:25 -05:00
George Hotz
64dded27f0 pad ops broke coder (#2881)
* pad ops broke coder

* that contiguous fixes it

* Update lazy.py
2023-12-20 17:03:41 -08:00
George Hotz
1765849937 new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu
857c35d256 make gpt2 decode output just once at the end (#2869)
also updated the function name from greedy_until to generate, as it is neither greedy nor until-based
2023-12-20 12:14:55 -05:00
chenyu
6d7e9e0a56 hotfix convert Y_train to int before passing into index (#2850) 2023-12-19 11:40:56 -05:00
chenyu
0723f26c80 dtypes.default_float and dtypes.default_int (#2824) 2023-12-18 12:21:44 -05:00
George Hotz
c6eb618013 tests from new lazy branch (#2774)
* tests from new lazy branch

* fix lin 11

* that was needed

* doesn't fail

* mark

* meant that

* llvm passes
2023-12-14 23:06:39 -08:00
chenyu
a044125c39 validate stable diffusion for seed 0 (#2773)
* validate stable diffusion for seed 0

the closest false positive I can get is with this setup and one less step: dist = 0.0036.
the same setup with fp16 has dist = 5e-6,
so setting the validation threshold to 1e-4 should be good

* run with --seed 0
2023-12-15 00:07:09 -05:00
chenyu
9afa8009c1 hot fix explicitly set arange dtype to float (#2772) 2023-12-14 23:14:38 -05:00
chenyu
c0f76ed4ea transformer kvcache and mask have same dtype as input (#2771)
* transformer kvcache and mask have same dtype as input

* don't use `=0` in cstyle ternary where

* (bool)

* where float16 test
2023-12-14 22:41:51 -05:00
jaredeh
d8952fc575 updating to work with new internal apis (#2755) 2023-12-13 21:54:47 -08:00
Ivan Vnučec
8d206f6bfd fix help message (#2705)
llama -> mixtral
2023-12-10 22:04:35 -08:00
George Hotz
59ab3675a3 faster mixtral + green for new kernels (#2701)
* green for new kernels

* track ram
2023-12-10 19:04:58 -08:00
George Hotz
b01e3907a1 mixtral touch up: two lines 2023-12-10 17:21:49 -08:00
George Hotz
b3982187d1 Mixtral Example (#2691)
* mixtral

* simpler

* global counters

* simpler

* weights arg
2023-12-10 17:18:31 -08:00
George Hotz
0fd44259cd bf16 fix + cleanups from mixtral (#2698)
* bf16 fix + cleanups from mixtral

* generic bf16 cast
2023-12-10 16:31:52 -08:00
chenyu
fae5394845 validate llama output (#2681)
* validate llama output

* does not work with quantize
2023-12-08 16:42:01 -05:00
nickovaras
182d067407 Update yolov3.py (#2680)
The yolov3 example is broken with the current implementation of fetch in the helpers. I was tempted to fix the helpers instead, but that could have just as well broken other examples.
2023-12-08 12:59:38 -08:00
George Hotz
00d9eda961 FROM -> COPY, move vars_from_ast (#2675) 2023-12-07 16:32:30 -08:00
chenyu
539b00a645 move llama getenv("JIT") from models to examples (#2671)
The Transformer class has a jit param, so we should use that in the caller.
2023-12-07 12:43:22 -05:00
chenyu
371005cb2d use one kvcache tensor in gpt2 instead of two separate caches (#2662)
* use one kvcache tensor in gpt2

* test case

* is None

* better test cases
2023-12-06 20:59:17 -05:00
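
#2662 collapses the two separate k and v cache tensors into a single stacked cache, so each decode step updates one buffer in place. A rough numpy sketch of the layout, with illustrative shapes and names rather than the actual gpt2 code:

```python
import numpy as np

# Single stacked kv cache: axis 0 indexes keys (0) and values (1), so one
# slice assignment per decode step updates both, instead of two separate tensors.
batch, max_seq_len, n_heads, head_dim = 1, 128, 12, 64
kv_cache = np.zeros((2, batch, max_seq_len, n_heads, head_dim), dtype=np.float16)

def update_kv_cache(kv_cache, k, v, start_pos):
    # k, v: (batch, seq_len, n_heads, head_dim) for the newly computed tokens.
    seq_len = k.shape[1]
    kv_cache[:, :, start_pos:start_pos + seq_len] = np.stack([k, v])
    # Return the populated prefix of keys and values for use in attention.
    return kv_cache[0, :, :start_pos + seq_len], kv_cache[1, :, :start_pos + seq_len]
```
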
chenyu
0978c24b8e fast gpt2 embedding with variable bs=1 (#2596) 2023-12-05 23:01:17 -05:00
chenyu
229ada5fe5 Gpt2 benchmark with HALF and BEAM (#2636)
* benchmark gpt2 with half and beam

* BEAM=4

* optional validation

* green is good

* we care
2023-12-05 22:15:16 -05:00
Oleg Rybalko
7c427d738c don't apply padding on script call (#2585)
* don't apply padding on script call

* no need for a new param because the batch_size value can be used for the check

* fixed argument naming
2023-12-05 16:34:10 -08:00
George Hotz
9d7ead84e1 hotfix: no need for model cache in examples/coder.py 2023-12-05 16:27:36 -08:00
George Hotz
232ed2af3f more test cleanups (#2631)
* more test cleanups

* move test example back
2023-12-05 16:17:57 -08:00
chenyu
a63f48d3db gpt2 half for kvcache and output logits (#2630)
* gpt2 more half

* half is fine after softmax
2023-12-05 16:54:56 -05:00
George Hotz
8c67eb1c92 GPT bugfixes (#2624)
* simple fixes

* fix exp2

* fixed

* parallel beam for CUDA

* fix image dtypes
2023-12-05 11:42:28 -08:00
qazal
ab2d4d8d29 Fix cl import in the copy_speed test and cifar example (#2586)
* fix CL import

* update test to only run on GPU

* update hlb_cifar too
2023-12-03 09:22:07 -08:00