tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 14:28:09 -05:00

Author	SHA1	Message	Date
SnakeOnex	025fbf4e80	One hot in tensor.py (#3093) * onehot in Tensor.py * one_hot tests * works for all shapes, not just 1 * pylint * not a static method * moved around, num_classes mandatory * pylint * pylint * space & moving * formatting * moved tests	2024-01-12 13:31:18 -05:00
chenyu	7086d77db1	bugfix do not reset shapetracker of 0 size lazybuffer (#3096) it might be coming from an expand, and resetting results incorrect stride. caught by interpreted backend	2024-01-11 23:22:52 -05:00
Yixiang Gao	13e872b53f	add mutigpu support for llama attention (#3064) * add llama attention test for multigpu * test fails * kv cache trying to shrink on sharded axis * mask None works for scale dot product * kv cache seems to be working but scale dot product breaks * scaled dot product works, but the last linear layer failed * running into the reshape case where it could be wrong for multigpu * making sure it was the reshape * adding contiguous doesn't solve * need to shard more properly * remove reshape test * minor adjustment to scale dot product attention test * weights are sharded wrong * continue fix new weight sharding * clean up * fix attention when start_pos is 0 * remove print * add TODOs for the best mutigpu interface	2024-01-11 16:31:02 -08:00
chenyu	dcf7ecaaff	update jit type annotation post lazy rewrite (#3091)	2024-01-11 15:49:30 -05:00
chenyu	0fe6904351	use device from LinearizerOptions in kernel search (#3090) * use device from LinearizerOptions in kernel search removed all Device.DEFAULT in search.py * pass device string for parallel pickle * device for interpreted backends in LinearizerOptions	2024-01-11 14:46:03 -05:00
chenyu	93e3f952aa	use BEAM=2 instead of BEAM=4 in cuda ci gpt2 (#3089) BEAM=2 is faster and less search time. investigating why BEAM2+BEAM4 is slower than BEAM2 alone	2024-01-11 13:21:06 -05:00
chenyu	f502c9b08f	minor cleanup of View.reshape (#3088) * minor cleanup of View.reshape removed some redundant logic * new_strides * revert that	2024-01-11 13:05:54 -05:00
chenyu	f40299c3fe	remove the third merging state in view._merge_dims (#3085) no logic depends on state == 0 or state == 2	2024-01-11 12:07:43 -05:00
chenyu	7f9590d357	hotfix disable flaky mac runner wino cifar (#3087)	2024-01-11 11:57:05 -05:00
Yixiang Gao	adcc844755	cat works (#3086)	2024-01-11 08:25:20 -08:00
chenyu	cdeab9ad97	mem_estimate is always int, not symbolic (#3083) * mem_estimate is always int, not symbolic op_estimate can be symbolic, but mem_estimate is always int, thus we don't need to sym_infer it. fixed some long lines too. update_stats is a very big function * operator does not need underscores	2024-01-10 23:39:51 -05:00
Francis Lam	162fa61a32	wmma: clean up device specific tensor core code (#3081)	2024-01-10 21:03:09 -05:00
chenyu	d218d13885	minor cleanups of lazy.py (#3080)	2024-01-10 20:17:56 -05:00
chenyu	56dda33fc6	Tensor.expand resolves the new_shape before shortcut return (#3078) similar to how reshape is done. also updated shrink shortcut criteria to read similar to pad	2024-01-10 14:29:15 -05:00
Yixiang Gao	6842476ca6	better test demonstration (#3077) * a better test demonstration * fix white space	2024-01-10 10:50:52 -08:00
chenyu	507e0afba0	fix onehot and jit in examples/transformer (#3073) trained to 0.999 in < 6 seconds on M1 Max consistently	2024-01-10 02:22:41 -05:00
chenyu	4342fccc83	filter_strides -> canonicalize_strides (#3072)	2024-01-10 01:06:48 -05:00
chenyu	023f5df0e9	simpler idxs_to_idx (#3071)	2024-01-10 00:30:10 -05:00
George Hotz	2495ca95c7	early gate the graph (#3070)	2024-01-09 20:17:13 -08:00
George Hotz	ff0d6e4551	jit autorealizes output (#3069)	2024-01-09 20:10:22 -08:00
George Hotz	ae83733431	hotfix: examples/transformer.py	2024-01-09 19:28:09 -08:00
chenyu	145718a90f	unbind view or shapetracker also returns var_val (#3067) * unbind view or shapetracker also returns var_val 4% faster for llama compile time * one line less * unbound_views	2024-01-09 21:45:05 -05:00
jxdv	ef3aa6d7fb	update gh actions (#3033) * update checkout actions * update upload artifact * update setup python --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-09 17:52:22 -08:00
George Hotz	3f80c1a098	speedtweaks3: apply shouldn't use the tensor constructor (#3065) * speedtweaks3: apply shouldn't use the tensor constructor * replace 0 size with CONST, not 0 in shape	2024-01-09 17:42:33 -08:00
George Hotz	0abe72b677	hotfix: use is for enum compare, a few more	2024-01-09 16:53:13 -08:00
George Hotz	b2b5849f74	hotfix: use is for enum compare	2024-01-09 16:47:27 -08:00
George Hotz	ac3f246c11	cached size (#3060) * cached size * simplify simplify * 0 doesn't have base * fix test * cleaner cache * hmm, metal is flaky on this...might be real(ish) but useless as test * short circuit reshape/expand properly * better reshape bypass	2024-01-09 16:37:37 -08:00
Yixiang Gao	73b72b8de2	test scaled dot product attention (#3063) * add test * add initial test for scaled dot product attention * test pass for scaled dot product attention	2024-01-09 14:30:57 -08:00
chenyu	55ac2a2cf7	Tensor.cat with 0 shape tensors (#3062) * Tensor.cat with 0 shape tensors supported both 0 in cat axis (for a subset of input), or 0 in non-cat axis (all needs to be 0) * no shp	2024-01-09 16:54:06 -05:00
chenyu	f0d7ad8aaa	fix gpt2 attention with start_pos = 0 (#3061) * fix gpt2 attention with start_pos size 1 test cases taken from ll_transformer branch * fix interpreted	2024-01-09 16:14:55 -05:00
George Hotz	39b91131bc	Speed tweaks (#3059) * base doesn't have to be a function * no double fetch * pop, don't check * make the gc happy * avoid hasattr * cache canonicalize * remove assert, faster base * don't redefine that every time	2024-01-09 11:34:17 -08:00
George Hotz	bf6281f316	hotfix: remove useless slow assert from ShapeTracker	2024-01-09 10:56:36 -08:00
George Hotz	4b687af98f	explicit lazybuffer caching (#3058)	2024-01-09 10:52:37 -08:00
George Hotz	2c6f2e899d	No extra vars call (#3054) * remove unused reciprocal * comment * remove unneeded call to vars * free speedup v0.8.0	2024-01-09 09:52:58 -08:00
Yixiang Gao	259bf9bffc	add multigpu test for RMSNorm (#3056) * need all gather * add two multigpu test scenarios for RMSNorm	2024-01-09 09:52:51 -08:00
chenyu	dab8214103	unit tests for Device.canonicalize (#3055)	2024-01-09 12:47:20 -05:00
George Hotz	374f7659a7	remove unused reciprocal (#3053) * remove unused reciprocal * comment	2024-01-09 08:59:04 -08:00
Yixiang Gao	a686663657	make Embedding device aware for multigpu (#3051) * make Embedding device aware for multigpu * split line instead of igore because that's cheating * add test incomplete * add test complete * remove comment * fix white space * remove nn.Embedding	2024-01-08 20:09:26 -08:00
chenyu	19298e7a3f	Device._buffers -> Device._devices (#3052) backend devices used to be called buffers	2024-01-08 21:30:38 -05:00
chenyu	4f4e8634b8	use in_features directly in nn.Linear.__init__ bound check (#3050) * use in_features directly in nn.Linear.__init__ bound check get rid of the unnecessary check of isinstance int * that is always int * long lines	2024-01-08 19:32:35 -05:00
chenyu	ee6a73826b	clean up test_nn.py (#3049) used Tensor.train decorator, reordered to always tinygrad instances first, and removed redundant idx cast	2024-01-08 18:45:03 -05:00
chenyu	3eb3664074	fix nn.Embedding with empty length input (#3048)	2024-01-08 18:08:36 -05:00
George Hotz	7ea2e0035b	move children for speed (#3047) * move children for speed * no children anymore	2024-01-08 15:02:32 -08:00
George Hotz	655c6f61d3	St real size (#3046) * track the size in the lazybuffer * shapetracker real size * lint	2024-01-08 14:44:53 -08:00
chenyu	1d730b8853	remove ACCUM_FP32 in simple_matmul.py (#3045) * remove ACCUM_FP32 in simple_matmul.py accumate for half inputs is always in float * move test llama compile speed to metal	2024-01-08 17:37:57 -05:00
George Hotz	47d67da830	track the size in the lazybuffer (#3044)	2024-01-08 13:44:55 -08:00
George Hotz	c003be7309	Revert "track size in shapetracker" (#3043) * Revert "track size in shapetracker (#3026)" This reverts commit `a8ba1ac08f`. * st.size	2024-01-08 13:13:39 -08:00
George Hotz	50754f1494	add caches there (#3042) * add caches there * no curl	2024-01-08 13:02:16 -08:00
George Hotz	c5a941d466	webgl backend in extra (#3041) * WebGL WIP * 84% of ops passing test * tests passing 100% * Cleanup, refactor * Shave off some lines * Work on dtypes * TestOps at 100% again * Efficient net shaders compile in browser webgl2 * Compile all efficientnet shaders in browser * Create empty textures for tensor buffers * Run program. Up next weight loading * Exported WebGL model working * Add tests, refactor * Explicit cast alu for GLSL * Fix CI tests * WebGL efficientnet demo * Compile and run yolov8 in browser * Fix imports * Simplify yolo compile * Fix boolbool and cast cmplt to float More tests * Do std tests pass on CI? * Skip std tests on CI * Remove explicit_cast_alu hack, and solve it in code_for_op * Move to new dtype-less alloc api * Remove local size hack: optimize local_size only if device has local * Remove glsl.py, and move content to cstyle * dont_use_locals in opts * Fix dtype tests * type_map in CStyleLanguage * Make core changes smaller, cleaner, refactor export_model and demo * Skip pad_slice * Simplify: render_const, render_conditional * solve bool alu for other binops, cleaner ops_webgl * Fix noopt hack * Remove some skipIfs * WebGL image hack * type_names is a better name * global_max * Fix dtype import * Fix type_names -> type_map * Fix lint * Remove webgpu, back to 5k lines (#3040) * remove webgpu * max 5000 lines * revert those to master * retain that cstyle --------- Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>	2024-01-08 09:29:13 -08:00
George Hotz	8cbcd1b342	Remove webgpu, back to 5k lines (#3040) * remove webgpu * max 5000 lines	2024-01-08 09:10:07 -08:00


3361 Commits