* make Embedding device aware for multigpu
* split line instead of ignore because that's cheating
* add test (incomplete)
* add test (complete)
* remove comment
* fix whitespace
* remove nn.Embedding
* WebGL WIP
* 84% of ops passing tests
* tests passing 100%
* Cleanup, refactor
* Shave off some lines
* Work on dtypes
* TestOps at 100% again
* EfficientNet shaders compile in browser WebGL2
* Compile all efficientnet shaders in browser
* Create empty textures for tensor buffers
* Run program. Up next: weight loading
* Exported WebGL model working
* Add tests, refactor
* Explicit cast alu for GLSL
* Fix CI tests
* WebGL efficientnet demo
* Compile and run yolov8 in browser
* Fix imports
* Simplify yolo compile
* Fix bool*bool and cast cmplt to float
* More tests
* Do std tests pass on CI?
* Skip std tests on CI
* Remove explicit_cast_alu hack, and solve it in code_for_op
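  A hedged sketch of the idea, not the repo's exact code: handle the bool-to-float cast inside the per-op renderer itself, so GLSL comparisons (which yield bool) come out pre-cast and no separate explicit_cast_alu pass is needed. The `code_for_op` entries here are illustrative.

  ```python
  # Illustrative sketch (assumed names, not tinygrad's exact table): render
  # the cast inside the op itself so GLSL's bool-valued comparison is usable
  # as a float operand downstream, removing the need for a separate cast pass.
  code_for_op = {
      "CMPLT": lambda a, b: f"float({a} < {b})",  # GLSL `<` yields bool; cast here
      "MUL":   lambda a, b: f"({a} * {b})",
  }

  print(code_for_op["CMPLT"]("x", "y"))  # -> float(x < y)
  ```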
* Move to new dtype-less alloc api
* Remove local size hack: optimize local_size only if device has local
* Remove glsl.py, and move content to cstyle
* dont_use_locals in opts
* Fix dtype tests
* type_map in CStyleLanguage
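  A hedged sketch of what a per-language dtype-to-name table might look like; the class shape and entries are illustrative, not the exact CStyleLanguage code.

  ```python
  # Illustrative sketch: each CStyle-family backend (C, OpenCL, GLSL, ...)
  # overrides how a dtype is spelled by populating its own type_map.
  class CStyleLanguageSketch:
      type_map: dict = {}
      def render_dtype(self, dtype_name: str) -> str:
          return self.type_map.get(dtype_name, dtype_name)

  class GLSLSketch(CStyleLanguageSketch):
      type_map = {"float32": "float", "int32": "int", "bool": "bool"}

  print(GLSLSketch().render_dtype("float32"))  # -> float
  ```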
* Make core changes smaller, cleaner, refactor export_model and demo
* Skip pad_slice
* Simplify: render_const, render_conditional
* solve bool alu for other binops, cleaner ops_webgl
* Fix noopt hack
* Remove some skipIfs
* WebGL image hack
* type_names is a better name
* global_max
* Fix dtype import
* Fix type_names -> type_map
* Fix lint
* Remove webgpu, back to 5k lines (#3040)
* remove webgpu
* max 5000 lines
* revert those to master
* retain that cstyle
---------
Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
* add a failing test for LR scheduler when using multigpu
* fix calculation order and remove the unnecessary tensor created for float
* min_lr is no longer a tensor
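  A hedged sketch of the shape of that fix (hypothetical scheduler skeleton, not the repo's exact code): keep `min_lr` as a plain Python float and fold the clamp into one tensor expression, so no throwaway tensor is ever created for it.

  ```python
  class PlateauSketch:
      # hypothetical scheduler skeleton; optim.lr is assumed to be a tinygrad
      # Tensor, and only the min_lr handling is the point here
      def __init__(self, optim, factor: float = 0.5, min_lr: float = 1e-6):
          self.optim, self.factor, self.min_lr = optim, factor, min_lr

      def reduce_lr(self):
          # do the tensor math first, clamp with the *float* min_lr, and
          # assign once -- min_lr itself never becomes a tensor
          self.optim.lr.assign((self.optim.lr * self.factor).maximum(self.min_lr))
  ```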
We previously only upcast uint and int; half was using half for the accumulator.
Change to accumulate in float for precision, but cast the result back to half to match torch/JAX output dtype.
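  A minimal numpy sketch of that behavior (illustrative, not tinygrad internals): accumulate the half-precision sum in float32, then cast back to half so the output dtype still matches torch/JAX.

  ```python
  import numpy as np

  x = np.random.rand(4096).astype(np.float16)

  naive  = x.sum(dtype=np.float16)  # accumulating in half loses precision
  acc    = x.sum(dtype=np.float32)  # accumulate in float instead...
  result = np.float16(acc)          # ...but cast back to half for the output

  print(naive, result)  # the float32-accumulated sum is noticeably more accurate
  ```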
* updated most dtype hacks in onnx_ops
* temporarily revert dequantizelinear change
* I think this is right...
* MORE FIXES WOOOO NEW DTYPE IS AWESOME
* ok
* oops missed a print
* half -> float32 for CI
* is npdtype
* some more
* fix if ordering
* more clean ups
* final cleanups
* casting to half not allowed
* k nvm
* revert ArgMax change
* only GPU
* llvm begone
* teeny tiny change
* fix: attempt to add cast tests
* try this
* fix dequantizelinear
* revert some stuff
* tests pass pls
* less lines in onnx_tests
* oops missed string tensor tests
* clean up
* try: revert default behavior changes
* fix: disabled Cast and CastLike tests
* docs: small changes
* fix: fixed IsNaN op and enabled associated tests
* fix: forgot about float16
* done
* update disabled test
* gah missed another float16
* disable rest of failing tests
* rm extra line
* try...
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* remove AndNode.__floordiv__
AndNode produces a Node whose min/max is bounded by [0, 1], so `//` on top of that is almost always 0.
We don't really use it either.
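  A tiny worked check of the bound argument: any value clamped to [0, 1] floor-divided by a divisor of 2 or more is 0, which is why the `__floordiv__` override carried no real information.

  ```python
  # if 0 <= n <= 1, then n // d == 0 for every d >= 2
  for n in (0, 1):
      for d in (2, 3, 7):
          assert n // d == 0
  print("floordiv of a [0, 1]-bounded value by d >= 2 is always 0")
  ```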
* keep the test
* simple multitensor API (see usage sketch below)
* test multitensor
* mt work
* new api
* copies
* all but data parallel
* allreduce there
* works, but axis sharded
* fix all mt tests
* features/multi
* work
* backprop
* fix tests
* tests passing
* mt progress
* cleanups
* less lines
* tensor cleanup
* save more lines
* mypy passes
* fix tests
* skip for cuda too
* bump download cache
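  Below is a hedged usage sketch of the multitensor API these commits build up; the `shard` call, axis semantics, and device strings are assumptions about the API's shape, not a verified snippet from the PR.

  ```python
  from tinygrad.tensor import Tensor

  # assumed API: shard a tensor across devices along an axis (axis=None copies)
  devices = ("GPU:0", "GPU:1")  # illustrative device names
  x = Tensor.ones(256, 256).shard(devices, axis=0)    # split rows across GPUs
  w = Tensor.ones(256, 64).shard(devices, axis=None)  # replicate the weights

  y = (x @ w).numpy()  # per-device matmuls; the result is gathered transparently
  ```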
* add Tensor.split (#2677)
* fix mypy errors
* add list support for Tensor.split
* fix ruff comments
* match Tensor.split API
* simplify split and test_split
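  A hedged usage sketch of `Tensor.split` covering both forms the commits mention, an int chunk size and a list of section sizes; the exact call shape is assumed.

  ```python
  from tinygrad.tensor import Tensor

  t = Tensor.arange(10)

  a, b = t.split(5)             # int: chunks of size 5 -> shapes (5,), (5,)
  x, y, z = t.split([2, 3, 5])  # list: explicit section sizes, per this PR
  print(a.shape, x.shape, z.shape)  # (5,) (2,) (5,)
  ```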
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>