* support same uidx in multiple shape positions
* rename var
* update comment
* add contiguous index check to global_store too
* update comment
* small change
* is this better?
* smh
* smaller change?
* get rid of more changes
* get rid of more changes
* is this even making anything better?
* comment
* fix test
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* remove float cast
* cast scalars to the correct value in creation time
* cast scalar in the correct place
* wrong, use y_dtype
* make consts have a unique cache key
* add cast_scalar back
* test_load_cache_const_bufs
* add bool dtype
* test_const_dtype
* fix linters
Fully UNROLLing the first_reduce should not change the number of
local_dims.
Fully UNROLLing a GROUP dim should reduce the number of
group_for_reduces by one.
Also changed group_for_reduces to be a count, since the axis numbers
aren't used anywhere (they are always the first reduce dims).
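A minimal test-style sketch of those two invariants, assuming tinygrad's Kernel API (`apply_opt`, `Opt`, `OptOps`, `local_dims`, `group_for_reduces`); `build_reduce_kernel` is a hypothetical helper and the exact `Opt` arguments are assumptions, not part of this change:

```python
# Hedged sketch only: the UNROLL invariants above written as assertions.
# build_reduce_kernel() is hypothetical; the Opt/OptOps usage is assumed, not verified here.
from tinygrad.codegen.kernel import Opt, OptOps

def check_full_unroll_invariants(build_reduce_kernel):
  k = build_reduce_kernel()                # hypothetical: a kernel with a plain reduce
  locals_before = k.local_dims
  k.apply_opt(Opt(OptOps.UNROLL, 0, 0))    # amt=0: fully unroll the first reduce
  assert k.local_dims == locals_before     # full UNROLL must not change local_dims

  g = build_reduce_kernel(grouped=True)    # hypothetical: a kernel with one GROUP dim
  groups_before = g.group_for_reduces      # now a count, not a list of axes
  g.apply_opt(Opt(OptOps.UNROLL, 0, 0))    # fully unroll the grouped reduce dim
  assert g.group_for_reduces == groups_before - 1
```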
* ops_python: add HIP tensor core mock and refactor METAL
* Add tests to CI
* add DEBUG=2 to full tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* start uop emu
* tiny_add passes
* more ops
* emulate the whole warp
* test_gemm passes
* metal gemm test pass
* works on big gemm
* works on big gemm
* more tests pass
* touch ups
* fix mypy
* cleanups
* exp2 mypy
* arch is where it belongs
* actually emulate tensor cores
* fix test
* new style
* PoC faster wino compile by catting consts across data expand dim
* fix fusions
* faster + golf it
* noqa 501
* implicit broadcast
* Revert "implicit broadcast"
This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666.
* shorter
* shorter
* oops
* 216 upcasts is probably fine
* wino kernel count test
* test winograd number of sts
* specify device for apply_matrix mat elements
* extra/gemm: add a simple_conv.py along with correctness check
The goal is to make it easy to test situations that trigger tensor cores
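A rough sketch of what such a correctness check can look like (not the actual extra/gemm/simple_conv.py; shapes and tolerances here are arbitrary choices for illustration):

```python
# Illustration only: compare tinygrad's conv2d against torch on random data.
import numpy as np
import torch
from tinygrad.tensor import Tensor

BS, CIN, COUT, H, W, K = 4, 16, 32, 32, 32, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((BS, CIN, H, W), dtype=np.float32)
w = rng.standard_normal((COUT, CIN, K, K), dtype=np.float32)

out_tiny = Tensor(x).conv2d(Tensor(w)).numpy()
out_torch = torch.nn.functional.conv2d(torch.from_numpy(x), torch.from_numpy(w)).numpy()
np.testing.assert_allclose(out_tiny, out_torch, atol=1e-4, rtol=1e-4)
print("conv matches torch")
```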
* test: add tests for acc_dtype handling and fix typing
* wmma: enable METAL half tensor cores and clean up cstyle
* revert simple_matmul rand changes and break line in tensor
* added metal fp16->fp32 tensor core
We previously only upcast uint and int, so half was accumulated in half.
Change to accumulate in float for precision, but cast the result back to half to match the torch/jax output dtype.
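A quick numpy-only illustration of why the accumulator dtype matters (not tinygrad code):

```python
# fp16 accumulation stalls once the running sum reaches 2048, where the fp16 spacing is 2.
import numpy as np

x = np.ones(4096, dtype=np.float16)
acc_half, acc_float = np.float16(0), np.float32(0)
for v in x:
  acc_half = np.float16(acc_half + v)    # half accumulator: 2048 + 1 rounds back to 2048
  acc_float = np.float32(acc_float + v)  # float accumulator: exact in this range

print(acc_half)               # 2048.0 -- wrong
print(np.float16(acc_float))  # 4096.0 -- accumulate in float, cast back to half at the end
```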
The correct condition is that PADTO cannot be applied to a reduce axis, not that the reduce op is Reduce.MAX.
Even for Reduce.SUM it's possible that the reduce axis had a div before it, so the padded 0 becomes inf and the sum over it is incorrect.
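The failure mode in a nutshell (numpy illustration only, not tinygrad kernel code):

```python
# Padding the reduce axis with 0 is only safe if the pad stays 0 all the way to the reduce.
# With a div before the sum, the padded zeros become inf and poison the result.
import numpy as np

x = np.array([1.0, 2.0, 3.0])       # real data along the reduce axis
x_pad = np.pad(x, (0, 1))           # PADTO-style padding with 0 -> [1., 2., 3., 0.]

print((1.0 / x).sum())              # ~1.8333 (correct)
print((1.0 / x_pad).sum())          # 1/0 -> inf, so the padded sum is inf (wrong)
```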
* lazy rewrite, try 2
* min fix tests
* pass contig test
* put broken pads back
* move that to realize
* no contig child fixes array packing
* so wrong
* now that's correct
* base children
* fix bind issues
* disable to_image_idx
* fix tests
* that failure shouldn't break other tests
* more fixes
* fix torch
* skip failing tests in CI
* 1e-7
* half is broken
* 1e-6 margin of error
* upcast the other way
* Revert "upcast the other way"
This reverts commit 355692ba79.
* remove uop cast, this should have never been there
* add regression test
* now fuzz it
correct test
* the accumulator is always the output type
lint
* fuzz all reduce ops
* MULACC upcast_dtype could be half too
OpenCL supports it: https://man.opencl.org/mad.html
* cast to the same dtype is a noop
* internal casting support for MULACC
* fuzz test mulacc internal casting
* get_reduce_dtype
handle vectorized acc
update get_reduce_acc calls with the correct dtype
update tests
* pending _complete_ implementation of a function that gets the dtype based on self.reduceop
+more failing tests
* get_reduce_dtype try 2
add TODO
* get_lazyop_info already does it
* cleanup
* bring back internal casting support for mulacc
* use the scalar version of the acc dtype
* conceptual diff cleanup
* one extra line to a cleaner linearizer
* correct test assumptions - these should promote?
* rm mulacc cast, the cast of vins happens with the acc dtype promotion
linearizer hacks
* Revert "rm mulacc cast, the cast of vins happens with the acc dtype promotion"
This reverts commit afdd540733.
Revert "correct test assumptions - these should promote?"
This reverts commit 49ae2206ed.
* skip tests blocked by MULACC->lazyop cleanup
* final changes to add back internal casting for MULACC and update skip test logic, upcast works but downcast does not
* only test the linearizer abstraction layer
we want to ensure that the linearizer matches whatever lazy is returning
* remove unused hypothesis module
* remove mulacc related changes, those will move to the lazy pr
* remove midcast test
* move to helpers
* Revert "remove midcast test"
This reverts commit 86e74d7960.
add TODO with skip
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* cpu tests pass
* torch works
* works
* metal works
* fix ops_disk
* metal jit works
* fix openpilot
* llvm and clang work
* fix webgpu
* docs are really broken
* LRU works on metal
* delete comment
* revert name to ._buf. LRU only on Compiled
* changes
* allocator
* allocator, getting closer
* lru alloc
* LRUAllocator
* all pass
* metal
* cuda
* test examples
* linearizer
* test fixes
* fix custom + clean realize
* fix hip
* skip tests
* fix tests
* fix size=0
* fix MOCKHIP
* fix thneed
* copy better
* simple
* old style metal copy
* fix thneed
* np reshape
* give cuda a device
* remove force_wait
* refactor
* get rid of stupid ASTRunner
* fix del in diskbuffer
* BufferOps.FROM_UNDERLYING
* put offset in the rawbuffer
* fix bugs
* use exec