Commit Graph

280 Commits

Author SHA1 Message Date
qazal
cd77e51810 fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975

* that's a reshape now

* work

* works

* give those tests better names

* test when multiple mops result in the same ShapeTracker

* test_become_existing_buf_complex is enough

* that too
2025-02-10 13:51:30 +01:00
qazal
fd9f9ec772 realized base tensors become RESHAPE(BUFFER) [pr] (#8994) 2025-02-10 10:17:54 +01:00
qazal
7eba5fb413 Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)

* eh

* add test_empty_buf

* can we unsupport this

* linter

* Revert "can we unsupport this"

This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
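As an aside, the representation change this commit names is visible from the public API. A hedged sketch, assuming the `Tensor.lazydata` attribute of this era of tinygrad:

```python
from tinygrad import Tensor

t = Tensor.empty(2, 3)
# after this change, an empty tensor's UOp graph should be a device BUFFER
# wrapped in a RESHAPE to the requested shape, not a dedicated EMPTY op
print(t.lazydata)  # expected to show RESHAPE(..., src=(BUFFER(...),))
```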
qazal
55351ebb31 minimal failing test for #8975 [pr] (#8982) 2025-02-09 14:10:37 +01:00
chenyu
cfd28517df move pow folding tests to test_schedule [pr] (#8955)
they don't really belong in test_const_folding
2025-02-07 12:51:43 -05:00
chenyu
488200f16c move more pow const to rewrite (#8916)
* move more pow const to rewrite

one less use of _to_const_val

* fix
2025-02-05 20:30:12 -05:00
qazal
af4f9d1aa9 use matchers to verify AST shape [pr] (#8828)
* use matchers to verify kernel AST [pr]

* work

* use swizzle_cnt

* add comment

* imports

* modified_ast comment

* brief
2025-01-31 09:17:42 +02:00
George Hotz
643c09a6c6 tensor uop spec should be in spec.py [pr] (#8827)
* tensor uop spec should be in spec.py [pr]

* err, spec.py

* print uops can stay
2025-01-31 13:54:04 +08:00
qazal
a78f0f85d3 remove support for checking tensor uops in FUSE_ARANGE [pr] (#8829) 2025-01-31 07:48:28 +02:00
qazal
1fce864a6d delete multi output support (#8822)
* delete multioutput for now

* test_schedule

* test_assign too

* linter

* 515 for sd

* update tests and ctx

* update that assign check
2025-01-30 22:45:50 -05:00
qazal
530961f7d5 realized only exists on base (#8815)
* realized only exists on base [pr]

* shorter

* update that too
2025-01-30 23:02:25 +02:00
qazal
5643429c17 give BUFFER UOp a ShapeTracker [pr] (#8811)
* give BUFFER UOp a ShapeTracker [pr]

* move that

* update contiguous

* test_advancedindex should use movement ops
2025-01-30 22:33:32 +02:00
qazal
ba17786068 do not construct unmasked VALID (#8759)
* new lines that exist in codegen/ops

* update tests

* update sops.gz (13071 -> 13070 asts)

* fix viz too

* remove that TODO

* diff pruning

* mask assert + device

* work

* diff pruning

* re: fix viz too

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 20:51:21 +02:00
qazal
3417bc1814 fix ShapeTracker spec for const [pr] (#8791) 2025-01-28 19:53:36 +02:00
George Hotz
96bff0b4f7 contiguous is no longer needed in SGD [pr] (#8760)
* contiguous is no longer needed in SGD [pr]

* add allow condition
2025-01-27 15:19:11 +09:00
qazal
ac70f63d4b tensor_map cleanups [pr] (#8754)
* tensor_map cleanups [pr]

* update test_schedule too
2025-01-26 11:41:54 +02:00
George Hotz
b4bf6a7dea switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]

* set device correctly, dedup

* why does that fail?

* add noop cast

* simple backward

* fix beautiful_mnist

* touchups

* set in compute_gradient

* uop_count

* uop_count was wrong

* collections

* no note

* skip that test

* update sched kernel counts

* train mnist is 65

* fix metadata and gc

* fixes

* materialize_grads

* no pathlib stuff

* add contiguous_backward, fix bugs

* add some realize

* fix multi
2025-01-26 09:12:16 +09:00
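A minimal sketch of the `Tensor.gradient` API that `backward` now routes through; the usage here is inferred from the commit title, so treat the exact semantics as an assumption:

```python
from tinygrad import Tensor

x = Tensor([2.0, 3.0], requires_grad=True)
loss = (x * x).sum()
# gradient() returns the grads as tensors instead of mutating x.grad;
# backward() is now built on the same machinery
dx, = loss.gradient(x)
print(dx.tolist())  # d/dx sum(x^2) = 2x -> [4.0, 6.0]
```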
qazal
8e5bd0cd7a fix buffer init and skip test_swizzle_failure_permute [pr] (#8732)
* fix buffer init and skip test_swizzle_failure_permute [pr]

* replace preload with just load

* add
2025-01-23 17:21:38 +02:00
qazal
07ec99001a keep VIEW in big_sink + copy of buffer view spec [pr] (#8727)
* keep views in sink [pr]

* tests

* things from the gpt2 bug
2025-01-23 11:29:30 +02:00
qazal
e3d1464ba4 move assign preload out of schedule item [pr] (#8710)
* move assign preload out of schedule item [pr]

* fix that
2025-01-22 12:43:57 +02:00
qazal
d6bf1feaab remove the "no copy" line from copy_to_device (#8702)
* delete the no copy one

* add tests
2025-01-21 17:09:33 +02:00
qazal
f0d424ecdf Tensor UOps can become a buffer or const after scheduling (#8698)
* spec

* work

* update test_viewed_consts_do_not_realize

* remove
2025-01-21 12:33:19 +02:00
qazal
e2008c98c3 allow symbolic shape in tensor const parents [pr] (#8699) 2025-01-21 12:01:25 +02:00
qazal
66ac0087e8 more high level contiguous tests + scheduler deletions [pr] (#8695)
* delete those

* move the upat too

* rename ops_folding to just sym

* keep that
2025-01-21 01:52:58 +02:00
qazal
08eb1f1f56 simplify tensors before scheduling [pr] (#8580)
* delete forced_realize

* put that back

* work

* remove forced_realize

* expectedFailures

* contiguous(buffer)

* multi

* expectedFailures

* cleaner create_subbuffer

* more comments

* remove that

* note

* realizes

* work

* one upat and image is back

* remove

* cleaner

* fix test_complex_backward for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-01-20 23:42:42 +02:00
chenyu
679b1ad058 move softmax upcast to after subtracting max (#8684)
* move softmax upcast to after subtracting max

max can always be computed in the input dtype without any numerical loss, so when explicitly upcasting in softmax it is better to upcast only after subtracting the max (see the sketch below)

* skipUnless half
2025-01-20 12:16:32 -05:00
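This is the standard numerically stable softmax ordering. A minimal NumPy sketch of the idea (illustrative only, not the tinygrad diff):

```python
import numpy as np

def softmax_upcast_after_max(x: np.ndarray, upcast=np.float32) -> np.ndarray:
  # the row max is exact in the input dtype (e.g. float16), so compute it there
  m = x.max(axis=-1, keepdims=True)
  # upcast only after subtracting the max; exp and sum run in the wider dtype
  e = np.exp((x - m).astype(upcast))
  return (e / e.sum(axis=-1, keepdims=True)).astype(x.dtype)
```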
qazal
9e55495b4d fold double contiguous [pr] (#8687) 2025-01-20 14:38:33 +02:00
qazal
ed63ff2372 Remove contiguous on buffer (#8676)
* remove contiguous on buffer

* spec

* make things that can't be images not images
2025-01-20 13:48:33 +02:00
George Hotz
168c16646a change create_schedule_with_vars api to big_sink [pr] (#8677) 2025-01-19 13:30:26 -08:00
chenyu
beba490ba8 update mask in scaled_dot_product_attention (#8674)
builds the is_causal mask with ones_like, starting from a boolean tensor, and reverses the mask/-inf order (see the sketch below)
2025-01-19 15:19:23 -05:00
chenyu
5842ee56c6 raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675)
matches torch; also fixes incorrect usage in tests
2025-01-19 12:55:04 -05:00
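A hedged NumPy sketch of the behavior these two sdpa commits describe (the names here are illustrative, not tinygrad's actual code): the causal mask is a boolean lower-triangular of ones, masked positions become -inf before the softmax, and an explicit attn_mask combined with is_causal=True is rejected:

```python
import numpy as np

def sdpa_scores(q, k, attn_mask=None, is_causal=False):
  if is_causal and attn_mask is not None:
    raise RuntimeError("cannot set attn_mask when is_causal=True")  # matches torch
  scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
  if is_causal:
    # boolean mask of ones, lower-triangular: True where attention is allowed
    attn_mask = np.tril(np.ones(scores.shape[-2:], dtype=bool))
  if attn_mask is not None:
    # mask/-inf order: keep allowed scores, fill everything else with -inf
    scores = np.where(attn_mask, scores, -np.inf)
  return scores  # softmax over the last axis follows
```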
qazal
2faf8774fe replace DEVICE of CONST after copy folding (#8673) 2025-01-19 11:33:39 -05:00
qazal
d957a4f108 add tests for div buffer collapsing in the scheduler [pr] (#8671)
* add tests for mul/div buffer collapsing in the scheduler [pr]

* lint

* merge with test_linearizer's version of this

* 4*3
2025-01-18 14:15:29 -05:00
qazal
2b7db9b45d delete unused cast/bitcast lines from ops.py [pr] (#8651)
* move cast and bitcast out

* more deletion of bitcast arg

* fix test_bitcast_fuses

* update tests

* work
2025-01-17 03:04:18 -05:00
qazal
81a84aa85a remove is_unrealized_unmasked_const [pr] (#8644) 2025-01-16 05:27:47 -05:00
qazal
a1f70ce7d0 only use BUFFER_VIEW in disk [pr] (#8629)
* only use BUFFER_VIEW in disk [pr]

* delete can_view

* BUFFER_VIEW op on DISK

* remove that allow_buffer_view=False

* notes

* bitcast is a low-level op too

* this passes on AMD and LLVM
2025-01-15 12:34:15 -05:00
George Hotz
504ad08e73 hotfix: add test_example_matmul_same 2025-01-14 19:03:17 -08:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
qazal
7562cc0399 better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite

* use float32
2025-01-13 05:02:21 -05:00
qazal
cff1ee9038 add SINK folding from the tensor_map branch [pr] (#8562)
* delete is_constant from the scheduler

* add sink folding

* always give BUFFER uops Buffers [pr]

* spec for view, var (bind) and const

* add test_buffer_only_after_realize

* work

* 3 lines

* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0 always give BUFFER uops Buffers [pr] (#8572)
* always give BUFFER uops Buffers [pr]

* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
chenyu
d09897c2aa allow double copy [pr] (#8559)
fixes the ring allreduce pattern and recovers most of the BERT step-time regression (10% faster); will double-check all benchmarks (see the sketch below for the pattern)
2025-01-10 18:21:01 -05:00
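For context, a toy single-process simulation of the ring allreduce pattern this commit refers to (an illustrative sketch, not tinygrad's multi-device code). Each of the n ranks copies one chunk per step to its ring neighbor, n-1 reducing steps followed by n-1 gathering steps, so every rank both sends and receives a copy per step; presumably this is the pattern that needs back-to-back copies:

```python
import numpy as np

def ring_allreduce(data):
  # data: one array per rank; each rank's array is split into n equal chunks
  n = len(data)
  chunks = [np.array_split(d, n) for d in data]
  # reduce-scatter: at step s, rank r copies chunk (r - s) % n to rank (r + 1) % n,
  # which accumulates it; after n - 1 steps rank r owns reduced chunk (r + 1) % n
  for s in range(n - 1):
    sends = [((r + 1) % n, (r - s) % n, chunks[r][(r - s) % n].copy()) for r in range(n)]
    for dst, c, buf in sends:
      chunks[dst][c] = chunks[dst][c] + buf
  # allgather: circulate each fully reduced chunk around the ring, overwriting
  for s in range(n - 1):
    sends = [((r + 1) % n, (r + 1 - s) % n, chunks[r][(r + 1 - s) % n].copy()) for r in range(n)]
    for dst, c, buf in sends:
      chunks[dst][c] = buf
  return [np.concatenate(ch) for ch in chunks]

ranks = [np.arange(8.0), 2 * np.arange(8.0)]
assert all(np.allclose(o, ranks[0] + ranks[1]) for o in ring_allreduce(ranks))
```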
qazal
2fd068ffc0 delete empty op (#8544)
* simple delete EMPTY op

* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2 start tests for putting the tensor graph in a single kernel [pr] (#8542)
* start tests for putting the tensor graph in a single kernel [pr]

* parallel actually

* better view_left test

* test a softmax

* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
947de23cac add VIEW(DEVICE) to tensor variable [pr] (#8529)
* add VIEW(DEVICE) to tensor variable [pr]

* bind 2

* restrict shapetracker

* move var and bind closer

* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710 restrict tensor const ShapeTracker in spec [pr] (#8447)
* restrict tensor const ShapeTracker in spec [pr]

* pass sink srcs

* reject if any of the specs disagree

* deceive mypy

* viz

* default to float

* just check the view

* create_schedule is gone

* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
qazal
0e97f807e0 test fixup prereqs for delete_buffer_view [pr] (#8523) 2025-01-07 11:52:18 +02:00
qazal
ed618a72e7 do not use subbuffer for bitcast (#8514)
* do not use subbuffer for bitcast

* edit that test

* explicit test for ptx

* ptx
2025-01-06 18:40:46 +02:00
qazal
ed121d235c spec for CAST_BEFORE_VIEW=1 [pr] (#8512) 2025-01-06 10:43:58 +02:00