Commit Graph

4433 Commits

Author SHA1 Message Date
George Hotz
98d01a059d rename uopgraph to rewriter [pr] (#8682) 2025-01-19 17:03:12 -08:00
chenyu
2d0842386d fix parse_valid for float uop (#8681)
x < c -> x <= c-1 only works for int (see the sketch below)
2025-01-19 18:15:49 -05:00
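The rule this commit gates, in miniature: for integer x, the predicates x < c and x <= c-1 describe the same set, but floats have values strictly between c-1 and c, so the rewrite is unsound there. In plain Python:

    # sound for ints: nothing lies strictly between c-1 and c
    assert all((x < 4) == (x <= 3) for x in range(-10, 10))

    # unsound for floats: 3.5 < 4.0 holds, but 3.5 <= 3.0 does not
    x = 3.5
    assert (x < 4.0) and not (x <= 3.0)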
George Hotz
168c16646a change create_schedule_with_vars api to big_sink [pr] (#8677) 2025-01-19 13:30:26 -08:00
chenyu
beba490ba8 update mask in scaled_dot_product_attention (#8674)
built the is_causal mask with ones_like starting from boolean, and reversed the mask/-inf order
2025-01-19 15:19:23 -05:00
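A plain-Python sketch of the construction the message describes (illustrative only, not the actual tinygrad Tensor code): start from a boolean lower-triangular "allowed" mask and write -inf into the disallowed scores before softmax.

    import math

    L = 4
    # causal mask: query i may attend to key j only when j <= i
    allowed = [[j <= i for j in range(L)] for i in range(L)]
    scores = [[0.0] * L for _ in range(L)]
    masked = [[s if ok else -math.inf for s, ok in zip(srow, arow)]
              for srow, arow in zip(scores, allowed)]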
chenyu
5842ee56c6 raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675)
matches torch; also fixed incorrect usage in tests
2025-01-19 12:55:04 -05:00
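The guard itself is simple and mirrors torch's semantics: is_causal=True builds its own mask, so also passing attn_mask is ambiguous. In spirit (hypothetical signature, not the exact tinygrad one):

    def scaled_dot_product_attention(q, k, v, attn_mask=None, is_causal=False):
        # is_causal constructs the causal mask internally; an explicit
        # attn_mask on top of it has no well-defined meaning
        if is_causal and attn_mask is not None:
            raise RuntimeError("attn_mask cannot be set when is_causal=True")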
qazal
2faf8774fe replace DEVICE of CONST after copy folding (#8673) 2025-01-19 11:33:39 -05:00
qazal
d957a4f108 add tests for div buffer collapsing in the scheduler [pr] (#8671)
* add tests for mul/div buffer collapsing in the scheduler [pr]

* lint

* merge with test_linearizer's version of this

* 4*3
2025-01-18 14:15:29 -05:00
ignaciosica
d2234e308a tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
nimlgen
5afb0a4a81 metal: fix transfer profiling (#8659) 2025-01-17 23:47:01 +03:00
George Hotz
8609b880bd hotfix: test_backward_sum 2025-01-17 10:25:02 -08:00
chenyu
f8cc971c3b raise RuntimeError for uneven shards in Tensor.shard [pr] (#8656) 2025-01-17 12:48:39 -05:00
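The condition behind this (and #8593/#8602 below) is just that the sharded axis must divide evenly by the device count; a minimal illustration, not tinygrad's code:

    def check_even_shard(axis_size: int, ndev: int) -> int:
        if axis_size % ndev != 0:
            raise RuntimeError(f"cannot evenly shard {axis_size} across {ndev} devices")
        return axis_size // ndev

    check_even_shard(4096, 8)     # ok: 512 rows per device
    # check_even_shard(4096, 6) raises: 4096 = 6*682 + 4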
mesozoic-egg
3506a7585f upcast overflowed idx to int64 [pr] (#8268)
* use full_shape to determine if index can potentially overflow

* update comment

* use shapetracker to check max index value

* wip

* lint

* handle mask

* upcast to int64 by st is noop on WGSL

* fix comments

* Handle negative overflow, intermediaries overflow, int64 support

handle negative overflow

handle symbolic

wip

handle intermediate values

wip

check if typemap support int64

lint

comment

* add invalid_dtype

lint

* Fix bug on checking mask overflow

wip

wip

* Add more tests, need to resolve partial upcast

test Valid_view_dup

test valid op overflow

refine test cases

clean up

cleanup

wip

refine tests

lint

* Upcast is handled by lower_load_store

upcast as graph_rewrite to backtrack

update test

wip

cleanup

wip

cleanup

do upcast in lower_load_store

lint

* cleanup

* do upcast within lower_load_store and mutate ctx

* do upcast in get_idx and view

revert

lint

* cleanup

* Upcast in vec, const

upcast to const

test case 3

upcast on vector

lint

* simplify idx with symbolic in case of fake overflow

test case4

test case 4

update test

* test case4 is only for metal

* try: upcast inside graph_rewrite instead of shapetracker

wip

* checking overflow can just be done directly on all views, with idxs

* cleanup

* REMOVE hard coded uop test for idx upcast

* refactor

cleanup

refactor

* do actual casting when necessary, instead of rewriting all idx

hard code uop test

new upcast

* check dtype for int64 in webgpu

* cleanup

cleanup

* cleanup

* update tests

cleanup

comment

cleanup

cleanup

* comment

* comment

* update comment

update comment

* refactor

* typo

* keep the scope to only upcasting

* white space

* Revert "white space"

This reverts commit 314d7eb184.

* Revert "keep the scope to only upcasting"

This reverts commit 1ef701dd85.

* sym folding is not necessary

lint1

* fold symbolic

lint

* use symbolic simple when folding shapetracker idx

* full sym folding is required after all...

* Ops.CAST should retain the src min max

* put rewrite to lowerer

wip

* start testing on higher level

wip

test higher level in test_tensor

* find Ops.STORE in list instead of recursively

* check dtype support when upcasting

* remove invalid_dtype

* lint

* fix int64 support checks in upcast

lint

* skipif skipunless

* revert fold to find test case

* Revert "revert fold to find test case"

This reverts commit 225bb6e801.

* test sym folding

* handle ptx

* wip

* wip

* delete hard coded uop test

* lint fixes

* wip

* fix checking for None

* lint

* handle ptx

* comment

* dtype for overflow()

* update skipIf skipUnless

* assert in wgsl renderer for int64

wip

* do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops

* assert in lowerer for dtype support

lint

* Revert "assert in lowerer for dtype support"

This reverts commit 8e9b1b79bf.

* assert dtype in kernel.py

* Revert "assert dtype in kernel.py"

This reverts commit e29b9a9893.

* wip

* assert in render

* remove old assert

* check dtype from renderer, assert in upcast

wip

* smaller arange for sym fold case

* linearize directly

* use expand directly

* lint

* lint

* rename

* no need to check dtype in device.py

* trigger pr

* remove dtype assert in upcast, make wgpu fail in render

* use DType for type hint instead of dtypes

* assert on KeyError in tests for webgpu backend int64

* use a tuple for src

* test real kernel run

wip

* lint error

* restore

* fix real_size

* update test example

* resolve merge stuff

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
2025-01-17 11:52:31 -05:00
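The heart of the change: index math defaults to int32, and the views are checked for whether the worst-case flat index can exceed int32 range; only then is the index expression upcast to int64 (with the renderer asserting on backends like WGSL that lack int64). A minimal sketch of that overflow check (hypothetical helper, not the actual tinygrad code):

    import math

    INT32_MAX = 2**31 - 1

    def needs_int64(shape: tuple) -> bool:
        # worst-case flat index of a contiguous view is prod(shape) - 1
        return math.prod(shape) - 1 > INT32_MAX

    assert not needs_int64((1024, 1024))   # ~1e6 elements, fits in int32
    assert needs_int64((65536, 65536))     # 2**32 elements, overflows int32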
qazal
23f0ff0ed8 add bitcast to multi [pr] (#8652) 2025-01-17 03:17:19 -05:00
qazal
2b7db9b45d delete unused cast/bitcast lines from ops.py [pr] (#8651)
* move cast and bitcast out

* more deletion of bitcast arg

* fix test_bitcast_fuses

* update tests

* work
2025-01-17 03:04:18 -05:00
eliotgolding
0289fbb1c2 limit real_size to the size of first View of ShapeTracker (#8628)
* fix real_size

* add fuzzer; typing

* spacing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-16 16:27:39 -05:00
qazal
81a84aa85a remove is_unrealized_unmasked_const [pr] (#8644) 2025-01-16 05:27:47 -05:00
qazal
a1f70ce7d0 only use BUFFER_VIEW in disk [pr] (#8629)
* only use BUFFER_VIEW in disk [pr]

* delete can_view

* BUFFER_VIEW op on DISK

* remove that allow_buffer_view=False

* notes

* bitcast is a low-level op too

* this passes on AMD and LLVM
2025-01-15 12:34:15 -05:00
qazal
6193e279d4 isolate simple failing test for subbuffer on CONST [pr] (#8630)
* simple failing test for subbuffer on CONST [pr]

* add view_supported_devices check
2025-01-15 05:45:03 -05:00
George Hotz
504ad08e73 hotfix: add test_example_matmul_same 2025-01-14 19:03:17 -08:00
George Hotz
f29d6f54b8 support multilb gradient [pr] (#8624) 2025-01-14 18:33:33 -08:00
chenyu
0790d8059f remove MultiLazyBuffer.from_sharded [pr] (#8620)
it's equivalent to taking the lazydata from Tensor.split, then copying to devices
2025-01-14 18:00:49 -05:00
George Hotz
c85737c200 assert to prepare for grad uop [pr] (#8280)
* assert to prepare for grad uop [pr]

* fix test_nn

* fix most of test_tensor

* few more tests

* fix multi

* uniform gradient

* acc_dtype

* any for multi

* fix typing

* fix assert, CAST_BEFORE_VIEW is still the issue

* explicit test for CAST_BEFORE_VIEW

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 13:26:56 -08:00
George Hotz
fdd46c9f28 delete view instant rule (#8616)
* remove cast before view

* greener

* indexing

* delete view instant rule

* that passes too

* openpilot too

* ack

* base on cast_before_view

* add it as a rewrite rule

* VIEW(DEVICE) is also fine

* test_shard_memory depends on forced_realize removal

* put that back, will go soon

* UOp representations change once we don't instantly fold things

* do not duplicate tests

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 16:15:13 -05:00
qazal
dddd4e5f9f hotfix: remove duplicate TestTensorMutates [pr] (#8619)
* hotfix: remove duplicate TestTensorMutates [pr]

* imports
2025-01-14 16:03:17 -05:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201 raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs

skip 70B
2025-01-14 14:51:48 -05:00
chenyu
52e7003414 Revert "make kits19 dataset samples have small sizes (#8591)" (#8610)
This reverts commit 76a03e950a.
2025-01-14 12:24:27 -05:00
Francis Lata
76a03e950a make kits19 dataset samples have small sizes (#8591) 2025-01-14 08:27:45 -08:00
qazal
5aab2806f0 rename to test_tensor_uop + use upats for asserting [pr] (#8604)
* rename to test_tensor_uop + use upats for asserting [pr]

* fix pr
2025-01-14 05:09:56 -05:00
qazal
863abc7140 scheduling graph_rewrite prereqs for BLOCK in ASSIGN (#8598)
* remove the BUF_LIMIT assert

* skip the base one

* work

* work

* good error

* ok comment

* shorter check
2025-01-14 03:01:59 -05:00
chenyu
d443e91d82 remove custom splits in Tensor.shard [pr] (#8602)
towards even split only
2025-01-13 21:29:13 -05:00
chenyu
c4e33048c6 test Tensor.clone has a different lazydata [pr] (#8600) 2025-01-13 20:13:44 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
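For context, ONNX's quantized operators use affine scale/zero-point quantization, real ≈ (q - zero_point) * scale, with integer accumulation, which is presumably what "make sure qgemm comes out as int" refers to. A round trip of the standard scheme (not this commit's code):

    scale, zero_point = 0.1, 128
    x = 3.4
    q = round(x / scale) + zero_point      # quantize: q == 162
    x_hat = (q - zero_point) * scale       # dequantize
    assert abs(x_hat - x) <= scale / 2     # error bounded by half a quantization step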
George Hotz
d19c1c7f03 bump 75 -> 73 for test failure 2025-01-13 09:18:38 -08:00
nimlgen
d224d0ed7f nv: fix fault info (#8587)
* nv: fix fault info

* and emu for amd

* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32 use UOp.st for kernel reduce axes (#8499)
* use UOp.st for kernel reduce axes [pr]

* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399 better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite

* use float32
2025-01-13 05:02:21 -05:00
George Hotz
4ac4c1415a free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]

* intermediates_freed

* deallocate if not allocated

* self._first_run is simpler
2025-01-12 15:41:41 -08:00
George Hotz
d817dc10db start on test rewrite map [pr] (#8432)
* start on test rewrite map [pr]

* chatgpt writes dumb tests

* comment out failing

* fix that test

* fix gc issue

* oh, frame 2

* remove uop mutability

* map is only the map

* simplier + more tests

* test tiny passes

* tests that need to pass

* parent test passes

* child test passes

* remove uop mutability [pr]

* test fixups

* most tests pass

* more tests pass

* lil test fixups

* them too

* fix test

* unneeded

* err, that

* fix test_hcq

* fix test failures

* fix that test

* tensor universe

* does this pass test

* Revert "does this pass test"

This reverts commit ed516b3169.

* Revert "tensor universe"

This reverts commit c21301852a.

* test_mutate_add passes

* this can pass

* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"

This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.

* Revert "test_mutate_add passes"

This reverts commit ab4fc4c78e.

* correct enough

* remove test_rewrite_map_schedule.py

* viz

* uops are immutable

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-12 13:13:51 -05:00
qazal
cde18fddce fix DEBUG=2 output for copy runners [pr] (#8579)
* fix DEBUG=2 output for copy runners [pr]

* itemsize is constant
2025-01-12 12:03:01 -05:00
eliotgolding
867004fbeb use unravel in views_to_indexed_uops [pr] (#8560)
* use unravel in shape

* make process replay work

* earlier View.minify()

* fix

* fix tests

* mypy

* get rid of early minify

* fix

* linter

* clean and add test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
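Unravelling converts a flat index into per-dimension coordinates by repeated divmod. A minimal reference version (not tinygrad's implementation):

    def unravel(idx: int, shape: tuple) -> tuple:
        coords = []
        for dim in reversed(shape):
            idx, c = divmod(idx, dim)
            coords.append(c)
        return tuple(reversed(coords))

    assert unravel(5, (2, 3)) == (1, 2)    # row-major: 5 == 1*3 + 2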
nimlgen
38b5ac4d4a mypy for mockgpu/cuda & dsp/run (#8575) 2025-01-12 18:25:39 +03:00
qazal
ae241e96db fix half4 on qcom and gpu (#8573)
* add test_setitem_half

* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038 add SINK folding from the tensor_map branch [pr] (#8562)
* delete is_constant from the scheduler

* add sink folding

* always give BUFFER uops Buffers [pr]

* spec for view, var (bind) and const

* add test_buffer_only_after_realize

* work

* 3 lines

* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0 always give BUFFER uops Buffers [pr] (#8572)
* always give BUFFER uops Buffers [pr]

* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
79738d768c do not require PYTHONPATH=. for process replay [pr] (#8567) 2025-01-11 09:45:34 -05:00
qazal
a70d1bf439 move print_diff to process replay [pr] (#8566)
* move print_diff to process replay [pr]

* ruff rightfully complains
2025-01-11 09:28:45 -05:00
qazal
60503c8621 use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564) 2025-01-11 06:03:48 -05:00
chenyu
d09897c2aa allow double copy [pr] (#8559)
fixed the ring allreduce pattern and recovered most of the bert step time regression (10% faster); will double-check all benchmarks (see the toy sketch below)
2025-01-10 18:21:01 -05:00
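Ring allreduce moves partial results device-to-device around a ring, so its schedule naturally chains copies back to back; that chained device-to-device transfer is the double-copy pattern this change allows. A toy reduce-scatter over plain lists standing in for device buffers (illustrative only, not tinygrad's scheduler):

    n = 4
    bufs = [[float(d)] * n for d in range(n)]    # bufs[d][c]: chunk c on device d
    for step in range(n - 1):
        for d in range(n):                       # each device sends one chunk right
            c = (d - step) % n
            bufs[(d + 1) % n][c] += bufs[d][c]   # copy chunk to neighbor and reduce
    total = float(sum(range(n)))
    for c in range(n):                           # device (c-1) % n now owns chunk c
        assert bufs[(c - 1) % n][c] == total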