tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 06:18:01 -05:00

Author	SHA1	Message	Date
qazal	f0d424ecdf	Tensor UOps can become a buffer or const after scheduling (#8698 ) * spec * work * update test_viewed_consts_do_not_realize * remove	2025-01-21 12:33:19 +02:00
qazal	e2008c98c3	allow symbolic shape in tensor const parents [pr] (#8699 )	2025-01-21 12:01:25 +02:00
nimlgen	2b239db5d2	temp() with usernames (#8697 )	2025-01-21 12:26:43 +03:00
qazal	66ac0087e8	more high level contiguous tests + scheduler deletions [pr] (#8695 ) * delete those * move the upat too * rename ops_folding to just sym * keep that	2025-01-21 01:52:58 +02:00
qazal	08eb1f1f56	simplify tensors before scheduling [pr] (#8580 ) * delete forced_realize * put that back * work * remove forced_realize * expectedFailures * contiguous(buffer) * multi * expectedFailures * cleaner create_subbuffer * more comments * remove that * note * realizes * work * one upat and image is back * remove * cleaner * fix test_complex_backward for now --------- Co-authored-by: George Hotz <geohot@gmail.com>	2025-01-20 23:42:42 +02:00
qazal	02ad450e22	add failing assert for gradient realization [pr] (#8692 )	2025-01-20 22:50:09 +02:00
qazal	b14c9848cc	small changes to make the tensor_map_simple diff cleaner [pr] (#8691 )	2025-01-20 22:25:59 +02:00
Sieds Lykles	1a15c0e89d	Move define_acc down an unrolled add chain (#8404 ) * Move define_acc down an unrolled add chain * Prevent possible infinite recursion * Add test * Fix typo in test * Move mulacc_unrolled to devoctorize + load_store_indexing pass * Add test for mulacc_unrolled by itself * undo formatter * import from ops, not rewriter * Add a const version --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-20 14:56:27 -05:00
geohotstan	dd82b4c913	make onnx runner a class (#8647 ) * this * clean up * more clean ups and improve debug msg * more correct training toggler * remove manual training toggling * change some variable names * actually just add the training toggle for LIMIT envvar too * more refinement * __call__ and OnnxRunner * fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later * ahhhh found another mistake * remove limit from __call__ --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-20 10:11:05 -08:00
George Hotz	46a8c5e1e5	delete forced_realize (#8615 ) * delete forced_realize * put that back * expectedFailures * cleaner create_subbuffer * more comments --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-20 09:40:36 -08:00
chenyu	679b1ad058	move softmax upcast to after subtracting max (#8684 ) * move softmax upcast to after subtracting max max can always be done in the same dtype without any numerical loss, so this is better when explicitly upcasting in softmax * skipUnless half	2025-01-20 12:16:32 -05:00
nimlgen	08ca871d77	am: remove pm block (#8688 ) * am: remove pm block * hm * oops	2025-01-20 18:05:22 +03:00
nimlgen	9d3c40601f	am: fast memory manager (#8654 ) * start * progress * fixes * smth * mini fixes * fix2 * ugh, need this for now * faster * cleanups * tiny linters * make mypy happier * test & free pts * ops * linter * cleanup vm * fix * remove map_from * tiny fixes * add test to ci	2025-01-20 16:58:22 +03:00
qazal	9e55495b4d	fold double contiguous [pr] (#8687 )	2025-01-20 14:38:33 +02:00
qazal	ed63ff2372	Remove contiguous on buffer (#8676 ) * remove contiguous on buffer * spec * make things that can't be images not images	2025-01-20 13:48:33 +02:00
qazal	3499a2c72d	start moving image things to rewrite rules (#8678 ) * start moving image things to rewrite rules [pr] * that too * as expected * fix * Revert "fix" This reverts commit `fd03c9464b`.	2025-01-20 13:34:29 +02:00
qazal	b1847d561f	smaller do_realize and some cleanups [pr] (#8685 ) * do_realize cleanups [pr] * cleanup assign * unwrap ShapeTracker as we expect it to exist	2025-01-20 12:47:01 +02:00
qazal	689bf68cfc	remove GroupOp.Meta [pr] (#8686 )	2025-01-20 12:24:19 +02:00
George Hotz	4198bce150	_apply_map_to_tensors [pr] (#8683 )	2025-01-19 17:56:04 -08:00
George Hotz	98d01a059d	rename uopgraph to rewriter [pr] (#8682 )	2025-01-19 17:03:12 -08:00
Ignacio Sica	f532c78889	minor space hotfix (#8679 )	2025-01-19 17:00:24 -08:00
chenyu	2d0842386d	fix parse_valid for float uop (#8681 ) x < c -> X <= c-1 only works for int	2025-01-19 18:15:49 -05:00
George Hotz	168c16646a	change create_schedule_with_vars api to big_sink [pr] (#8677 )	2025-01-19 13:30:26 -08:00
chenyu	beba490ba8	update mask in scaled_dot_product_attention (#8674 ) built is_causal mask with ones_like and start with boolean, and reversed the mask -inf order	2025-01-19 15:19:23 -05:00
chenyu	5842ee56c6	raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675 ) matches torch, also fixed incorrect usage in tests	2025-01-19 12:55:04 -05:00
qazal	2faf8774fe	replace DEVICE of CONST after copy folding (#8673 )	2025-01-19 11:33:39 -05:00
qazal	d957a4f108	add tests for div buffer collapsing in the scheduler [pr] (#8671 ) * add tests for mul/div buffer collapsing in the scheduler [pr] * lint * merge with test_linearizer's version of this * 4*3	2025-01-18 14:15:29 -05:00
qazal	bd0fb14d70	hotfix: add ctx to VIZ rewrite (#8667 )	2025-01-18 07:58:47 -05:00
qazal	0ef85b52e6	init folding changes from the tensor_map branch [pr] (#8666 ) * init folding changes from the tensor_map branch [pr] * add ops_folding to the viz rewrite	2025-01-18 07:15:52 -05:00
qazal	5267a411e7	remove movementops in viz graph rewrite [pr] (#8665 )	2025-01-18 06:34:12 -05:00
ignaciosica	b49a04145e	fix for int plus minor cleanup (#8650 )	2025-01-17 22:30:39 -05:00
chenyu	c49e0fca60	GlobalCounters.reset() in sdxl step [pr] (#8664 )	2025-01-17 21:10:28 -05:00
ignaciosica	d2234e308a	tf32 tc for nv and ptx (#8635 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-17 17:43:57 -08:00
nimlgen	5afb0a4a81	metal: fix transfer profiling (#8659 )	2025-01-17 23:47:01 +03:00
George Hotz	0d7bd4f389	empty graph rewrite to VIZ tensor graph [pr] (#8658 ) * empty graph rewrite to VIZ tensor graph [pr] * fix lint	2025-01-17 11:29:33 -08:00
George Hotz	8609b880bd	hotfix: test_backward_sum	2025-01-17 10:25:02 -08:00
chenyu	f8cc971c3b	raise RuntimeError for uneven shards in Tensor.shard [pr] (#8656 )	2025-01-17 12:48:39 -05:00
mesozoic-egg	3506a7585f	upcast overflowed idx to int64 [pr] (#8268 ) * use full_shape to determine if index can potentially overflow * update comment * use shapetracker to check max index value * wip * lint * handle mask * upcast to int64 by st is noop on WGSL * fix comments * Handle negative overflow, intermediaries overflow, int64 support handle negative overflow handle symbolic wip handle intermediate values wip check if typemap support int64 lint comment * add invalid_dtype lint * Fix bug on checking mask overflow wip wip * Add more tests, need to resolve partial upcast test Valid_view_dup test valid op overflow refine test cases clean up cleanup wip refine tests lint * Upcast is handled by lower_load_store upcast as graph_rewrite to backtrack update test wip cleanup wip cleanup do upcast in lower_load_store lint * cleanup * do upcast within lower_load_store and mutate ctx * do upcast in get_idx and view revert lint * cleanup * Upcast in vec, const upcast to const test case 3 upcast on vector lint * simplify idx with symbolic in case of fake overflow test case4 test case 4 update test * test case4 is only for metal * try: upcast inside graph_rewrite instead of shapetracker wip * checking overflow can just be done directly on all views, with idxs * cleanup * REMOVE hard coded uop test for idx upcast * refactor cleanup refactor * do actual casting when necessary, instead of rewriting all idx hard code uop test new upcast * check dtype for int64 in webgpu * cleanup cleanup * cleanup * update tests cleanup comment cleanup cleanup * comment * comment * update comment update comment * refactor * typo * keep the scope to only upcasting * white space * Revert "white space" This reverts commit `314d7eb184`. * Revert "keep the scope to only upcasting" This reverts commit `1ef701dd85`. * sym folding is not necessary lint1 * fold symbolic lint * use symbolic simple when folding shapetracker idx * full sym folding is required after all... 
* Ops.CAST should retain the src min max * put rewrite to lowerer wip * start testing on higher level wip test higher level in test_tensor * find Ops.STORE in list instead of recursively * check dtype support when upcasting * remove invalid_dtype * lint * fix int64 support checks in upcast lint * skipif skipunless * revert fold to find test case * Revert "revert fold to find test case" This reverts commit `225bb6e801`. * test sym folding * handle ptx * wip * wip * delete hard coded uop test * lint fixes * wip * fix checking for None * lint * handle ptx * comment * dtype for overflow() * update skipIf skipUnless * assert in wgsl renderer for int64 wip * do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops * assert in lowerer for dtype support lint * Revert "assert in lowerer for dtype support" This reverts commit `8e9b1b79bf`. * assert dtype in kernel.py * Revert "assert dtype in kernel.py" This reverts commit `e29b9a9893`. * wip * assert in render * remove old assert * check dtype from rendere, assert in upcast wip * smaller arange for sym fold case * linearize directly * use expand directly * lint * lint * rename * no need to check dtype in device.py * trigger pr * remove dtype assert in upcast, make wgpu fail in render * use DType for type hint instead of dtypes * assert on KeyError in tests for webgpu backend int64 * use a tuple for src * test real kernel run wip * lint error * restore * fix real_size * update test example * resolve merge stuff --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>	2025-01-17 11:52:31 -05:00
qazal	23f0ff0ed8	add bitcast to multi [pr] (#8652 )	2025-01-17 03:17:19 -05:00
qazal	2b7db9b45d	delete unused cast/bitcast lines from ops.py [pr] (#8651 ) * move cast and bitcast out * more deletion of bitcast arg * fix test_bitcast_fuses * update tests * work	2025-01-17 03:04:18 -05:00
Mike Ashcroft	4f0d1b4759	Disable graphs by default if using an intel macbook (#8648 ) (#8649 )	2025-01-16 18:24:56 -08:00
eliotgolding	0289fbb1c2	limit real_size to the size of first View of ShapeTracker (#8628 ) * fix real_size * add fuzzer; typing * spacing --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-16 16:27:39 -05:00
nimlgen	f91ca508cf	am: bind for sdma (#8633 ) * am: bind for sdma * fix	2025-01-16 15:22:27 +03:00
nimlgen	f671da6755	ci: add AM start time to benchmark (#8637 ) * ci: add AM start time to benchmark * am: unlock it * add AMD * revert this	2025-01-16 14:47:36 +03:00
qazal	81a84aa85a	remove is_unrealized_unmasked_const [pr] (#8644 )	2025-01-16 05:27:47 -05:00
uuuvn	00e5979897	Use full soname for libgcc_s in CPUProgram (#8642 ) Number after .so is abi version, it is always 1 for libgcc_s. Most linux systems set default library versions via symlinks that are simply followed to get actual elf, however conda does it via linker scripts which ctypes doesn't follow (below contents of libgcc_s.so): ``` /* GNU ld script Use the shared library, but some functions are only in the static library. */ GROUP ( libgcc_s.so.1 -lgcc ) ``` ctypes.util.find_library thinks that this is the actual elf and ctypes.CDLL just loads this text file as a shared library. The result is: ``` File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s')) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__ self._handle = _dlopen(self._name, mode) ^^^^^^^^^^^^^^^^^^^^^^^^^ OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header ```	2025-01-16 12:56:52 +03:00
qazal	611208cd8a	Revert "Revert "move subbuffer to a rewrite rule in the scheduler (#8639 )" (…" (#8643 ) This reverts commit `82ef956cb8`.	2025-01-16 04:30:11 -05:00
qazal	82ef956cb8	Revert "move subbuffer to a rewrite rule in the scheduler (#8639 )" (#8641 ) This reverts commit `d5c90da286`.	2025-01-16 03:29:07 -05:00
qazal	d5c90da286	move subbuffer to a rewrite rule in the scheduler (#8639 ) * delete buffer_view from tensor * add to the scheduler * move buffer_view to the scheduler * gradient doesn't care. * for/with	2025-01-16 03:14:28 +02:00
nimlgen	b3efeeb717	docs: start am docs (#8638 ) * docs: init am docs * missing	2025-01-16 00:22:35 +03:00


7581 Commits