Commit Graph

7547 Commits

Author SHA1 Message Date
George Hotz
0d7bd4f389 empty graph rewrite to VIZ tensor graph [pr] (#8658)
* empty graph rewrite to VIZ tensor graph [pr]

* fix lint
2025-01-17 11:29:33 -08:00
George Hotz
8609b880bd hotfix: test_backward_sum 2025-01-17 10:25:02 -08:00
chenyu
f8cc971c3b raise RuntimeError for uneven shards in Tensor.shard [pr] (#8656) 2025-01-17 12:48:39 -05:00
mesozoic-egg
3506a7585f upcast overflowed idx to int64 [pr] (#8268)
* use full_shape to determine if index can potentially overflow

* update comment

* use shapetracker to check max index value

* wip

* lint

* handle mask

* upcast to int64 by st is noop on WGSL

* fix comments

* Handle negative overflow, intermediaries overflow, int64 support

handle negative overflow

handle symbolic

wip

handle intermediate values

wip

check if typemap support int64

lint

comment

* add invalid_dtype

lint

* Fix bug on checking mask overflow

wip

wip

* Add more tests, need to resolve partial upcast

test Valid_view_dup

test valid op overflow

refine test cases

clean up

cleanup

wip

refine tests

lint

* Upcast is handled by lower_load_store

upcast as graph_rewrite to backtrack

update test

wip

cleanup

wip

cleanup

do upcast in lower_load_store

lint

* cleanup

* do upcast within lower_load_store and mutate ctx

* do upcast in get_idx and view

revert

lint

* cleanup

* Upcast in vec, const

upcast to const

test case 3

upcast on vector

lint

* simplify idx with symbolic in case of fake overflow

test case4

test case 4

update test

* test case4 is only for metal

* try: upcast inside graph_rewrite instead of shapetracker

wip

* checking overflow can just be done directly on all views, with idxs

* cleanup

* REMOVE hard coded uop test for idx upcast

* refactor

cleanup

refactor

* do actual casting when necessary, instead of rewriting all idx

hard code uop test

new upcast

* check dtype for int64 in webgpu

* cleanup

cleanup

* cleanup

* update tests

cleanup

comment

cleanup

cleanup

* comment

* comment

* update comment

update comment

* refactor

* typo

* keep the scope to only upcasting

* white space

* Revert "white space"

This reverts commit 314d7eb184.

* Revert "keep the scope to only upcasting"

This reverts commit 1ef701dd85.

* sym folding is not necessary

lint1

* fold symbolic

lint

* use symbolic simple when folding shapetracker idx

* full sym folding is required after all...

* Ops.CAST should retain the src min max

* put rewrite to lowerer

wip

* start testing on higher level

wip

test higher level in test_tensor

* find Ops.STORE in list instead of recursively

* check dtype support when upcasting

* remove invalid_dtype

* lint

* fix int64 support checks in upcast

lint

* skipif skipunless

* revert fold to find test case

* Revert "revert fold to find test case"

This reverts commit 225bb6e801.

* test sym folding

* handle ptx

* wip

* wip

* delete hard coded uop test

* lint fixes

* wip

* fix checking for None

* lint

* handle ptx

* comment

* dtype for overflow()

* update skipIf skipUnless

* assert in wgsl renderer for int64

wip

* do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops

* assert in lowerer for dtype support

lint

* Revert "assert in lowerer for dtype support"

This reverts commit 8e9b1b79bf.

* assert dtype in kernel.py

* Revert "assert dtype in kernel.py"

This reverts commit e29b9a9893.

* wip

* assert in render

* remove old assert

* check dtype from renderer, assert in upcast

wip

* smaller arange for sym fold case

* linearize directly

* use expand directly

* lint

* lint

* rename

* no need to check dtype in device.py

* trigger pr

* remove dtype assert in upcast, make wgpu fail in render

* use DType for type hint instead of dtypes

* assert on KeyError in tests for webgpu backend int64

* use a tuple for src

* test real kernel run

wip

* lint error

* restore

* fix real_size

* update test example

* resolve merge stuff

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
2025-01-17 11:52:31 -05:00
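
A minimal sketch of the idea behind this change, with hypothetical helper and dtype names (not tinygrad's actual implementation): compute the largest and smallest flat index a view can produce and only upcast the index dtype to int64 when that range does not fit in int32.

```python
# Illustrative sketch only: detect potential int32 index overflow and upcast.
# index_dtype_for and the dtype strings are hypothetical stand-ins, not tinygrad APIs.
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def index_dtype_for(shape: tuple[int, ...], strides: tuple[int, ...]) -> str:
  # Largest and smallest flat index reachable for this view.
  max_index = sum((s - 1) * st for s, st in zip(shape, strides) if st > 0)
  min_index = sum((s - 1) * st for s, st in zip(shape, strides) if st < 0)
  # Upcast only if the index range cannot be represented in int32.
  return "int64" if max_index > INT32_MAX or min_index < INT32_MIN else "int32"

print(index_dtype_for((65536, 65536), (65536, 1)))  # int64: 2**32 - 1 overflows int32
print(index_dtype_for((1024, 1024), (1024, 1)))     # int32
```
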
qazal
23f0ff0ed8 add bitcast to multi [pr] (#8652) 2025-01-17 03:17:19 -05:00
qazal
2b7db9b45d delete unused cast/bitcast lines from ops.py [pr] (#8651)
* move cast and bitcast out

* more deletion of bitcast arg

* fix test_bitcast_fuses

* update tests

* work
2025-01-17 03:04:18 -05:00
Mike Ashcroft
4f0d1b4759 Disable graphs by default if using an intel macbook (#8648) (#8649) 2025-01-16 18:24:56 -08:00
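
A hedged sketch of how such a platform check could look in Python; the flag name and how the default is actually wired up in tinygrad are assumptions for illustration only.

```python
import platform

# Illustrative only: detect an Intel Mac so a feature (e.g. graph execution)
# can be disabled by default on that platform.
def is_intel_mac() -> bool:
  return platform.system() == "Darwin" and platform.machine() == "x86_64"

GRAPH_DEFAULT = 0 if is_intel_mac() else 1  # hypothetical default, not the real env handling
```
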
eliotgolding
0289fbb1c2 limit real_size to the size of first View of ShapeTracker (#8628)
* fix real_size

* add fuzzer; typing

* spacing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-16 16:27:39 -05:00
nimlgen
f91ca508cf am: bind for sdma (#8633)
* am: bind for sdma

* fix
2025-01-16 15:22:27 +03:00
nimlgen
f671da6755 ci: add AM start time to benchmark (#8637)
* ci: add AM start time to benchmark

* am: unlock it

* add AMD

* revert this
2025-01-16 14:47:36 +03:00
qazal
81a84aa85a remove is_unrealized_unmasked_const [pr] (#8644) 2025-01-16 05:27:47 -05:00
uuuvn
00e5979897 Use full soname for libgcc_s in CPUProgram (#8642)
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems point the unversioned library name at the versioned one
via symlinks, which are simply followed to reach the actual ELF. Conda
instead uses a GNU ld linker script, which ctypes doesn't follow
(contents of libgcc_s.so below):
```
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library treats this text file as the actual ELF, and
ctypes.CDLL then tries to load it as a shared library. The result is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```
2025-01-16 12:56:52 +03:00
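
The fix described here is to load libgcc_s by its full soname so ctypes never tries to dlopen the ld script. A minimal sketch of that idea follows; the exact fallback chain in tinygrad's `device.py` may differ.

```python
import ctypes, ctypes.util, platform

OSX = platform.system() == "Darwin"

# Illustrative sketch: prefer the full soname on Linux so ctypes never dlopens
# a GNU ld script; the abi version is always 1 for libgcc_s.
if OSX:
  helper_handle = ctypes.CDLL(ctypes.util.find_library("System"))
else:
  helper_handle = ctypes.CDLL("libgcc_s.so.1")
```
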
qazal
611208cd8a Revert "Revert "move subbuffer to a rewrite rule in the scheduler (#8639)" (…" (#8643)
This reverts commit 82ef956cb8.
2025-01-16 04:30:11 -05:00
qazal
82ef956cb8 Revert "move subbuffer to a rewrite rule in the scheduler (#8639)" (#8641)
This reverts commit d5c90da286.
2025-01-16 03:29:07 -05:00
qazal
d5c90da286 move subbuffer to a rewrite rule in the scheduler (#8639)
* delete buffer_view from tensor

* add to the scheduler

* move buffer_view to the scheduler

* gradient doesn't care.

* for/with
2025-01-16 03:14:28 +02:00
nimlgen
b3efeeb717 docs: start am docs (#8638)
* docs: init am docs

* missing
2025-01-16 00:22:35 +03:00
uuuvn
7ecced7f6d LLVM JIT prereqs (#8634)
* LLVM JIT prereqs

This commit moves JIT loading, disassembling, and CPUProgram logic from
`ops_clang.py` to `elf.py`, `helpers.py`, and `device.py` respectively.

I don't quite like `helpers.py` as the destination for capstone_flatdump,
but that is where cpu_objdump lives, so presumably this is how it's
supposed to be.

* Types
2025-01-15 09:47:08 -08:00
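
Since the commit mentions capstone_flatdump, here is a hedged sketch of what a capstone-based flat disassembly dump can look like with the capstone Python bindings; the function name, the fixed x86-64 architecture, and the sample bytes are illustrative, not tinygrad's actual helper.

```python
# Illustrative sketch of a capstone-based disassembly dump (not tinygrad's exact helper).
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

def flat_dump(machine_code: bytes, base_addr: int = 0) -> None:
  md = Cs(CS_ARCH_X86, CS_MODE_64)  # a real CPUProgram helper would pick the host arch
  for ins in md.disasm(machine_code, base_addr):
    print(f"{ins.address:#06x}: {ins.mnemonic} {ins.op_str}")

flat_dump(b"\x48\x89\xf8\xc3")  # mov rax, rdi ; ret
```
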
qazal
a1f70ce7d0 only use BUFFER_VIEW in disk [pr] (#8629)
* only use BUFFER_VIEW in disk [pr]

* delete can_view

* BUFFER_VIEW op on DISK

* remove that allow_buffer_view=False

* notes

* bitcast is a low-level op too

* this passes on AMD and LLVM
2025-01-15 12:34:15 -05:00
ignaciosica
bae20e5043 Generic PTX wmma rendering [pr] (#8632)
* make wmma rendering dtype size generic

* use var instead of calculating multiple times

* compact rendering
2025-01-15 09:31:48 -08:00
qazal
6193e279d4 isolate simple failing test for subbuffer on CONST [pr] (#8630)
* simple failing test for subbuffer on CONST [pr]

* add view_supported_devices check
2025-01-15 05:45:03 -05:00
George Hotz
e1f7c90459 gradient is a set [pr] (#8626)
* gradient is a set [pr]

* typing for deepwalk
2025-01-14 20:48:23 -08:00
chenyu
7fb1c7af61 minor multi cleanups [pr] (#8625) 2025-01-14 22:25:23 -05:00
George Hotz
504ad08e73 hotfix: add test_example_matmul_same 2025-01-14 19:03:17 -08:00
George Hotz
f29d6f54b8 support multilb gradient [pr] (#8624) 2025-01-14 18:33:33 -08:00
chenyu
4ee3243c93 JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623)
is it fast?
2025-01-14 19:52:38 -05:00
chenyu
7860a80801 simpler MultiLazyBuffer alu [pr] (#8622) 2025-01-14 19:19:13 -05:00
chenyu
930728c069 bert BS 72->66 [pr] (#8621)
72 does not fit now
2025-01-14 18:41:41 -05:00
chenyu
0790d8059f remove MultiLazyBuffer.from_sharded [pr] (#8620)
it's equivalent to taking the lazydata from Tensor.split, then copying to devices
2025-01-14 18:00:49 -05:00
George Hotz
c85737c200 assert to prepare for grad uop [pr] (#8280)
* assert to prepare for grad uop [pr]

* fix test_nn

* fix most of test_tensor

* few more tests

* fix multi

* uniform gradient

* acc_dtype

* any for multi

* fix typing

* fix assert, CAST_BEFORE_VIEW is still the issue

* explict test for CAST_BEFORE_VIEW

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 13:26:56 -08:00
George Hotz
fdd46c9f28 delete view instant rule (#8616)
* remove cast before view

* greener

* indexing

* delete view instant rule

* that passes too

* openpilot too

* ack

* base on cast_before_view

* add it as a rewrite rule

* VIEW(DEVICE) is also fine

* test_shard_memory depends on forced_realize removal

* put that back, will go soon

* UOp representations change once we don't instantly fold things

* do not duplicate tests

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 16:15:13 -05:00
qazal
dddd4e5f9f hotfix: remove duplicate TestTensorMutates [pr] (#8619)
* hotfix: remove duplicate TestTensorMutates [pr]

* imports
2025-01-14 16:03:17 -05:00
nimlgen
c5782e85d2 tlsf: optimize alloc (#8608) 2025-01-14 23:48:07 +03:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201 raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs

skip 70B
2025-01-14 14:51:48 -05:00
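
A minimal sketch of the even-shard check this commit describes, using a hypothetical helper rather than Tensor.shard itself: the sharded dimension must divide evenly across the devices, otherwise raise RuntimeError.

```python
# Illustrative sketch of the even-shard check (hypothetical helper, not Tensor.shard).
def check_even_shard(dim_size: int, devices: tuple[str, ...]) -> int:
  if dim_size % len(devices) != 0:
    raise RuntimeError(f"cannot evenly shard dim of size {dim_size} across {len(devices)} devices")
  return dim_size // len(devices)

check_even_shard(8192, ("GPU:0", "GPU:1", "GPU:2", "GPU:3"))  # ok: 2048 per device
# check_even_shard(8192, ("GPU:0",) * 6) would raise RuntimeError
```
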
ignaciosica
d5a646d492 CUDA Turing TC (#8597)
* init turing tc

* reorder tc

* hotfix: remove some spaces

* revert var name to x

* consistent order of factors

* revert order of terms to match old stuff

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
chenyu
cbfd51f5a5 make MultiLazyBuffer.bounds a property [pr] (#8614)
determined by lbs shapes and axis
2025-01-14 13:25:54 -05:00
chenyu
52e7003414 Revert "make kits19 dataset samples have small sizes (#8591)" (#8610)
This reverts commit 76a03e950a.
2025-01-14 12:24:27 -05:00
Francis Lata
76a03e950a make kits19 dataset samples have small sizes (#8591) 2025-01-14 08:27:45 -08:00
ignaciosica
4057b98f7f rename i and j into k and row/col (#8607) 2025-01-14 08:27:05 -08:00
nimlgen
1ff6862a3d ci: sleep a bit to let the driver unload the prev pid (#8605) 2025-01-14 15:55:23 +03:00
qazal
97ec564b03 noop changes from the block_assign branch [pr] (#8606) 2025-01-14 07:47:17 -05:00
qazal
5aab2806f0 rename to test_tensor_uop + use upats for asserting [pr] (#8604)
* rename to test_tensor_uop + use upats for asserting [pr]

* fix pr
2025-01-14 05:09:56 -05:00
qazal
863abc7140 scheduling graph_rewrite prereqs for BLOCK in ASSIGN (#8598)
* remove the BUF_LIMIT assert

* skip the base one

* work

* work

* good error

* ok comment

* shorter check
2025-01-14 03:01:59 -05:00
chenyu
05e54f00d3 remove bounds from MultiLazyBuffer.from_sharded [pr] (#8603)
without a custom bound, the bound is uniquely determined by shape and axis
2025-01-13 23:40:05 -05:00
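
A hedged sketch of why the bound is uniquely determined: with even shards, the per-device bounds along the sharded axis follow directly from the dimension size and the number of shards. The helper below is hypothetical, not MultiLazyBuffer's actual code.

```python
# Illustrative sketch: even shards imply the bounds, so no custom bound is needed.
def shard_bounds(dim_size: int, n_shards: int) -> tuple[tuple[int, int], ...]:
  assert dim_size % n_shards == 0, "even shards only"
  step = dim_size // n_shards
  return tuple((i * step, (i + 1) * step) for i in range(n_shards))

print(shard_bounds(8, 4))  # ((0, 2), (2, 4), (4, 6), (6, 8))
```
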
chenyu
d443e91d82 remove custom splits in Tensor.shard [pr] (#8602)
towards even split only
2025-01-13 21:29:13 -05:00
chenyu
227d96d7a3 remove unused src from metaop [pr] (#8601) 2025-01-13 20:28:14 -05:00
chenyu
c4e33048c6 test Tensor.clone has a different lazydata [pr] (#8600) 2025-01-13 20:13:44 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
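
A sketch of what asserting a buffer limit at compile time can look like; the names (CompileError, max_bufs) are hypothetical and not tinygrad's actual constants or exception types.

```python
# Illustrative sketch: fail at compile time if a kernel references more buffers
# than the backend allows, instead of asserting later at run time.
class CompileError(RuntimeError): pass

def check_buf_limit(bufs: list, max_bufs: int | None) -> None:
  if max_bufs is not None and len(bufs) > max_bufs:
    raise CompileError(f"kernel uses {len(bufs)} buffers, backend limit is {max_bufs}")
```
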
nimlgen
c2504357af am: lock to access dev (#8594)
* am: lock to access dev

* wording

* just works

* disable
2025-01-13 23:53:13 +03:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00