Commit Graph

10490 Commits

Author SHA1 Message Date
eliotgolding
0289fbb1c2 limit real_size to the size of first View of ShapeTracker (#8628)
* fix real_size

* add fuzzer; typing

* spacing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-16 16:27:39 -05:00
nimlgen
f91ca508cf am: bind for sdma (#8633)
* am: bind for sdma

* fix
2025-01-16 15:22:27 +03:00
nimlgen
f671da6755 ci: add AM start time to benchmark (#8637)
* ci: add AM start time to benchmark

* am: unlock it

* add AMD

* revert this
2025-01-16 14:47:36 +03:00
qazal
81a84aa85a remove is_unrealized_unmasked_const [pr] (#8644) 2025-01-16 05:27:47 -05:00
uuuvn
00e5979897 Use full soname for libgcc_s in CPUProgram (#8642)
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems set default library versions via symlinks that are
simply followed to get the actual ELF; conda, however, does it via linker
scripts, which ctypes doesn't follow (below are the contents of libgcc_s.so):
```
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library thinks this is the actual ELF, and
ctypes.CDLL just loads the text file as a shared library. The result
is:
```
  File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```
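A minimal sketch of the fix the title describes — passing the full soname so the dynamic linker resolves the real ELF directly and ctypes never opens conda's ld script (illustrative only; the actual call site in `device.py` may differ):
```
import ctypes

# "libgcc_s.so.1" is the real ELF's soname; the unversioned "libgcc_s.so"
# can be a GNU ld script text file on conda, which _dlopen rejects with
# "invalid ELF header". dlopen resolves a full soname without consulting
# the script, so this loads the actual shared library.
libgcc_s = ctypes.CDLL("libgcc_s.so.1")
```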
2025-01-16 12:56:52 +03:00
qazal
611208cd8a Revert "Revert "move subbuffer to a rewrite rule in the scheduler (#8639)" (…" (#8643)
This reverts commit 82ef956cb8.
2025-01-16 04:30:11 -05:00
qazal
82ef956cb8 Revert "move subbuffer to a rewrite rule in the scheduler (#8639)" (#8641)
This reverts commit d5c90da286.
2025-01-16 03:29:07 -05:00
qazal
d5c90da286 move subbuffer to a rewrite rule in the scheduler (#8639)
* delete buffer_view from tensor

* add to the scheduler

* move buffer_view to the scheduler

* gradient doesn't care.

* for/with
2025-01-16 03:14:28 +02:00
nimlgen
b3efeeb717 docs: start am docs (#8638)
* docs: init am docs

* missing
2025-01-16 00:22:35 +03:00
uuuvn
7ecced7f6d LLVM JIT prereqs (#8634)
* LLVM JIT prereqs

This commit moves JIT loading, disassembling, and CPUProgram logic from
`ops_clang.py` to `elf.py`, `helpers.py`, and `device.py` respectively.

I don't quite like the `helpers.py` destination for capstone_flatdump,
but that is where cpu_objdump lives, so presumably this is how it's
supposed to be.

* Types
2025-01-15 09:47:08 -08:00
qazal
a1f70ce7d0 only use BUFFER_VIEW in disk [pr] (#8629)
* only use BUFFER_VIEW in disk [pr]

* delete can_view

* BUFFER_VIEW op on DISK

* remove that allow_buffer_view=False

* notes

* bitcast is a low-level op too

* this passes on AMD and LLVM
2025-01-15 12:34:15 -05:00
ignaciosica
bae20e5043 Generic PTX wmma rendering [pr] (#8632)
* make wmma rendering dtype size generic

* use var instead of calculating multiple times

* compact rendering
2025-01-15 09:31:48 -08:00
qazal
6193e279d4 isolate simple failing test for subbuffer on CONST [pr] (#8630)
* simple failing test for subbuffer on CONST [pr]

* add view_supported_devices check
2025-01-15 05:45:03 -05:00
George Hotz
e1f7c90459 gradient is a set [pr] (#8626)
* gradient is a set [pr]

* typing for deepwalk
2025-01-14 20:48:23 -08:00
chenyu
7fb1c7af61 minor multi cleanups [pr] (#8625) 2025-01-14 22:25:23 -05:00
George Hotz
504ad08e73 hotfix: add test_example_matmul_same 2025-01-14 19:03:17 -08:00
George Hotz
f29d6f54b8 support multilb gradient [pr] (#8624) 2025-01-14 18:33:33 -08:00
chenyu
4ee3243c93 JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623)
is it fast?
2025-01-14 19:52:38 -05:00
chenyu
7860a80801 simpler MultiLazyBuffer alu [pr] (#8622) 2025-01-14 19:19:13 -05:00
chenyu
930728c069 bert BS 72->66 [pr] (#8621)
72 does not fit now
2025-01-14 18:41:41 -05:00
chenyu
0790d8059f remove MultiLazyBuffer.from_sharded [pr] (#8620)
it's equivalent to taking the lazydata from Tensor.split, then copying to the devices
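A hedged pseudocode sketch of that equivalence — `copy_to_device` and the exact `split` signature are assumptions here, not necessarily the repo's API at this commit:
```
# Pseudocode: shard t across devices along axis by splitting evenly,
# then copying each piece's lazydata to its device.
def shard_via_split(t, devices, axis):
  pieces = t.split(t.shape[axis] // len(devices), dim=axis)
  return [p.lazydata.copy_to_device(d) for p, d in zip(pieces, devices)]
```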
2025-01-14 18:00:49 -05:00
George Hotz
c85737c200 assert to prepare for grad uop [pr] (#8280)
* assert to prepare for grad uop [pr]

* fix test_nn

* fix most of test_tensor

* few more tests

* fix multi

* uniform gradient

* acc_dtype

* any for multi

* fix typing

* fix assert, CAST_BEFORE_VIEW is still the issue

* explict test for CAST_BEFORE_VIEW

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 13:26:56 -08:00
George Hotz
fdd46c9f28 delete view instant rule (#8616)
* remove cast before view

* greener

* indexing

* delete view instant rule

* that passes too

* openpilot too

* ack

* base on cast_before_view

* add it as a rewrite rule

* VIEW(DEVICE) is also fine

* test_shard_memory depends on forced_realize removal

* put that back, will go soon

* UOp representations change once we don't instantly fold things

* do not duplicate tests

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 16:15:13 -05:00
qazal
dddd4e5f9f hotfix: remove duplicate TestTensorMutates [pr] (#8619)
* hotfix: remove duplicate TestTensorMutates [pr]

* imports
2025-01-14 16:03:17 -05:00
nimlgen
c5782e85d2 tlsf: optimize alloc (#8608) 2025-01-14 23:48:07 +03:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201 raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs

skip 70B
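
A hedged sketch of the guard this change implies (illustrative names, not the exact code in Tensor.shard):
```
# Illustrative even-split guard: the sharded axis must divide evenly
# across the devices, otherwise raise instead of silently mis-sharding.
def check_even_shard(dim_size: int, n_devices: int) -> None:
  if dim_size % n_devices != 0:
    raise RuntimeError(f"size {dim_size} does not shard evenly across {n_devices} devices")

check_even_shard(4096, 4)    # ok
# check_even_shard(4096, 6)  # raises: 4096 doesn't split across 6 GPUs
```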
2025-01-14 14:51:48 -05:00
ignaciosica
d5a646d492 CUDA Turing TC (#8597)
* init turing tc

* reorder tc

* hotfix: remove some spaces

* revert var name to x

* consistent order of factors

* revert order of terms to match old stuff

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
chenyu
cbfd51f5a5 make MultiLazyBuffer.bounds a property [pr] (#8614)
determined by the lbs' shapes and the axis
2025-01-14 13:25:54 -05:00
chenyu
52e7003414 Revert "make kits19 dataset samples have small sizes (#8591)" (#8610)
This reverts commit 76a03e950a.
2025-01-14 12:24:27 -05:00
Francis Lata
76a03e950a make kits19 dataset samples have small sizes (#8591) 2025-01-14 08:27:45 -08:00
ignaciosica
4057b98f7f rename i and j into k and row/col (#8607) 2025-01-14 08:27:05 -08:00
nimlgen
1ff6862a3d ci: sleep a bit to let the driver unload the prev pid (#8605) 2025-01-14 15:55:23 +03:00
qazal
97ec564b03 noop changes from the block_assign branch [pr] (#8606) 2025-01-14 07:47:17 -05:00
qazal
5aab2806f0 rename to test_tensor_uop + use upats for asserting [pr] (#8604)
* rename to test_tensor_uop + use upats for asserting [pr]

* fix pr
2025-01-14 05:09:56 -05:00
qazal
863abc7140 scheduling graph_rewrite prereqs for BLOCK in ASSIGN (#8598)
* remove the BUF_LIMIT assert

* skip the base one

* work

* work

* good error

* ok comment

* shorter check
2025-01-14 03:01:59 -05:00
chenyu
05e54f00d3 remove bounds from MultiLazyBuffer.from_sharded [pr] (#8603)
without a custom bound, the bound is uniquely determined by shape and axis
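
For illustration, a small sketch of how bounds fall out of shape and axis alone under an even split (hypothetical helper, not tinygrad's internals):
```
# Half-open (start, end) bounds along the shard axis are fully determined
# by the axis size and the device count once the split is even.
def even_bounds(dim_size: int, n_devices: int) -> list[tuple[int, int]]:
  step = dim_size // n_devices
  return [(i * step, (i + 1) * step) for i in range(n_devices)]

assert even_bounds(8, 4) == [(0, 2), (2, 4), (4, 6), (6, 8)]
```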
2025-01-13 23:40:05 -05:00
chenyu
d443e91d82 remove custom splits in Tensor.shard [pr] (#8602)
towards even split only
2025-01-13 21:29:13 -05:00
chenyu
227d96d7a3 remove unused src from metaop [pr] (#8601) 2025-01-13 20:28:14 -05:00
chenyu
c4e33048c6 test Tensor.clone has a different lazydata [pr] (#8600) 2025-01-13 20:13:44 -05:00
qazal
ae2229d727 assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert

* skip the base one
2025-01-13 16:32:07 -05:00
nimlgen
c2504357af am: lock to access dev (#8594)
* am: lock to access dev

* wording

* just works

* disable
2025-01-13 23:53:13 +03:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
George Hotz
d19c1c7f03 bump 75 -> 73 for test failure 2025-01-13 09:18:38 -08:00
Francis Lata
c25d5d3101 improve isin checks (#8589) 2025-01-13 12:12:31 -05:00
nimlgen
74b83c4c41 am in ci (#8532)
* try am in ci

* no sudo

* temp

* run more am test

* run half on am

* insert amdgpu

* other machine as well
2025-01-13 19:55:17 +03:00
nimlgen
d224d0ed7f nv: fix fault info (#8587)
* nv: fix fault info

* and emu for amd

* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32 use UOp.st for kernel reduce axes (#8499)
* use UOp.st for kernel reduce axes [pr]

* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399 better test for reduce swizzle + don't use double dtype [pr] (#8586)
* better test_permute_rewrite

* use float32
2025-01-13 05:02:21 -05:00
George Hotz
df59b072db rename to top_down_rewrite [pr] (#8583) 2025-01-12 18:36:38 -08:00