* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
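A side note on the `weights_only=False for pytorch 2.6` item above: PyTorch 2.6 switched the default of `torch.load` to `weights_only=True`, which rejects checkpoints containing arbitrary pickled objects. A minimal sketch of the affected call (the checkpoint path is hypothetical):
```
# PyTorch 2.6 defaults torch.load to weights_only=True, which refuses to unpickle
# arbitrary Python objects. Passing weights_only=False restores the old behavior;
# only do this for checkpoints you trust.
import torch

state_dict = torch.load("model.ckpt", map_location="cpu", weights_only=False)
```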
* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* new objc message style [pr]
* without sync
* no div 0
* lru cache that
* no sync in the profile
* fix
* update all to new style
* remove comment
* graph one kernel
* fix graph one kernel
* remove that sync
* remove Tensor._to_const_val
added a TODO for advanced indexing on const, which was the last place that checked const in Tensor
* that is not folding now
* one more
* Pass host CPU features to LLVM target
This gets `test_gemm_fp16` to pass on Windows. It would fail because the
generated machine code would call compiler-rt functions to perform the
truncation. This gets the test to pass on some hardware, because LLVM
gets access to more instructions. Essentially this is similar to
`-march=native`.
Unless this was intentionally left as-is to be re-implemented fully in
LLVM IR or something similar.
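A minimal sketch of the idea, using llvmlite rather than tinygrad's own LLVM bindings (an assumption for illustration): query the host CPU name and feature string and pass them when creating the target machine, which is roughly what `-march=native` does.
```
# Sketch (llvmlite, not tinygrad's ctypes bindings): build a target machine with
# the host CPU's name and feature string so codegen can use e.g. F16C instead of
# falling back to compiler-rt for fp16 <-> fp32 conversions.
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

target = llvm.Target.from_default_triple()
target_machine = target.create_target_machine(
  cpu=llvm.get_host_cpu_name(),                      # e.g. "znver3"
  features=llvm.get_host_cpu_features().flatten(),   # e.g. "+avx2,+f16c,..."
  opt=2)
```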
* Fix linter complaints
* ptx and nv rendering refactor to work with half acc
* ptx fix!
* use same reg for acc and out
* fix comment
* another fix
* minor change in comment
* fix
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
* more conditions for shift rewrite mul/idiv
* make ptx test uint so the new condition is true
* delete idiv test
* rewrite to 0 is wrong for idiv, as the denominator is cast to 0 before the division
* mul/div by 2**(large count) is unsupported anyway
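A hypothetical illustration (not tinygrad code) of why the shift rewrite needs extra conditions: an arithmetic right shift rounds toward negative infinity, while signed idiv truncates toward zero, so the rewrite is only sound for unsigned (or known non-negative) values, and `2**(large count)` overflows the constant before it could become a shift anyway.
```
# x >> k only matches x // 2**k for signed x when x is non-negative, because
# hardware idiv truncates toward zero while an arithmetic shift floors.
def idiv_trunc(x: int, k: int) -> int:
  """C-style signed division by 2**k: truncates toward zero."""
  q = abs(x) >> k
  return -q if x < 0 else q

assert idiv_trunc(7, 1) == (7 >> 1) == 3   # rewrite is fine for non-negative values
assert idiv_trunc(-7, 1) == -3             # truncating division
assert (-7 >> 1) == -4                     # arithmetic shift floors: rewrite is wrong here
```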
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems set default library versions via symlinks, which are
simply followed to get the actual ELF. Conda, however, does it via linker
scripts, which ctypes doesn't follow (below are the contents of libgcc_s.so):
```
/* GNU ld script
Use the shared library, but some functions are only in
the static library. */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library thinks that this is the actual ELF, and
ctypes.CDLL just loads this text file as a shared library. The result
is:
```
File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```
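A minimal sketch of the workaround, assuming the fix is to load the versioned SONAME directly instead of trusting `find_library`:
```
# Bypass ctypes.util.find_library and dlopen the versioned SONAME directly.
# The dynamic loader resolves "libgcc_s.so.1" itself, so the conda linker
# script named libgcc_s.so is never touched.
import ctypes

helper_handle = ctypes.CDLL("libgcc_s.so.1")  # ABI version suffix is always 1 for libgcc_s
```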
Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>