tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 06:48:22 -05:00

Author	SHA1	Message	Date
chenyu	cfd28517df	move pow folding tests to test_schedule [pr] (#8955 ) they don't really belong in test_const_folding	2025-02-07 12:51:43 -05:00
George Hotz	c2b4c43edb	handle stride 0 reduce (#8068 ) * handle stride 0 reduce [pr] * more test fixups * a few more --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-02-07 15:40:58 +01:00
qazal	cf21e27d78	little better VIEW simplifier pattern [pr] (#8954 )	2025-02-07 12:55:54 +01:00
qazal	329013f577	fix UOp.metadata on KERNEL op [pr] (#8953 ) * fix UOp.metadata on KERNEL op [pr] * hotfix: is not None	2025-02-07 12:40:11 +01:00
George Hotz	4de084a835	cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952 ) * cleanup ci [pr] * testing_minimal * add hypothesis to minimal * fail tiktoken import okay * add LLVM speed test * llvm speed w/o beam	2025-02-07 19:01:59 +08:00
uuuvn	6090cbe3be	Try to open llvm first when opening metal (#8949 ) * Try to open llvm first when opening metal * Use more specific FileNotFoundError	2025-02-07 18:58:37 +08:00
uuuvn	67b70e4f6c	Fix incorrect __del__ (#8950 ) CPython doesn't make any guarantees about the order in which globals like `msg` or `libobjc` are destroyed when the interpreter shuts down. https://github.com/tinygrad/tinygrad/pull/8949 triggered the unlucky ordering, which led to a bunch of errors at exit. There are also a bunch of other places where similar problems exist.	2025-02-07 18:21:44 +08:00
George Hotz	9ed2d0dfa2	refactor into subactions (#8946 ) * refactor into subactions * this work? * add shell * move install opencl * valid? * support mac os x * refactor other osx * fix linux/osx * fixes * cleanups * used everywhere * no quotes * quotes on true * bugfixes * this run? * hardcode * that * process replay action * fix checkout * restore to branch * fix caching * fix osx python cache * does replace function exist * Revert "does replace function exist" This reverts commit `622177c5a0`. * Revert "fix osx python cache" This reverts commit `e70d55cd93`. * user on osx to fix untar issue * that	2025-02-07 18:06:44 +08:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
George Hotz	dbda72f91d	hotfix: raise line limit to 11200 for new webgpu backend	2025-02-07 14:29:20 +08:00
George Hotz	b1e1319972	ci speed on the enterprise plan [pr] (#8942 )	2025-02-07 11:18:12 +08:00
Bhavya Gada	3b67712892	[bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937 ) * fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple * remove expectedFailure --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 10:07:54 +08:00
George Hotz	f54242849d	failing test for the devectorize [pr] (#8940 ) * failing test for the devectorize [pr] * add DEVECTORIZE to method_cache	2025-02-07 09:44:54 +08:00
nimlgen	ee1a0fb8ec	am_smi: print device name (#8939 )	2025-02-07 03:01:25 +03:00
chenyu	a092b6395d	Tuple -> tuple, List -> list [pr] (#8936 )	2025-02-06 14:21:19 -05:00
chenyu	d5183e1584	remove unneeded annotation import (#8934 )	2025-02-06 13:12:35 -05:00
chenyu	00d72a5144	setitem isinstance cleanup [pr] (#8932 )	2025-02-06 11:44:57 -05:00
qazal	81e241150a	hotfix: save 1 line (#8931 ) * hotfix: save 1 line * no unwrap	2025-02-06 17:26:05 +02:00
qazal	eb1144be8b	hotfix: only check current graph when excluding nodes in viz (#8930 )	2025-02-06 16:58:53 +02:00
George Hotz	3cc05081f4	llvm no devectorize, the right way (#8901 ) * closer * env flag + transcendental issue	2025-02-06 22:53:49 +08:00
George Hotz	8b16c65bca	add compile3 benchmark [pr] (#8929 )	2025-02-06 22:49:31 +08:00
qazal	79fb5c6470	hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] (#8928 )	2025-02-06 16:27:59 +02:00
George Hotz	1249e8dd3b	objc fast msg, try 2 [pr] (#8927 )	2025-02-06 19:06:21 +08:00
nimlgen	86feb98dcd	am: add support for 7600 (#8910 ) * am: start to add support for 7600 * test_tiny passes * mmhub 3 0 2 * cleaner	2025-02-06 14:04:07 +03:00
George Hotz	ae45826758	hotfix: GRAPH_ONE_KERNEL + fix timing	2025-02-06 17:52:20 +08:00
George Hotz	1c53e8bf27	Revert "objc fast msg (#8922 )" (#8926 ) This reverts commit `c3f99a727e`.	2025-02-06 17:50:49 +08:00
George Hotz	c3f99a727e	objc fast msg (#8922 ) * benchmark kernel launch * don't realize unneeded * faster * faster metal * fix mypy * new objc message style [pr] * without sync * no div 0 * lru cache that * no sync in the profile * fix * update all to new style * remove comment * graph one kernel * fix graph one kernel * remove that sync	2025-02-06 17:49:06 +08:00
qazal	a2e7e49fe1	prepickle scheduler process replay [pr] (#8924 )	2025-02-06 10:16:36 +01:00
qazal	89d7480b0c	hotfix: don't sink views [pr] (#8923 )	2025-02-06 09:15:12 +01:00
George Hotz	0cbb7d7f1e	hotfix: metal has known sync issue	2025-02-06 14:29:41 +08:00
George Hotz	a8e54df363	benchmark single kernel launch (#8921 ) * benchmark kernel launch * don't realize unneeded * faster * faster metal * fix mypy * without sync * no div 0 * lru cache that * no sync in the profile	2025-02-06 13:35:34 +08:00
George Hotz	3e082d4a9d	add float4 support to LLVM (#8920 ) * add float4 support to LLVM * is_bool	2025-02-06 12:15:50 +08:00
George Hotz	b05c536f74	cleanup some llvm stuff [pr] (#8919 ) * cleanup some llvm stuff [pr] * debug * default to newer llvm * repr	2025-02-06 11:45:03 +08:00
Josh Moore	44e0eab8fd	Fix AttributeError occurring after ValueError in _apply_uop (#8905 ) * Fix AttributeError occurring after ValueError in _apply_uop * Update tensor.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-06 10:56:29 +08:00
chenyu	30695da256	remove Tensor._to_const_val (#8917 ) * remove Tensor._to_const_val added a TODO for advanced indexing on const, which was the last place that checks const in Tensor * that is not folding now * one more	2025-02-05 21:44:39 -05:00
George Hotz	d09b5f801c	don't use Tensor new, add to all_tensors after constructions [pr] (#8918 )	2025-02-06 10:21:32 +08:00
FICTURE7	759b3f86bf	Pass host CPU features to LLVM target (#8909 ) * Pass host CPU features to LLVM target This gets `test_gemm_fp16` to pass on Windows. It would fail because the generated machine code would call compiler-rt functions to perform truncation. This gets the test to pass on some hardware, because LLVM gets access to more instructions. Essentially this is similar to `-march=native`. Unless this was intentionally left as is to be re-implemented fully in LLVM IR or something. * Fix linter complaints	2025-02-06 10:19:30 +08:00
uuuvn	09ec33a578	Better errors when relocating against undefined symbol (#8902 )	2025-02-06 10:13:44 +08:00
chenyu	488200f16c	move more pow const to rewrite (#8916 ) * move more pow const to rewrite one less use of _to_const_val * fix	2025-02-05 20:30:12 -05:00
chenyu	76671381aa	move positive const ** t to a rewrite rule (#8914 ) * move positive const ** t to a rewrite rule * one more test	2025-02-05 19:30:12 -05:00
Ignacio Sica	cad44f5f42	add Half-Precision Accumulation Support for Tensor Cores in NV, CUDA, and PTX (#8680 ) * ptx and nv rendering refactor to work with half acc * ptx fix! * use same reg for acc and out * fix comment * another fix * minor change in comment * fix --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-02-05 16:56:37 -05:00
nimlgen	17f9b1cef6	am: load fw based on versions (#8913 ) * am: load fw based on versions * ops * ops2	2025-02-06 00:02:09 +03:00
chenyu	189bfa164e	enable backward test for pow(neg const x) (#8912 ) backward works now. 0x still does not work because it's a special case fixed in transcendental	2025-02-05 15:35:21 -05:00
chenyu	9307572fe3	Ops.POW and transcendental (#8911 )	2025-02-05 15:15:59 -05:00
nimlgen	bff7c70eef	hcq: better var check (#8908 )	2025-02-05 22:38:59 +03:00
Ignacio Sica	aec3b8d515	add regression test: `test_get_kernel_actions_preserves_actions_state` (#8907 ) * test_get_kernel_actions_preserves_actions_state * simplify * simplify * refactor assert message	2025-02-05 14:13:01 -05:00
qazal	e71497aabc	move assign ShapeTracker check to pattern matcher [pr] (#8906 ) * move assign ShapeTracker check to pattern matcher [pr] * rename the st uop to view	2025-02-05 19:47:20 +01:00
Ignacio Sica	0f6109ec00	hotfix bug in `get_kernel_actions` after `TC_SEARCH_OVER_SHAPE` was introduced (#8904 ) * hotfix search bug * copy actions	2025-02-05 13:10:05 -05:00
Ignacio Sica	15f94ac964	TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793 ) * squash search over search * refactor assert * init benchmark * cleaner get_kernel_actions * cleaner get_kernel_actions * add comment	2025-02-05 11:03:46 -05:00
qazal	e7edadda54	construct the sched_sink with graph_rewrite [pr] (#8903 ) * construct the sched_sink with graph_rewrite * diff * move break_sched	2025-02-05 15:16:48 +01:00

7782 Commits