tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 22:08:08 -05:00

Author	SHA1	Message	Date
gg	19ae829bd1	test float uop in sym_infer (#7456 ) * float uop in sym_infer * break line :( * rerun mypy * update GlobalCounters types * revert type change and cast assignments to mem and ops * cast inferred value to UOp in reshape * cast hcq, update view reshape to handle inferred float * rm extra space * update error * no type updates	2025-02-13 12:55:28 +08:00
JaSpa99	d2ff55e9c6	OSX GPUOcelot (#8209 ) * add patches * add osx test in ci * macos specific uvm, gpfifo mask * only do that for now * Revert "add patches" This reverts commit `80d3112a57`. * use fork for now * workflow only one worker * merge osxtests with tests * Revert "merge osxtests with tests" This reverts commit `3461c8f46c`. * macos pagesize 16384 --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-13 12:24:29 +08:00
chenyu	f4f56d7c15	move time_linearizer to extra.optimization.helpers [pr] (#9048 ) no longer used in tinygrad	2025-02-12 15:49:58 -05:00
chenyu	c15486cf39	remove contiguous in test_subbuffer_used [pr] (#9046 ) test works without contiguous	2025-02-12 14:41:16 -05:00
chenyu	f53b819648	UOps. -> Ops. [pr] (#9044 ) updated the comments and doc except extra	2025-02-12 12:53:23 -05:00
Ahmed Harmouche	916d5e7f08	WebGPU f16 support (f16 bounty part 2) (#8653 ) * WebGPU f16 support * Don't enable f16 yet * dtype tests passing after bitcast fix * Maybe all WebGPU green? * Require shader-f16 in examples * Minor wgsl touchup * 1 line shorter * Simpler * Add transcendetal support * log2 nan location mismatch on Vulkan * Nan skips	2025-02-12 19:46:53 +08:00
Ignacio Sica	aaed315fee	add AMX support to LLVM (#8957 ) * init amx support for llvm * revert elf changes * fix attributes for AMX asm calls * add comments * add llvm amx job to benchmarks * cleanup * cleanup * hotfix: improve comments * comment for aux buffers * hotfix: * move amx_tc to ClangRenderer * merge master * refactor * add docs * add corsix docs reference --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-12 16:01:18 +08:00
Josh Moore	0c97c10814	TestOps: silence pytorch std()/var() degrees of freedom warnings (#9034 )	2025-02-12 14:49:18 +08:00
chenyu	2845f8797a	failed test cases for rsqrt at 0 and similar ones (#9035 ) * failed test cases for rsqrt at 0 and similar ones related to 0inf this failed	2025-02-11 17:50:16 -05:00
nimlgen	166670a2f2	nv: fill grid/block sizes (#9025 )	2025-02-11 16:30:30 +03:00
qazal	c80603285e	bring back some things from the fix_kernel_ops diff [pr] (#9027 ) * bring fix_kernel_ops back [pr] * fix	2025-02-11 14:20:31 +01:00
George Hotz	fb698920f1	revert scheduler change (#9019 ) * Revert "cleanup ast rewriter [pr] (#9012)" This reverts commit `bf0bcb2d5a`. * Revert "kernel op cleanups + use ScheduleItem [pr] (#9009)" This reverts commit `c52cd2b437`. * Revert "construct the schedule sink 2 (#8925)" This reverts commit `cfd3db7862`.	2025-02-11 11:34:12 +08:00
chenyu	6c39aa4a6b	adjust cuda ci test targets (#9014 )	2025-02-10 15:29:59 -05:00
qazal	bf0bcb2d5a	cleanup ast rewriter [pr] (#9012 )	2025-02-10 19:07:59 +01:00
chenyu	586e48d696	a few more backward tests now pass (#9010 )	2025-02-10 12:46:21 -05:00
chenyu	25fa5e4d5f	enable backward tests in test_std_one_in_axis [pr] (#9007 ) still one correction=0 case is broken Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-02-10 10:44:05 -05:00
qazal	cfd3db7862	construct the schedule sink 2 (#8925 ) * work * delete preload * fix metadata * this can keep existing * assign pruning * dedup early * bfs * cycle asserts * move assign check * once	2025-02-10 22:23:02 +08:00
qazal	cd77e51810	fix tensor realization bug in #8975 (#8984 ) * fix tensor realization bug in #8975 * that's a reshape now * work * works * give those tests better names * test when multiple mops result in the same ShapeTracker * test_become_existing_buf_complex is enough * that too	2025-02-10 13:51:30 +01:00
qazal	b17ec42b56	remove const_arg (#9002 ) * remove const_arg * use -m pytest * remove test_const_arg test, variable arg on CONST does not exist. * use base in test_const_dtype	2025-02-10 12:45:11 +01:00
George Hotz	0568720a68	delete revectorize (#9000 ) * delete revectorize * test vectorized LLVM/CLANG * idk about that * was that the segfault?	2025-02-10 18:32:35 +08:00
qazal	fd9f9ec772	realized base tensors become RESHAPE(BUFFER) [pr] (#8994 )	2025-02-10 10:17:54 +01:00
George Hotz	e618efce22	COMMUTATIVE flipping is only for ints (#8996 ) * COMMUTATIVE flipping is only for ints [pr] * no pr * comm fixes this	2025-02-10 12:01:28 +08:00
George Hotz	2983285315	use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993 ) * use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] * add quantize test to dsp * fix tests * older onnx * debug, let's see what's happening	2025-02-10 11:07:35 +08:00
nimlgen	88add71c25	amd: increase sdma copy size (#8989 ) * amd: increase sdma max copy size * rm this * fix * fx * ops	2025-02-09 20:53:35 +03:00
qazal	7eba5fb413	Tensor.empty is RESHAPE(BUFFER) (#8987 ) * empty is RESHAPE(BUFFER) * eh * add test_empty_buf * can we unsupport this * linter * Revert "can we unsupport this" This reverts commit `0f71e1aadb`.	2025-02-09 18:42:51 +01:00
qazal	55351ebb31	minimal failing test for #8975 [pr] (#8982 )	2025-02-09 14:10:37 +01:00
nimlgen	e5a3f60fc2	am: remove libpciaccess dep (#8980 ) * am: remove libpciaccess dep * offset in mockhwiface * op * fake regions	2025-02-09 16:06:55 +03:00
George Hotz	0b26cee2f1	fix some slow tests [pr] (#8979 )	2025-02-09 15:57:04 +08:00
George Hotz	a3c78d47b3	speed docs + upgrades [pr] (#8964 ) * add some docs about speed [pr] * better torch gemm * enable locals on llvm/clang * disable locals for beam speed on LLVM/CLANG * 0x20 alignment in llvm allows ymm use	2025-02-08 17:28:52 +08:00
chenyu	cfd28517df	move pow folding tests to test_schedule [pr] (#8955 ) not really belongs to test_const_folding	2025-02-07 12:51:43 -05:00
George Hotz	c2b4c43edb	handle stride 0 reduce (#8068 ) * handle stride 0 reduce [pr] * more test fixups * a few more --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-02-07 15:40:58 +01:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
Bhavya Gada	3b67712892	[bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937 ) * fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple * remove expectedFailure --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 10:07:54 +08:00
George Hotz	f54242849d	failing test for the devectorize [pr] (#8940 ) * failing test for the devectorize [pr] * add DEVECTORIZE to method_cache	2025-02-07 09:44:54 +08:00
chenyu	a092b6395d	Tuple -> tuple, List -> list [pr] (#8936 )	2025-02-06 14:21:19 -05:00
qazal	79fb5c6470	hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] (#8928 )	2025-02-06 16:27:59 +02:00
George Hotz	ae45826758	hotfix: GRAPH_ONE_KERNEL + fix timing	2025-02-06 17:52:20 +08:00
George Hotz	1c53e8bf27	Revert "objc fast msg (#8922 )" (#8926 ) This reverts commit `c3f99a727e`.	2025-02-06 17:50:49 +08:00
George Hotz	c3f99a727e	objc fast msg (#8922 ) * benchmark kernel launch * don't realize unneeded * faster * faster metal * fix mypy * new objc message style [pr] * without sync * no div 0 * lru cache that * no sync in the profile * fix * update all to new style * remove comment * graph one kernel * fix graph one kernel * remove that sync	2025-02-06 17:49:06 +08:00
George Hotz	a8e54df363	benchmark single kernel launch (#8921 ) * benchmark kernel launch * don't realize unneeded * faster * faster metal * fix mypy * without sync * no div 0 * lru cache that * no sync in the profile	2025-02-06 13:35:34 +08:00
Josh Moore	44e0eab8fd	Fix AttributeError occurring after ValueError in _apply_uop (#8905 ) * Fix AttributeError occurring after ValueError in _apply_uop * Update tensor.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-06 10:56:29 +08:00
chenyu	30695da256	remove Tensor._to_const_val (#8917 ) * remove Tensor._to_const_val added a TODO for advance indexing on const, which was the last place that checks const in Tensor * that is not folding now * one more	2025-02-05 21:44:39 -05:00
uuuvn	09ec33a578	Better errors when relocating against undefined symbol (#8902 )	2025-02-06 10:13:44 +08:00
chenyu	488200f16c	move more pow const to rewrite (#8916 ) * move more pow const to rewrite one less use of _to_const_val * fix	2025-02-05 20:30:12 -05:00
chenyu	76671381aa	move positive const ** t to a rewrite rule (#8914 ) * move positive const ** t to a rewrite rule * one more test	2025-02-05 19:30:12 -05:00
chenyu	189bfa164e	enable backward test for pow(neg const x) (#8912 ) backward works now. 0x still does not work because it's a special case fixed in transcendental	2025-02-05 15:35:21 -05:00
Ignacio Sica	aec3b8d515	add regression test: `test_get_kernel_actions_preserves_actions_state` (#8907 ) * test_get_kernel_actions_preserves_actions_state * simplify * simplify * refactor assert message	2025-02-05 14:13:01 -05:00
Ignacio Sica	15f94ac964	TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793 ) * squash search over search * refactor assert * init benchmark * cleaner get_kernel_actions * cleaner get_kernel_actions * add comment	2025-02-05 11:03:46 -05:00
qazal	6f0cc2e9c5	rename to KernelContext and move the linearize_sched comment [pr] (#8899 ) * rename to KernelContext and move that comment [pr] * 500	2025-02-05 07:49:58 +01:00
George Hotz	c1c5227acb	preserve size in dtype ptr [pr] (#8898 )	2025-02-05 14:38:57 +08:00

4618 Commits