* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
* Bitcast constant folding
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* remove Tensor._to_const_val
added a TODO for advanced indexing on const, which was the last place that checked const in Tensor
* that is not folding now
* one more
* LazyBuffer = UOp
* try 4 at this diff
* skip optimization tests p1
* raise kernel count expectations
* BIND isn't the _only_ uop that can become a tensor
* fix test_ones_sum on symbolic
* bump openpilot, correctness first
* offset on assign is fine
* uop is immutable
* what if this was higher
* more optimization skips
* instant fold const copy
* test_multitensor shouldn't expect buffer for unrealized
* move copy folder to upats
* start BUFFER_VIEW
* kinda BUFFER_VIEW
* Revert "kinda BUFFER_VIEW"
This reverts commit 94b4fe3040.
* BUFFER_VIEW try 2
* linter and missed _device
* pylint
* keep Ops.CONTIGUOUS
* always BUFFER_VIEW disk
* test
* cpu isn't a real device
* buffer references after del
* add that back
* start bringing some of these back
* more test updates
* simpler simplify copy
* subbuffer everything
* this is fine with buffer view
* cleanup the diff in test/ 1
* copy is one thing
* diff pruning
* diff pruning 2
* oh bind unbinds way too early
* extra
* more diff pruning
* more const folding
* experiment with symbolic here
* Revert "experiment with symbolic here"
This reverts commit cb87d61f7a.
* Revert "more const folding"
This reverts commit 2a7d258a2b.
* Revert VALID early folding
This reverts commit 4074f52317.
* storing const is fine
* fix test_prefer_half_buffer
* iterate on test_real_world
* this fixes test_train_mnist memory, breaks everything else
* Revert "this fixes test_train_mnist memory, breaks everything else"
This reverts commit dccfcbe068.
* always expect buffer to exist here
* temp debug: something is mutating lazydata in compile3
* Revert "temp debug: something is mutating lazydata in compile3"
This reverts commit 71400f0d55.
* everything back to normal
* compile3
* compile3 test
* start captured jit work, that test passes
* finalized memory skip set
* linter err
* back to base here
* tiny metaop cleanup
* print tensor
* 4th time this unbind got me
* green pickle
* tensor_variable sanity
* cast sanity
* link from the reds
* COPY sanity + minor repr change
* you can exist
* enable test_winograd
* bye bye nbytes
* danger, uop is mutating
* real become
* delete those from uop init
* put it in buffer init
* buffer inits with so much stuff
* buffer pickle try 2
* toposort can't be a cached property
* fix test_schedule_gc_with_inputs
* remove all @unittest.skip(gc)
* Revert "remove all @unittest.skip(gc)"
This reverts commit 9d8d92dd85.
* reenable real world + test_schedule_gc
* test: RUN_PROCESS_REPLAY=0
* fix pickle jit
* test changes
* reenable test_lru_alloc and TestTrain
* fix imagedtype
* bring pr back
* reenable 3 gc tests
* test_schedule better diff
* disable SPLIT_REDUCEOP
* test_save_all_dtypes looks fixed
* fix metadata
* skip that one
* fix viz by not pickling buffers
* simple test for const folding
* bring split reduceop back
* add simplify_alu
* simplify_binop fixes a test
* fix cast folding
* disable that test
* that test looks fine
* changes from delete_lazy pruning p1
* cast folding and children base
* test: cast folding from pruning branch
* green test_sgd_4convs_fuse_conv_bw
* enable some indexing folding
* test_complex_backward is fixed
* prune more, 295 -> 233
* fix test_multi_const_folding_literal
* fix double copy
* early become test
* ooooops
* clean up ctx in all big_graph
* fix openpilot 208 kernels
* train_cifar is fine now
* fix CAST_BEFORE_VIEW
* even faker const
* back to 13
* mark expectedFailure
* fine don't create them
* test_multi_const_folding_tensor
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
* [WIP] Added an approximate implementation of Sin (FP32, FP64) passing all tests on the Clang runtime
* Map nan/-inf/inf to 1.0 in order to avoid doing as_const(math.inf)
* [WIP] Added support for LLVM IR
* cleaned up the code for mypy and the linter
* [WIP] Updated fp64 support (bitwise shift causes a compilation error), fixed the linter issue.
* [Add] added fast=true mode which disables the slow Payne-Hanek reduction
* [Fix] failure to compute elements when shape includes zero
* [WIP] Added BinaryOps.ADD/BinaryOps.OR to assembly
* [WIP] update the assembly for ptx
* Enables fast=True when the device is one of PTX, NV, or CUDA, to avoid slow bitwise ops (as lv3 reduction is not required).
* [WIP] Added an approximation of LOG2/EXP2 (FP32, FP64)
* [Fix] Cyclic dependencies in xlog2
* [Fix] Cyclic dependency in the graph of exp2 and log2 (passing test_symbolic_ops.py)
* [Fix] keep using higher precision for exp2, but the cyclic graph issue remains to be fixed...
* [Refactor] removed is_metal option. xsin does not rely on fp64 in fp32 mode.
* [WIP] fp16 xsin implementation passing all tests. (still needs to be refactored)
* [WIP] Added fp16 exp2 implementation
* [WIP] Increased the precision of Log2 from 3.5 ULP to 1.0 ULP, and added FP16 Log2 approximation.
* stashed the changes for FP16 sin
* [Fix] Patch for FP16 Sin/Exp2. (updated the dtype_via, fp32_p, and lower)
* [Refactor] migration to fastmath.py, some code simplification, renamed APIs in fastmath, et al.
* [Refactor] Added the function polyN to clean up N-term polynomial approximation.
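A minimal sketch of what an N-term polynomial helper along the lines of polyN could look like, written as plain-Python Horner evaluation (the name, argument order, and coefficient ordering here are illustrative assumptions, not necessarily the actual tinygrad API):

```python
from functools import reduce

# Hypothetical polyN sketch: evaluate an N-term polynomial with Horner's scheme.
# coeffs are ordered from highest to lowest degree.
def polyN(x: float, coeffs: list[float]) -> float:
  # ((c0*x + c1)*x + c2)*x + ... : one multiply-add per coefficient
  return reduce(lambda acc, c: acc * x + c, coeffs, 0.0)

# example: 3*x**2 + 2*x + 1 at x = 2.0 evaluates to 17.0
assert polyN(2.0, [3.0, 2.0, 1.0]) == 17.0
```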
* [Patch] Increase fp64 precision when using ldexp3k if possible, and patch fp16 exp2
* [Patch] added bitcast_forward option
* [Patch] resolved cyclic graph
* patch: fix cyclic graph
* set bitcast_forward=True in ilogb2k
* bitcast_forward for multi.py
* E501
* Break into multiple small PRs
* [Patch] FP16 -> FP64 upcast is no longer required since xlog2 uses quad-precision polyN
* [Patch] NV still requires FP64 for xlog2
* updated schedule test
* updated the count of kernels
* [Update] Removed all bitwise ops (SHL/SHR), tweaked the NaN handling of log2, passing all tests except for AMD.
* Bitcast: make them API-compatible
* [Update] force the use of bitcast
* updated the constant folding count
* [Patch] Create a mask for exp2 using x <= Inf, which evaluates to True as long as x is a real value
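The reasoning here: for IEEE floats, x <= inf holds for every real value (including +/-inf), and only NaN fails the comparison, so it doubles as a cheap "is this a real value" mask. A tiny plain-Python illustration of that property:

```python
import math

# x <= inf is True for every real float, including +/-inf themselves;
# only NaN fails the comparison, so it can serve as a "real value" mask for exp2.
for x in [0.0, -1.5, 1e300, math.inf, -math.inf]:
  assert x <= math.inf
assert not (math.nan <= math.inf)  # NaN compares False with everything
```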
* [Update] isNaN(x)-free log2 algorithm, passing PTX tests; METAL with fastmath enabled handles NaN well, and the AMD backend will not crash.
* xsin avoids calling payne_hanek_reduction, which is slow to compile; stable diffusion compilation now passes in a realistic time
* some minor simplifications to the Payne-Hanek reduction
* [Refactor] refactored some redundant parts of Payne-Hanek
* [Refactor] more readable Payne-Hanek impl
* [Refactor] improved the code consistency of Payne-Hanek
* [experiment] topological sort when doing _recursive_group (i dunno if this is good but at least it works.)
* Revert "[experiment] topological sort when doing _recursive_group (i dunno if this is good but at least it works.)"
This reverts commit 0eee08b87c.
* use allow_buffer_view
* let's support multilazytensor
* updated the count of kernels
* [test] added the jit tests for approx ops
* keep failing constant folding tests in the suite, marked as expectedFailure
* make the timeout deadline explicit when testing the approx jit timeout
* [WIP] Simplified the implementation of xsin, never times out
* [Refactor] Improved the consistency of the approx sin implementation, passing timeout tests
* integrated xexp2_base into xexp2
* Set switch_over=39800.0
* delete: is_buffer_fastmath_supported
* sin: compute against abs(x)
* some cleanups
* fix typo
* removed the space between param and dtype
* allow 514 kernels on CI for sd
* [Refactor] no need to upcast at ldexp3k
* [Refactor] added some comments and references to help understand the code.
* [Fix] 1.0 ULP Sine Approximation for FP16
* [Update] assume e != 0
* use pow2if instead of ldexp3k to fuse the payne_hanek reduction into one kernel
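For context, a pow2if-style helper typically constructs 2**q for an integer q by writing the biased exponent directly into the float32 bit pattern rather than going through ldexp; a hedged plain-Python sketch of that idea (illustrative only, not tinygrad's implementation, and valid only for q in the normal exponent range):

```python
import struct

# Hypothetical pow2if sketch: place (q + bias) into the float32 exponent field
# (bias 127, exponent bits 23..30). Valid for normal results only,
# i.e. roughly -126 <= q <= 127.
def pow2if(q: int) -> float:
  return struct.unpack('<f', struct.pack('<I', (q + 127) << 23))[0]

assert pow2if(3) == 8.0
assert pow2if(-2) == 0.25
```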
* check that approximated sin/log2/exp are fused into one kernel
* clean up changes
* test amd exp
* some code cleanup and test sigmoid
* fix: enabled payne_hanek for fp16 to achieve higher accuracy
* fix: payne_hanek always accumulates the value with uint64, and fp16 sin is fused into a single kernel
* [Refactor] Rename: fastmath -> transcendental
* [Refactor] Added TRANSCENDENTAL, moved the gate function to function.py
* updated const folding tests
* TRANSCENDENTAL as a ContextVar, removed old test of Cody-Waite reduction, added assertions, et al.
* Add: unittest.main()
* Import TRANSCENDENTAL instead of getenv
* Refactor: Added dtype check when TRANSCENDENTAL=2, more context var
* Patch: xlog2, break expt(2, 32) x 2 -> expt(2, 16) x 4 for fp16 math
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>