tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-03 19:25:06 -05:00

Author	SHA1	Message	Date
chenyu	3060e0be4f	add vmin vmax of SPECIAL (#5670 ) * add vmin vmax of SPECIAL folded stuff like (-1 < gidx0) * flaky	2024-07-23 22:55:54 -04:00
chenyu	01fe00e055	skip test_failure_39 in CI (#5660 ) took more than 2 minutes in ci metal, it's basically the same as test_failure_37 but 20X bigger	2024-07-23 14:47:05 -04:00
George Hotz	386fb5e7f8	folding without UNMUL (#5628 ) * folding without UNMUL * fix failures, index_collapse * import ReduceOps * test_arange_4096 isn't folding	2024-07-21 20:14:44 -07:00
George Hotz	fa7e734b49	MetaOps.KERNEL (#5543 )	2024-07-17 19:41:23 -07:00
Francis Lam	c4eb30a04c	test/test_linearizer_failures: add a new beautiful_mnist one (#5531 ) * test/test_linearizer_failures: add a new beautiful_mnist one this one is from a DEPTH=2 fuzz_linearizer search * add GPU to test_failure_40 --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-17 16:27:04 -04:00
George Hotz	1a68854766	PatternMatcher add (#5532 ) * PatternMatcher add [run_process_replay] * f4 dynamic * test_failure_36 is fixed * fix PTX	2024-07-17 12:44:42 -07:00
George Hotz	158221b36b	expand tests from uop_expander [run_process_replay] (#5524 ) * expand tests from uop_expander * more changes from the branch	2024-07-17 09:22:36 -07:00
George Hotz	42c25cc961	fix fixup_ast (#5523 ) * fix fixup_ast * these lin failures are fixed	2024-07-17 08:52:21 -07:00
Francis Lam	2d53abb04a	test/external/fuzz_linearizer: fix for new AST changes (#5519 ) * test/external/fuzz_linearizer: fix for new AST changes also add beautiful_mnist failures * add CLANG and LLVM to test_failure_35 failed_platforms * fix test_linearizer_failure names	2024-07-17 00:08:07 -04:00
chenyu	07ff4b7d24	test_failure_33 ast that has UOps.UNMUL after linearize (#5504 ) * test_failure_33 ast that has UOps.UNMUL after linearize * smaller	2024-07-15 22:54:23 -04:00
George Hotz	03c2dc8bd7	lowerer is kernel [run_process_replay] (#5437 )	2024-07-12 18:50:55 -07:00
George Hotz	870dc8c350	s/Linearizer/Lowerer [run_process_replay] (#5428 )	2024-07-12 15:54:07 -07:00
qazal	c4fdb9c725	second iteration on verify_lazyop (#5140 )	2024-06-25 09:44:32 +03:00
qazal	18e70deec3	verify_lazyop (#5124 ) * start verify_lazyop * bfs order * assert * assert shapetrackers 2 * refactor * more iteration * skips * that ast was wrong too	2024-06-24 13:45:35 -07:00
Francis Lam	8d33998e0d	[run_process_replay] linearizer: fix get_grouping_dims to respect global/local max (#4855 ) * linearizer: fix get_grouping_dims to respect global/local max * fix lidx variable index offset and unrestrict clang/llvm global len * test reverse variable indexing when reverse_dims is true * change the collapse axis to be the right most if reversed	2024-06-18 16:51:27 +03:00
Junjun Dong	c8cd6e725c	Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977 ) * feat: remove BinaryOps.SUB * remove SUB in test_early_end_local * regenerate dataset. remove SUB in test_linearizer_* * reenable overflow tests * simplify tensor.sub function by returning a+(-b) * remove whitespaces --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-06-18 09:06:13 -04:00
Jhenner Tigreros	dc9e9e4363	Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887 ) * Create UnaryOps.RECIP and BinaryOps.IDIV and changing uses of BinaryOps.DIV * Delete unused import * Add cstyle renderer * Fix formatting text * Fix test error due to bad implementation of renderer * Add PTX support * Add RECIP to LLVMIR * Remove BinaryOps.DIV from symbolic test * Change some test and fix C floor division * Change references to DIV for the RECIP or IDIV * Add mimic idiv for symbolic test * Restore floor * Mimic idiv * cast to int * Fix some test and renderer * Remove DIV for render nodes * Resolve issue with div * Add TestRenderer * Fix test * fix error * Fix PAD test * Fix div implementation * Remove DIV * Add upcast to rshift, due to use of MUL and RECIP on DIV * Fix linter * Remove complete BinaryOps.DIV * Fix lint * Fix some test * Revert mul modification * Fix tests * Fix CLANG for uops * Revert IDIV function * Minor fix * modify pattern matching rule to support nan * Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP * Remove const folding for IDIV and fix PTX * Complete remove IDIV from extra * Remove test_div from TestFloatUOps due to test on recip * Fix linearizer * fix * Fix test_22 * Fix llvm * Apply trunc function for llvmlit * use floor instead of trunc * Use correct type * Generate new fuzz db * Fix rshift, do not cast to float to support idiv * Return upcast=false to rshift * Add to unsafepad BinaryOps.IDIV * Remove RECIP override for CUDA * add atol / rtol for the test * Remove cast to int on IDIV * Regenerate sops * delete sops.gz * regenerate * regenerate * regenerate * Reduce margins * pass atol and rtol as parametersg for _test_metrics * regenerated dataset * Regenerate * Remove duplicated * Revert changes on extra * Remove changes extra and NOQA for test * Remove E501 * Remove and change line * Remove E501 * Fix atan2 * Revert import and E501 * Remove E501 * Add hrcp to halp ops * Remove 1 of hrcp * Remove last DIV and add type check on uops for IDIV * Fix new tests * Fix tests and custom function * Regenerate dataset * Regenerate dataset * Revert dataset * Change generate dataset script * Remove line * Change IDIV, type checker validate if x,y and z are int --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-06-14 02:43:46 -07:00
chenyu	a21ea165bc	skip linearizer test_failure_22 on llvm (#4937 ) getting flaky recently	2024-06-12 16:03:38 -04:00
Timmy	720c700a8a	Multireduce-Kernels: Linearizer Changes and Tests (#4259 ) * basic tests * cleanup * pylint * ruff * use define acc as a proxy for rendered reductions * use define acc as a proxy for rendered reductions * recursive reduceop rendering via ast_parse * linters + cleanup * fixing late buf loading * plus linters * removing extra line * linters * does this break ci? * added tests and if add end change * typo in add_ends * linters * removing comments * allow endifs to be inserted before the end of the graph * find add ENDIF before next BARRIER * removing tests with manual ENDIF + linters * specifically the next barrier aftr the store of the local result * Revert "specifically the next barrier aftr the store of the local result" This reverts commit `b288a5c3ce`. * keeping up to date * linters + merge changes * cleaning up old bad decisions * linters and opts * mrged linearizer tests * fixing merge issues * removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions * small diff fixes * updating linearizer to work without uops.add( ... cachable) * linters * comment in multireduce tests * skipping tests without locals * full tests * linters * load_cache[key] fix for multiple accs * linters * assert only one reduceop * fix loop_scope test to actually cause an issue * self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique * updated tests * fixing merge * removing debug prints * complete merge fix * linters * diff cleanup * adding tests in * give each reduce it's own local buffer * gpu=1 changes * store and load locals with upcasting * modifying test? * make multireduce_netsted_local_upcast test match single reduce shapes * removing todo * cleaning up the diff * unroll test * unroll and upcast tests * fix gpu * seq and self.load_cache[key] cleaning * linters * padto works * merge fixes * fixes * add skips for amd * linters + seq * cleaning & more tests * softmax tests * linters * [run_process_replay] * add new tests back This reverts commit `19dec22e01`. * more hardcoded -1s * fix ptx * Fix name for loop in ptx * cleaning up the diff * cleaning up the uops diff * nv ci is too slow --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-06-12 13:29:43 -04:00
chenyu	0f21aa0416	example kernel that triggers Memory access fault for resnet on red (#4678 )	2024-05-21 18:59:36 -04:00
George Hotz	07b350a8f4	new uops is an actual graph (#4560 ) * new uops is an actual graph * it's way slower * simpler * fix define acc * render_loop unique * ops test pass * add pattern matcher back, there's bugs * rewrite * use priority queue * recursive children * fix tests * fix tests with SINK * fix abstractions * fix assembly * simpler * link define_acc * fix DEFINE_ACC placement * type verify * full cmp * fix cmp * ACCESS_ACC * insert DEFINE_ACC * fix PHI * recursive rewrite * fix many tests * sum collapse * more patterns * correct change * fold arange * fix that lin test * space * big folding rule works * close * has more maxes, meh * cached node replace * set changed * simplest folding yet * works * works * DIV * all tests pass * del * fuzz linearizer fails * sum_collapse * test depth 2 cf * fix lin test 14 * fix clang depth * disable that * failure 14 is fixed * fix ptx * failure 27 is fixed * fix llama * run_cnt * Revert "Optimize PTX gated loads index calculation (#4304)" This reverts commit `d97d5a7689`. * fix uops loop * fix ptx bugs * add barrier * print * mem_type in ptx direct * bypass tests that fail in CI but pass locally * ptx remove ptr_ar * more ptx passing * fix ptx tests * assert compile support * remove model inference benchmark from red	2024-05-17 18:00:18 -07:00
uuuvn	639ea5b0f2	Metal linearizer failure 22 is flaky not just on CI (#4617 ) * METAL doesn't fail anymore, not just on CI * oops	2024-05-16 11:31:23 -04:00
nimlgen	eb9689336e	nv mockgpu (#4600 ) * mockgpu nv * works * comment that out * fix merge * setup gpuocelot * install packages * not run all of them * passes * fix ci * almost * should pass * linter * linter 2 * try this? * ugn, not supported * ci * remove ticket from description * better descs	2024-05-15 23:46:08 +03:00
Ahmed Harmouche	662bca8134	Split UnaryOps.CAST into CAST and BITCAST (#4487 ) * Separate cast and bitcast * Fix lint * No more arg[0] * Revert "No more arg[0]" This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964. * CAST/BITCAST arg is the dtype only, no more tuple * No image bitcast, regenerate dataset * Small fixes	2024-05-15 11:43:31 -04:00
George Hotz	ff64bcab69	move graph/search to engine (#4596 )	2024-05-14 23:12:59 -07:00
chenyu	afe020710d	disable PADTO on upcasted axis (#4444 ) fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.	2024-05-05 21:52:03 -04:00
Francis Lam	c8595a9655	update sops.gz, fix tests and add new linearizer test (#4437 ) * update sops.gz, fix tests and add new linearizer test * remove METAL CI skip for test_failure_22 * re-add skip to METAL CI to test_failure_22	2024-05-05 17:31:25 -04:00
chenyu	3f3af0fb85	test_linearizer_failures 29 passes now (#4215 ) TC + PADTO fixed	2024-04-18 19:49:23 -04:00
chenyu	1fa0351acb	fix DEFINE_ACC invalid_value to have same type as localtype (#3980 )	2024-03-28 19:21:17 -04:00
Patrick Tsai	e27129a798	Fix linearizer failure 26 test (#3906 ) * Adjust adds between WHERE and PHI * Not much better * undo recursive change * hm * iterate over where, not factored op * oo * consts only for loop * UNdo var name change * update --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-03-24 16:34:13 -04:00
Francis Lam	0145366323	wmma: fix the AMD TC threads to split the first 16 threads (#3904 ) previously it was incorrectly aliasing 16 into the size 8 upcast on the store alias. now it splits it properly into 8 and the remaining 2 into the correct local stride	2024-03-23 21:17:42 -04:00
chenyu	a2b2597fc2	replace dtype.name str with render_dtype (#3903 ) fixed some bf16 cast issue since it does not have `.name`. also more robust if there are lang specific type override	2024-03-23 19:25:48 -04:00
chenyu	30fa03243e	reuse fuzz_linearizer.compare_linearizer in test_linearizer_failures (#3861 )	2024-03-21 14:12:27 -04:00
chenyu	33dd99acf4	remove helper_add_store from test_linearizer_failures (#3860 )	2024-03-21 12:53:31 -04:00
Francis Lam	131bbb6563	test_linearizer_failure: add failure 27 from a gpt2 kernel (#3825 ) * test_linearizer_failure: add failure 27 from a gpt2 kernel found during a full fuzz test of applied_opts combos to a depth of 4 on the gpt2 kernels w/o GROUPTOP. added additional examples to failure 26 that don't have GROUPTOP * add other platform failure	2024-03-19 16:29:50 -04:00
Francis Lam	9851e2c3b9	test_linearizer_failure: add failure 26 from a gpt2 kernel (#3821 ) found during a full fuzz test of all applied_opts combos to a depth of 3 on the gpt2 kernels	2024-03-19 13:19:54 -04:00
chenyu	ac866eaf5a	disable simplify_phi_loops (#3812 ) * disble simplify_phi_loops this breaks BEAM search GPT2. * skip that	2024-03-18 19:25:26 -04:00
Francis Lam	a7afd2f6bf	test_linearizer_failures: add failing kernel from GPT2 CUDA (#3808 ) * test_linearizer_failures: add failing kernel from GPT2 CUDA * test_linearizer_failure: remove "HIP" from failed_platforms	2024-03-18 17:16:40 -04:00
qazal	e3e89c244b	multioutput uoping infra (#3706 ) * linearize multioutput * add vars to copy	2024-03-15 21:56:59 -07:00
chenyu	a2d3cf64a5	move is_dtype_supported to test.helpers (#3762 ) * move is_dtype_supported to test.helpers updated all places that check if float16 is supports * fix tests	2024-03-15 14:33:26 -04:00
nimlgen	6b8c66e04f	fix broken loops in llvm (#3751 )	2024-03-15 11:57:51 +03:00
nimlgen	6bf11a2ce3	fix incorrect direct store with gep (#3735 ) * fix incorrect direct store with gep * better comment * phi as well * dtype check there * mypy happy? * not used * renames * phi in phi	2024-03-14 20:58:50 +03:00
qazal	43953c0ba9	skip grouped store for umatching upcasts (#3723 ) * skip if upcasts dont match * outputs match now * this ast is hardcoded --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-14 01:18:31 -04:00
nimlgen	08064a0e29	add SEED env to fuzz_linearizer (#3713 ) * add SEED env to test/external/fuzz_linearizer.py * found some * more platforms	2024-03-13 18:08:42 +03:00
chenyu	e1b2a82d89	fix st.real_size can be nagative if valid is always false (#3708 ) two followups after this. (1) if a buffer is never accessed in kernel, it can be removed from input (2) real_size can be smaller conditional on valid being true (the old validhack stuff)	2024-03-12 20:34:07 -04:00
Francis Lam	b6e2495fdd	kernel: limit shared memory usage when adding opts (#3705 ) * kernel: limit shared memory usage when adding opts * search: remove unnecessary limit on search space apply_opt will do the more correct check	2024-03-12 17:06:21 -04:00
Patrick Tsai	971d7f5d7c	O(n) arange attempt (#3530 ) * It works? * Clamp correctly * Refactor * Make code better * Undo some stuff * First step to trying to make floats work * Floats work in Python op but not metal because int div is different Python integerdivision was implemented as // which rounds towards negative infinity, but C integer division rounds towards 0 so there is an off-by-1 division error * arange does cumsum with ints and then multiplies by step This is so loop optimization can remain int only * Undo a lot of symbolic changes * Final check * Cleanup * There can be multiple phis * Fix multiple phi op removal * const sets dtype correctly * Fix bugs * Fix a couple bugs and add loop vars to resolve * missed one * Don't trim too many ops * Fix symbolic test * Use ones instead of full * Delete test * Lint passes * max node error * Small updates to loop logic * Remove unnecessary changes * We are getting somewhere * Simple case * Fix * rm, prn * Better * If NumNode doesn't work then continue * clamp is needed for arange(256) * Move everything into the optim fn * Replace correctly * Order optimizations better * Delete * mypy * Test for simplification * Rename * Fix test * update test description * Undo more * Cleanup * No replaced_ops map * Fix lint * AssertionError * back again * Reinstate assertion * Return true and make diff not as big * Bigger range for test * Change cumsum impl * fix bug * make big cumsum work * lint * Undo cumsum 2-stage removal * No while helper * optional min/max clamping * floats work * rm giant arange test * fix python cast None * Check phi parents * one phi allowed per where * Fix one phi per where * Rework iteration * Delete assertions * convert to int * Try mul -1 instead of neg for hip..? * Remove one phi per where requirements * one accum only * Lint * should simplify a loop at a time * Don't get rid of loop explcitly * Need to iterate backwards * lint * unary neg * Make optim work for onnx and sum_pad_collapse * Better message * filter alu ops correctly * Fix the limiter * lint and simplify * Add it back * off by one error * test wheres and phis * test max ops and non-if stuff * <= * cast_scalar * Oops * Change test * Pass loop uops instead of a modified map * Cut param transfer between linearizer and uops * Fix issues * Fix lint * fix efficientnet python 3.8 invalid syntax * distinct vars in seen_vars * accurate var names --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-11 16:09:20 -07:00
chenyu	915f98791c	use custom KernelOptError in kernel opt (#3661 ) be more specific about invalid kernel opt, used that in test_linearizer_failures. make BEAM kernel search work even with assertion disabled. `BEAM=2 python3 -O examples/llama.py --temperature=0 --count=10 --prompt="Hello." --timing`	2024-03-08 15:36:16 -05:00
chenyu	1130c73844	add FUZZ_NTH to fuzz_linearizer (#3656 ) * add FUZZ_NTH to fuzz_linearizer also update tests in test_linearizer_failures to not just run on METAL * update failures for HIP/HSA * test_failure_21 LLVM PADTO	2024-03-08 09:16:49 -05:00
chenyu	b282a45e39	fix direct store float4 with same vin (#3652 ) In a kernel that stores expanded value, the vin of float4 can come from same source, and we only remove once in that case.	2024-03-07 18:11:50 -05:00
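
The message for commit 971d7f5d7c notes that Python's `//` rounds toward negative infinity while C integer division rounds toward zero, which produced an off-by-one error. The sketch below only illustrates that difference in plain Python; `c_style_idiv` is a hypothetical helper written for this note and is not taken from tinygrad.

```python
def c_style_idiv(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as C does (hypothetical helper)."""
    q = abs(a) // abs(b)                     # magnitude of the quotient
    return q if (a < 0) == (b < 0) else -q   # negative only when the signs differ


# Compare Python's floor division with the truncating version.
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(f"{a:>2} / {b:>2}: floor={a // b:>2}  trunc={c_style_idiv(a, b):>2}")
```

The two results agree when the operands share a sign and differ by one when the signs differ and the division is inexact, which matches the off-by-one described in the commit message.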


88 Commits