tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-03 11:14:56 -05:00

Author	SHA1	Message	Date
Jhenner Tigreros	fa78755f19	Add new patterns to unfold division (#5139 ) * Add new patterns to unfold division * Create regression test and fix pattern	2024-06-25 18:07:47 -07:00
qazal	c4fdb9c725	second iteration on verify_lazyop (#5140 )	2024-06-25 09:44:32 +03:00
qazal	18e70deec3	verify_lazyop (#5124 ) * start verify_lazyop * bfs order * assert * assert shapetrackers 2 * refactor * more iteration * skips * that ast was wrong too	2024-06-24 13:45:35 -07:00
Francis Lam	b563cd52ed	linearizer: change globals to merge into left axis/gridDims.x first (#5033 ) * linearizer: change order of collapse to be left-most also fixes Variable max size to be correct and add docs for the off parameter * fix multiple global dim oversizes * add passing variable test and reorganize tests * use assert RuntimeError for failing test	2024-06-23 18:53:15 -04:00
qazal	28bf8d86d8	test_linearizer with multi output ASTs (#5115 ) * ast is tuple * run test_phi_simplification * update reason * more tc * beam * a few more * use test_opt directly	2024-06-23 15:41:24 +03:00
qazal	5717a54b28	don't use Tensor.empty in kernel opts tests (#5086 )	2024-06-21 18:41:03 +03:00
George Hotz	6f6b3b10c9	import from uops, not linearizer (#5064 )	2024-06-20 08:08:44 -07:00
kormann	7c3b877216	rename uop [run_process_replay] (#5031 ) * rename * fix unittests * rename vin * fix test * fix type [run_process_replay] * rm pre commit hook change	2024-06-18 21:34:05 +03:00
Francis Lam	8d33998e0d	[run_process_replay] linearizer: fix get_grouping_dims to respect global/local max (#4855 ) * linearizer: fix get_grouping_dims to respect global/local max * fix lidx variable index offset and unrestrict clang/llvm global len * test reverse variable indexing when reverse_dims is true * change the collapse axis to be the right most if reversed	2024-06-18 16:51:27 +03:00
Junjun Dong	c8cd6e725c	Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977 ) * feat: remove BinaryOps.SUB * remove SUB in test_early_end_local * regenerate dataset. remove SUB in test_linearizer_* * reenable overflow tests * simplify tensor.sub function by returning a+(-b) * remove whitespaces --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-06-18 09:06:13 -04:00
chenyu	67e8df4969	remove numpy from dtype (#4969 ) replaced all dtype.np with _to_np_dtype defined in tensor.py. after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer	2024-06-14 15:38:45 -04:00
Jhenner Tigreros	dc9e9e4363	Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887 ) * Create UnaryOps.RECIP and BinaryOps.IDIV and changing uses of BinaryOps.DIV * Delete unused import * Add cstyle renderer * Fix formatting text * Fix test error due to bad implementation of renderer * Add PTX support * Add RECIP to LLVMIR * Remove BinaryOps.DIV from symbolic test * Change some test and fix C floor division * Change references to DIV for the RECIP or IDIV * Add mimic idiv for symbolic test * Restore floor * Mimic idiv * cast to int * Fix some test and renderer * Remove DIV for render nodes * Resolve issue with div * Add TestRenderer * Fix test * fix error * Fix PAD test * Fix div implementation * Remove DIV * Add upcast to rshift, due to use of MUL and RECIP on DIV * Fix linter * Remove complete BinaryOps.DIV * Fix lint * Fix some test * Revert mul modification * Fix tests * Fix CLANG for uops * Revert IDIV function * Minor fix * modify pattern matching rule to support nan * Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP * Remove const folding for IDIV and fix PTX * Complete remove IDIV from extra * Remove test_div from TestFloatUOps due to test on recip * Fix linearizer * fix * Fix test_22 * Fix llvm * Apply trunc function for llvmlit * use floor instead of trunc * Use correct type * Generate new fuzz db * Fix rshift, do not cast to float to support idiv * Return upcast=false to rshift * Add to unsafepad BinaryOps.IDIV * Remove RECIP override for CUDA * add atol / rtol for the test * Remove cast to int on IDIV * Regenerate sops * delete sops.gz * regenerate * regenerate * regenerate * Reduce margins * pass atol and rtol as parametersg for _test_metrics * regenerated dataset * Regenerate * Remove duplicated * Revert changes on extra * Remove changes extra and NOQA for test * Remove E501 * Remove and change line * Remove E501 * Fix atan2 * Revert import and E501 * Remove E501 * Add hrcp to halp ops * Remove 1 of hrcp * Remove last DIV and add type check on 
uops for IDIV * Fix new tests * Fix tests and custom function * Regenerate dataset * Regenerate dataset * Revert dataset * Change generate dataset script * Remove line * Change IDIV, type checker validate if x,y and z are int --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-06-14 02:43:46 -07:00
Timmy	720c700a8a	Multireduce-Kernels: Linearizer Changes and Tests (#4259 ) * basic tests * cleanup * pylint * ruff * use define acc as a proxy for rendered reductions * use define acc as a proxy for rendered reductions * recursive reduceop rendering via ast_parse * linters + cleanup * fixing late buf loading * plus linters * removing extra line * linters * does this break ci? * added tests and if add end change * typo in add_ends * linters * removing comments * allow endifs to be inserted before the end of the graph * find add ENDIF before next BARRIER * removing tests with manual ENDIF + linters * specifically the next barrier aftr the store of the local result * Revert "specifically the next barrier aftr the store of the local result" This reverts commit `b288a5c3ce`. * keeping up to date * linters + merge changes * cleaning up old bad decisions * linters and opts * mrged linearizer tests * fixing merge issues * removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions * small diff fixes * updating linearizer to work without uops.add( ... cachable) * linters * comment in multireduce tests * skipping tests without locals * full tests * linters * load_cache[key] fix for multiple accs * linters * assert only one reduceop * fix loop_scope test to actually cause an issue * self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique * updated tests * fixing merge * removing debug prints * complete merge fix * linters * diff cleanup * adding tests in * give each reduce it's own local buffer * gpu=1 changes * store and load locals with upcasting * modifying test? 
* make multireduce_netsted_local_upcast test match single reduce shapes * removing todo * cleaning up the diff * unroll test * unroll and upcast tests * fix gpu * seq and self.load_cache[key] cleaning * linters * padto works * merge fixes * fixes * add skips for amd * linters + seq * cleaning & more tests * softmax tests * linters * [run_process_replay] * add new tests back This reverts commit `19dec22e01`. * more hardcoded -1s * fix ptx * Fix name for loop in ptx * cleaning up the diff * cleaning up the uops diff * nv ci is too slow --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-06-12 13:29:43 -04:00
Timmy	887643cf34	Multireduce atomic local load/store test (#4786 ) * atomic load/store test * tests for nested & unrolled * check barriers * linters * cleaning up diff * fix assert in _temp_create_multireduce_ast changes * cleaning up the check for redundant barriers * minor cleanups for the assert * always seed randn, helps with debuggability --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-06-05 14:41:19 +03:00
Szymon Ożóg	e47277d18a	Disable for PTX as well (#4838 ) Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-06-05 10:37:59 +03:00
chenyu	3afc914617	CMPEQ -> CMPNE and make it safe to pad (#4818 ) * CMPNE * new dataset	2024-06-03 18:02:15 -04:00
Timmy	ca32921f84	Multireduce PADTO Test (#4785 ) * padto test * expanded multireduce padto tests * cuda doesnt run on ci * moving padto_where_multireduce test to SUM so that we can check the reduce axis * cleaning up tests some more * add wanna_outputs * refactor test_padto_sum_multireduce * fix max and refactor where * fix axis --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-06-02 13:46:53 +03:00
chenyu	7cc883ecee	CMPLT is safe to pad (#4790 ) 0 < 0 evals to False	2024-05-30 22:50:48 -04:00
qazal	c2945be0a3	add fused tensor core opts tests (#4775 ) * add fused tc opts tests * n=64	2024-05-30 13:50:00 +03:00
qazal	0e824741c4	pre multi reduce codegen/* cleanup (#4755 ) * refactor self.reduceop * free lines * fix test	2024-05-28 08:15:48 -04:00
qazal	0e69b22629	multireduce OptOps tests (start) (#4733 ) * start * full tests * add skips * unrelated * notes	2024-05-27 12:21:33 +03:00
qazal	c7b1d802f1	delete duplicate tests in test_linearizer (#4723 ) * delete duplicate test test_simplify_uop isnt needed max works * ci * remove skip * add skip back	2024-05-26 08:11:42 +03:00
chenyu	31358cbea5	change Tensor.stack to method (#4719 )	2024-05-24 17:04:19 -04:00
Szymon Ożóg	212025b53c	Int mulacc for ptx (#4680 ) * IntMulacc * don't mov const * Dont do int mulacc on ocelot * Workaround for ocelot * Remove ocelot workaround * Fix tests that merged into mulacc * fix uop cout after mergin to mulacc	2024-05-24 15:20:48 -04:00
chenyu	4398cc3654	update test_linearizer.py (#4707 ) tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc	2024-05-23 22:41:22 -04:00
Francis Lam	49225522aa	wmma: chain unrolled WMMAs and phi only at the end (#4703 ) * wmma: chain unrolled WMMAs and phi only at the end * fix linter and tests * reduce lines	2024-05-23 17:50:18 -04:00
qazal	532c9e08e3	proposal: PHI nodes in TC shouldn't have children inside the loop (#4694 ) * expectations from UOpGraph * one with children * minimal repro * replace	2024-05-23 15:11:26 -04:00
Szymon Ożóg	9a9963ba7b	Remove uops deepcopy from PTX (#4671 ) * Remove uops deepcopy from PTX * Update test * Fix test * fix for non-ptx * Clean --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-05-22 23:14:17 -04:00
qazal	f11a81f707	isolated test for BEAM=2 llama wrong uops toposort (#4687 ) * add ast * skip test in CI	2024-05-23 00:47:37 +03:00
qazal	c5f5755328	correctness test for multireduce nested locals (#4682 ) * nested locals test * move st	2024-05-22 19:35:35 +03:00
qazal	d12d412e8b	revert uops dtype in pattern matcher (#4681 ) This reverts commit `5f84cbb5df`.	2024-05-22 14:45:51 +03:00
qazal	5f84cbb5df	keep UOps.CAST in PHI-GEP fold for unmatching dtypes (#4674 ) * these should be val.dtype * cast float4 and float2 to root * document tests * 2 args * fix assert * match dtype * no extra lines * better fix	2024-05-21 14:59:49 -04:00
qazal	458a3961eb	catch compile errors in uops tests (#4672 ) * use helper and compile * llama beam=2 * ast length * skip float4, fix hsa * use empty tensors	2024-05-21 12:20:35 +03:00
Timmy	de733d73cf	Multireduce Linearizer Tests (#4665 ) * updated tests * make sure the upcasting tests actually causes the problem * diff cleanup * use UOpGraph utils --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-05-21 02:43:25 +03:00
qazal	b33c827aed	UOps.RANGE toposort spec (#4660 ) * use iterator * nested loops and outer loads * uop after phi	2024-05-20 23:38:20 +03:00
qazal	0d9e623d83	consolidate uops tests (#4659 ) * merge uoptimize * move tests * fix skip message	2024-05-20 21:42:31 +03:00
George Hotz	4753283221	LOOP -> RANGE (#4650 )	2024-05-19 06:40:20 -07:00
George Hotz	07b350a8f4	new uops is an actual graph (#4560 ) * new uops is an actual graph * it's way slower * simpler * fix define acc * render_loop unique * ops test pass * add pattern matcher back, there's bugs * rewrite * use priority queue * recursive children * fix tests * fix tests with SINK * fix abstractions * fix assembly * simpler * link define_acc * fix DEFINE_ACC placement * type verify * full cmp * fix cmp * ACCESS_ACC * insert DEFINE_ACC * fix PHI * recursive rewrite * fix many tests * sum collapse * more patterns * correct change * fold arange * fix that lin test * space * big folding rule works * close * has more maxes, meh * cached node replace * set changed * simplest folding yet * works * works * DIV * all tests pass * del * fuzz linearizer fails * sum_collapse * test depth 2 cf * fix lin test 14 * fix clang depth * disable that * failure 14 is fixed * fix ptx * failure 27 is fixed * fix llama * run_cnt * Revert "Optimize PTX gated loads index calculation (#4304)" This reverts commit `d97d5a7689`. * fix uops loop * fix ptx bugs * add barrier * print * mem_type in ptx direct * bypass tests that fail in CI but pass locally * ptx remove ptr_ar * more ptx passing * fix ptx tests * assert compile support * remove model inference benchmark from red	2024-05-17 18:00:18 -07:00
nimlgen	daf57af3eb	move tc to renderers (#4631 ) * move tc to renderers * missed import * fix typo * fix * fix imports * remove from tests * fix 4607 * nv emulate timestamp * time is int * correct time	2024-05-18 00:36:29 +03:00
nimlgen	eb9689336e	nv mockgpu (#4600 ) * mockgpu nv * works * comment that out * fix merge * setup gpuocelot * install packages * not run all of them * passes * fix ci * almost * should pass * linter * linter 2 * try this? * ugn, not supported * ci * remove ticket from description * better descs	2024-05-15 23:46:08 +03:00
Ahmed Harmouche	662bca8134	Split UnaryOps.CAST into CAST and BITCAST (#4487 ) * Separate cast and bitcast * Fix lint * No more arg[0] * Revert "No more arg[0]" This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964. * CAST/BITCAST arg is the dtype only, no more tuple * No image bitcast, regenerate dataset * Small fixes	2024-05-15 11:43:31 -04:00
George Hotz	ff64bcab69	move graph/search to engine (#4596 )	2024-05-14 23:12:59 -07:00
nimlgen	9b02aef45a	remove rhip (#4579 ) * remove rhip * remove hip runner	2024-05-14 17:58:19 +03:00
nimlgen	2131556c2c	amd mockgpu (#4535 ) * start mock amd gpu * virt files * cleaner * init ci * small fixes * linter * better? * ugh * linter * fix * diable some * run shorter * fixes * add hcq test * fix * fix cmd revert	2024-05-14 14:28:04 +03:00
Filip Brzek	f7d08bd454	feat: add acc_dtype to einsum (#4571 )	2024-05-13 14:02:07 -04:00
George Hotz	b660f60125	all uops are now cachable (#4564 ) * all uops are now cachable * cachable is gone	2024-05-12 22:34:35 -07:00
qazal	2fb564c125	multi reduce linearizer tests start (#4529 ) * test_end_local * test_early_end_local * todos * mean+std * skip no locals	2024-05-11 14:06:40 +03:00
qazal	3cba22920f	test_linearizer_correctness (#4458 ) * test helper * uops asserts * cleanup args * nits	2024-05-11 13:02:08 +03:00
qazal	b3d9fd48d0	infra for testing linearizer correctness (#4528 ) * refactor outbufs * delete helper	2024-05-11 12:10:33 +03:00
George Hotz	2f970a4fc2	all realize 2 (#4527 ) * all realize 2 * tests fixup * fix more tests * fix openpilot * fix tests * unneeded	2024-05-10 22:43:09 -07:00

182 Commits