Commit Graph

1847 Commits

qazal
f64fa51a64 process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
Timmy
ca32921f84 Multireduce PADTO Test (#4785)
* padto test

* expanded multireduce padto tests

* CUDA doesn't run on CI

* moving padto_where_multireduce test to SUM so that we can check the reduce axis

* cleaning up tests some more

* add wanna_outputs

* refactor test_padto_sum_multireduce

* fix max and refactor where

* fix axis

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
chenyu
1ffa5ec492 unit test ShapeTracker.consecutive (#4800) 2024-06-01 10:10:51 -04:00
chenyu
8942230b1f minor cleanups of test_tensor and extend some cases (#4794) 2024-05-31 10:43:22 -04:00
qazal
637f482588 configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
chenyu
7cc883ecee CMPLT is safe to pad (#4790)
0 < 0 evaluates to False
2024-05-30 22:50:48 -04:00
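
A minimal numpy sketch of why zero padding is safe for a less-than comparison: the padded tail compares 0 < 0, which is False, so it contributes nothing to a later reduction (numpy stands in for the generated kernel here).

    import numpy as np

    a, b = np.array([1., 2., 3.]), np.array([2., 2., 2.])
    pa, pb = np.pad(a, (0, 5)), np.pad(b, (0, 5))  # pad both operands with zeros
    assert (a < b).sum() == (pa < pb).sum()        # the padded tail is all False
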
chenyu
236390aafb fix lazy r const folding with variable shape (#4783)
const folding of symbolic shapes is currently not supported; I think it's possible with a refactor of Tensor.from_node.
also added some currently-failing tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
4921de1945 fix cumsum of 0-d tensor (#4781)
* fix cumsum of 0-d tensor

* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f failed test case for ellipsis in einsum (#4779)
from #4156
2024-05-30 11:14:42 -04:00
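
A short numpy sketch of the ellipsis notation the failing test exercises (the tinygrad call would be Tensor.einsum with the same formula; that equivalence is assumed here):

    import numpy as np

    a = np.random.rand(4, 2, 3)
    b = np.random.rand(4, 3, 5)
    out = np.einsum("...ij,...jk->...ik", a, b)  # "..." broadcasts the leading batch dims
    assert out.shape == (4, 2, 5)
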
Alec Chen
e89bc42cc7 Add UOps pattern matcher regression tests (#4725)
* add pattern matcher regression tests

* Remove test for dtype str after rebasing

* Make test uops match type spec

* leave const const, add const alu vin test

* correct uops

* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3 add fused tensor core opts tests (#4775)
* add fused tc opts tests

* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a apply NOOPT in test_arange complexity (#4774)
with hcopt, arange(2560) uses fewer ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7 isolate the 134ms kernel in train_gpt2.py (#4773)
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
chenyu
59c6472b9f check contiguous in View.create after canonicalizing mask and offset (#4770)
mask/offset/strides can change during canonicalization, so contiguous can become True by the end
2024-05-29 11:31:13 -04:00
nimlgen
019f4680e5 check dims before execution on nv (#4756)
* check dims before execution on nv

* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4 pre multi reduce codegen/* cleanup (#4755)
* refactor self.reduceop

* free lines

* fix test
2024-05-28 08:15:48 -04:00
chenyu
53b9081aab check arg types of Tensor.randint (#4751)
raise TypeError if low, high, or dtype are not ints
2024-05-27 20:24:10 -04:00
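
A minimal sketch of the new check, assuming the current Tensor.randint(*shape, low=..., high=...) signature:

    from tinygrad import Tensor

    Tensor.randint(4, low=0, high=10)        # fine: all arguments are ints
    try:
        Tensor.randint(4, low=0.5, high=10)  # non-int low should now raise
    except TypeError as e:
        print("rejected:", e)
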
qazal
0e69b22629 multireduce OptOps tests (start) (#4733)
* start

* full tests

* add skips

* unrelated

* notes
2024-05-27 12:21:33 +03:00
qazal
c7b1d802f1 delete duplicate tests in test_linearizer (#4723)
* delete duplicate test

test_simplify_uop isn't needed

max works

* ci

* remove skip

* add skip back
2024-05-26 08:11:42 +03:00
Szymon Ożóg
de5c69c4c9 Unify test_dtype naming conventions (#4730) 2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0 pow cleanup part 2 (#4727)
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
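
A small sketch of the edge case, assuming Tensor.pow and Tensor.item behave as in current tinygrad; 0 ** 0 should come out as 1, matching Python:

    from tinygrad import Tensor

    print(Tensor([0.0]).pow(0.0).item())  # expected: 1.0 after the fix
    print(0.0 ** 0.0)                     # Python agrees: 1.0
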
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
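
A hedged sketch of the new call style: stack is now an instance method, so the first tensor is self and the rest are passed positionally (the old list-taking form in the comment is an assumption about the prior API):

    from tinygrad import Tensor

    a, b, c = Tensor([1, 2]), Tensor([3, 4]), Tensor([5, 6])
    out = a.stack(b, c, dim=0)  # previously roughly Tensor.stack([a, b, c], dim=0)
    print(out.shape)            # (3, 2)
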
Szymon Ożóg
212025b53c Int mulacc for ptx (#4680)
* IntMulacc

* don't mov const

* Don't do int mulacc on ocelot

* Workaround for ocelot

* Remove ocelot workaround

* Fix tests that merged into mulacc

* fix uop count after merging into mulacc
2024-05-24 15:20:48 -04:00
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7 Fix int8 and uint8 on PTX (#4711)
* Fix mem type for uchar

* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
4398cc3654 update test_linearizer.py (#4707)
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
Francis Lam
49225522aa wmma: chain unrolled WMMAs and phi only at the end (#4703)
* wmma: chain unrolled WMMAs and phi only at the end

* fix linter and tests

* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d fix UOps.CAST noop for vectorized dtypes (#4704)
* ==

* add test

* not lazyop

* use str comparison for PtrDType

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
qazal
532c9e08e3 proposal: PHI nodes in TC shouldn't have children inside the loop (#4694)
* expectations from UOpGraph

* one with children

* minimal repro

* replace
2024-05-23 15:11:26 -04:00
Szymon Ożóg
9a9963ba7b Remove uops deepcopy from PTX (#4671)
* Remove uops deepcopy from PTX

* Update test

* Fix test

* fix for non-ptx

* Clean

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64 update Torch.gather api (#4692)
* update Torch.gather api

gather(self, dim, index) to match torch

* fix that
2024-05-22 21:54:06 -04:00
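
A minimal sketch of the reordered signature, with dim before index to match torch.gather (argument order per the commit message; exact parameter names assumed):

    from tinygrad import Tensor

    t = Tensor([[1, 2], [3, 4]])
    idx = Tensor([[0, 0], [1, 0]])
    print(t.gather(1, idx).numpy())  # [[1 1] [4 3]]
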
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707 isolated test for BEAM=2 llama wrong uops toposort (#4687)
* add ast

* skip test in CI
2024-05-23 00:47:37 +03:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
qazal
c5f5755328 correctness test for multireduce nested locals (#4682)
* nested locals test

* move st
2024-05-22 19:35:35 +03:00
qazal
d12d412e8b revert uops dtype in pattern matcher (#4681)
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
chenyu
0f21aa0416 example kernel that triggers Memory access fault for resnet on red (#4678) 2024-05-21 18:59:36 -04:00
qazal
5f84cbb5df keep UOps.CAST in PHI-GEP fold for unmatching dtypes (#4674)
* these should be val.dtype

* cast float4 and float2 to root

* document tests

* 2 args

* fix assert

* match dtype

* no extra lines

* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
Timmy
de733d73cf Multireduce Linearizer Tests (#4665)
* updated tests

* make sure the upcasting tests actually cause the problem

* diff cleanup

* use UOpGraph utils

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
qazal
b33c827aed UOps.RANGE toposort spec (#4660)
* use iterator

* nested loops and outer loads

* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83 consolidate uops tests (#4659)
* merge uoptimize

* move tests

* fix skip message
2024-05-20 21:42:31 +03:00
Szymon Ożóg
1e7b7b2c3c Fix flop counting for mulacc (#4640)
* Fix flop counting for mulacc

* add test_simple_mulacc

* Update test_uops_stats.py

* Update test_uops_stats.py

* revert test_mulacc

* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
nimlgen
c9f7f2da70 nv hcq bind api (#4629)
* hcq bind api for nv

* linter

* linter

* add test

* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a correctly insert UOps.END* in fuzz result (#4653) 2024-05-19 21:10:28 +03:00
chenyu
456aa0b656 update test_search kernel count (#4652)
integration test that beam-searching one kernel increments the kernel count by 1; also moved the existing test_kernel_count to TestTimeLinearizer
2024-05-19 13:54:52 -04:00
qazal
954718e6bf reorder DEFINE_GLOBAL in fuzz_uops (#4651)
* globals base

* test: opt out of DEFINE_GLOBAL

* do it like ExecItem
2024-05-19 20:51:31 +03:00
Léo
967e35f8b8 fix(beam): GlobalCounters kernel count increasing when clearing l2 (#4598)
* fix(beam): GlobalCounters kernel count increasing when clearing l2

* fix: removed the NOSTATS var by adding do_update_stats to Tensor.realize()

* test(search): regression test for _time_program, should not increment kernel_count

* fix(test_search): unused var and now properly checking when l2 is cleared

* fix(test_search): added assert message

* fix(test_search): now testing public beam api for kcount

* ruff fixes

---------

Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>
2024-05-19 10:03:47 -07:00
George Hotz
4753283221 LOOP -> RANGE (#4650) 2024-05-19 06:40:20 -07:00
chenyu
286b4dbdf2 compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

a renderer error under multiprocessing should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00