tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 06:48:22 -05:00

Author	SHA1	Message	Date
chenyu	1326f29e24	fix Tensor.gather shape checking criteria (#4932 ) it's fine if `self.shape[d] >= index.shape[d]` for all `d != dim`, not for all `d`	2024-06-12 13:10:14 -04:00
George Hotz	9a3c1e4a17	fix mul div failure (#4928 )	2024-06-12 13:58:46 +02:00
George Hotz	11a03cbbf5	don't use uops.add while constructing (#4913 ) * don't use uops.add while constructing * rebase * bugfixes * have to use BFS * prove it's late * simpler uop symbolic test (why we did this) * use dict, not set	2024-06-12 13:31:34 +02:00
chenyu	fdbb4305cb	skip unsupported dtype in fuzz_linearizer (#4917 ) resolve issues in #4887. dataset generated from ubuntu but metal does not support double	2024-06-11 18:18:21 -04:00
chenyu	b886d250fb	improve test_dropout_on_shard (#4912 ) tested some basic property, also minor formatting for a few Tensor.training setups	2024-06-11 11:36:02 -04:00
George Hotz	35e53c0809	add sharded arange test (#4908 )	2024-06-11 10:58:33 +02:00
chenyu	798ea61377	widen test_ops [low, high] and more strict atol (#4906 ) default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan). dropped several explicit atol if it's unnecessarily larger than default 1e-6. tested on mac, tinybox red / green	2024-06-10 20:47:09 -04:00
chenyu	97b05f567e	revert the .detach() in layernorm (#4904 ) * revert the .detach() in layernorm it's only correct in LayerNorm where input is the data, and not correct in GroupNorm and InstanceNorm that reused layernorm. Added backward tests for weights, bias and input for these norms. * bigger atol for llvm * relax backward more	2024-06-10 18:02:05 -04:00
qazal	8b5bcf309a	process replay in all of CI (#4884 )	2024-06-10 14:49:29 -04:00
chenyu	c8cd637236	test case for Tensor.var reducing over size = 1 axis (#4902 ) backward failed when correction >= reducing n	2024-06-10 12:11:39 -04:00
chenyu	b56ae5606c	cosmetic changes to uop _match (#4897 ) minor cleanup before fixing two level match [run_process_replay]	2024-06-09 18:29:42 -04:00
SnakeOnex	b1db2d0094	tqdm replacement (#4846 ) * tqdm replacement almost * formatting * formatting * imports * line len * fix * removed set description :( * removed set description :( * fix * fix * green check? * rewrote as class, fixed several bugs * types spacing * removed imports * fix * iterable * typing * mypy disagreement * imports * more e2e tests vs tqdm * removed seed setting * robustness against time.sleep() flakiness * flaky fix * automatic bar closing when count==total * cleanup * clang error with tqdm * tqdm back * use os lib, print to stderr (fixes the clang bug, where the bar was leaking into the generated c program * back to shutil * unit_scale + unit_scale test * custom unit to tests * pretty * clean * removed flaky test * less test iters * empty line * remove disable	2024-06-09 23:46:03 +02:00
qazal	1dde829e34	UOps.IF* to graph spec (#4894 )	2024-06-09 07:00:12 -04:00
George Hotz	b9afb0d577	test uop as symbolic (#4870 ) * start work * more tests passing * more tests passing * more * 34 failures * expect the failures * remove broken rule * render is fine in just the test * simplify and put in test	2024-06-09 12:15:11 +02:00
nimlgen	654a8b9ef7	retire hsa (#4885 ) * retire hsa * EMULATE_AMD	2024-06-09 11:33:03 +03:00
chenyu	e33efd6a3d	test cases for multitensor adds const (#4892 ) Tested const remained const in ast. Removed the TODO in _to_const_val too	2024-06-08 22:57:48 -04:00
nimlgen	d24e57c615	amd support kernel with bf16 (#4863 ) * amd support kernels with dispatch_ptr * fixes * line savings * one line * try * Revert "try" This reverts commit `5f340dfdd4`. * not used will be back when hsa is gone * gone will be back * add this as well	2024-06-08 22:52:32 +03:00
qazal	1e3325f369	raise assert [run_process_replay] (#4879 )	2024-06-08 08:31:44 -04:00
qazal	66dfd5e7bf	faster codegen process replay (#4858 ) * faster codegen process replay * use self.copy * regenerate * delete copy * test a real error [run_process_replay] * revert the error change	2024-06-07 16:20:57 +03:00
nimlgen	47bfd7c2b7	fix sync of offset buffers in graphs (#4850 ) * correctly sync offset buffers * test * style * run less * just use base	2024-06-06 16:09:45 +03:00
chenyu	99e7a1d5e9	support symbolic reshape with non-contiguous (#4844 ) * support symbolic reshape with non-contiguous pre-requisite for symbolic arange (make symbolic ones that can be folded). * test cases * typo * shorter	2024-06-05 16:01:19 -04:00
chenyu	a352b6d9ce	symbolic Tensor.var (#4843 ) taken from #4446 and add more tests	2024-06-05 12:55:54 -04:00
Timmy	887643cf34	Multireduce atomic local load/store test (#4786 ) * atomic load/store test * tests for nested & unrolled * check barriers * linters * cleaning up diff * fix assert in _temp_create_multireduce_ast changes * cleaning up the check for redundant barriers * minor cleanups for the assert * always seed randn, helps with debuggability --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-06-05 14:41:19 +03:00
Szymon Ożóg	273945df67	Regression tests for bitshift (#4829 ) * Regression tests for bitshift * Add test for bitshift not triggered * Enable tests	2024-06-05 11:42:34 +02:00
Alec Chen	5ac30c29d8	Construct UOps patterns using UPat (#4821 ) * Allow UPat pattern definitions * Convert pattern matcher tests to UPat constructions * Convert constant_folder patterns to upat constructions * Convert assembly patterns to upat constructions * [run_process_replay] Drop UPat.from_dict	2024-06-05 10:29:37 +02:00
Szymon Ożóg	e47277d18a	Disable for PTX as well (#4838 ) Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-06-05 10:37:59 +03:00
Francis Lam	890e7c12bb	test/external/verify_kernel: add support for single pickled kernel (#4836 )	2024-06-04 18:59:21 -04:00
Elias Wahl	04e237328b	Refactor to class style (#4804 )	2024-06-04 14:08:31 -07:00
David Hou	cddce0e168	don't cast before view on shape changing bitcast (#4833 ) * don't cast before view on shape changing bitcast * make sure cast before view triggers	2024-06-04 16:04:52 -04:00
Alec Chen	4909a0d16f	Fix arg set in pattern matcher (#4830 )	2024-06-04 15:10:09 -04:00
Alec Chen	c96026ac65	Add arg set regression test for pattern matcher (#4827 ) * Add arg set regression test for pattern matcher * real regression --------- Co-authored-by: qazalin <qazal.software@gmail.com>	2024-06-04 13:35:09 -04:00
chenyu	a70e8a80d7	test_ops test cmp with special floats (#4826 ) prepare to fix nan, it did not work with ge and le before either	2024-06-04 12:10:21 -04:00
chenyu	3afc914617	CMPEQ -> CMPNE and make it safe to pad (#4818 ) * CMPNE * new dataset	2024-06-03 18:02:15 -04:00
Szymon Ożóg	bb7b031c5c	Bitshift (#4728 ) * WIP * Cleanup * Cleanup * Fix variable, refactor to use set * right shift should be signed/unsigned * Test for bitshifts * Allow a neg	2024-06-03 21:16:01 +02:00
nimlgen	e78a9bf3f2	support view in nv/amd (#4812 ) * support view in nv/amd * fix amd * fix * run test on nv/amd	2024-06-03 22:11:52 +03:00
chenyu	45083ccb43	canonicalize 0 in shape in View.create (#4815 ) set strides to 0, offset to 0, mask to None, and contiguous to True with size 0 view.	2024-06-03 13:37:37 -04:00
qazal	f64fa51a64	process replay for test/* (#4799 ) * add input to unit tests [run_process_replay] * add setup [run_process_replay] * run tests [run_process_replay] * add cuda and amd [run_process_replay] * run everything but BEAM=2 [run_process_replay] * skip export_model [run_process_replay] * fix amd CI * add concurrency back	2024-06-03 12:01:58 +03:00
Timmy	ca32921f84	Multireduce PADTO Test (#4785 ) * padto test * expanded multireduce padto tests * cuda doesnt run on ci * moving padto_where_multireduce test to SUM so that we can check the reduce axis * cleaning up tests some more * add wanna_outputs * refactor test_padto_sum_multireduce * fix max and refactor where * fix axis --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-06-02 13:46:53 +03:00
chenyu	1ffa5ec492	unit test ShapeTracker.consecutive (#4800 )	2024-06-01 10:10:51 -04:00
chenyu	8942230b1f	minor cleanups of test_tensor and extend some cases (#4794 )	2024-05-31 10:43:22 -04:00
qazal	637f482588	configure derandomizing CI tests (#4793 )	2024-05-31 17:06:58 +03:00
chenyu	7cc883ecee	CMPLT is safe to pad (#4790 ) 0 < 0 evals to False	2024-05-30 22:50:48 -04:00
chenyu	236390aafb	fix lazy r const folding with variable shape (#4783 ) currently not supporting const fold symbolic shape. I think it's possible with a refactor to Tensor.from_node. also added some failed required tests for symbolic arange.	2024-05-30 15:19:28 -04:00
chenyu	4921de1945	fix cumsum of 0-d tensor (#4781 ) * fix cumsum of 0-d tensor * _resolve_dim for all	2024-05-30 12:41:09 -04:00
chenyu	4cf0eadf8f	failed test case for ellipsis in einsum (#4779 ) from #4156	2024-05-30 11:14:42 -04:00
Alec Chen	e89bc42cc7	Add UOps pattern matcher regression tests (#4725 ) * add pattern matcher regression tests * Remove test for dtype str after rebasing * Make test uops match type spec * leave const const, add const alu vin test * correct uops * actually correct uops	2024-05-30 17:12:20 +03:00
qazal	c2945be0a3	add fused tensor core opts tests (#4775 ) * add fused tc opts tests * n=64	2024-05-30 13:50:00 +03:00
chenyu	f1bf916b8a	apply NOOPT in test_arange complexity (#4774 ) with hcopt, arange(2560) uses less ops than arange(256)	2024-05-29 23:12:35 -04:00
chenyu	cde7a7cda7	isolate the 134ms kernel in train_gpt2.py (#4773 ) 133ms on tinybox red with BEAM=2	2024-05-29 17:26:24 -04:00
chenyu	59c6472b9f	check contiguous in View.create after canonicalizing mask and offset (#4770 ) mask / offset / strides can change during canonicalization, and contiguous can be True at the end	2024-05-29 11:31:13 -04:00
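
The shape criterion described in commit 1326f29e24 (#4932) can be sketched in plain Python. This is a minimal illustration of the rule as worded in the commit message, not tinygrad's actual code: the helper name `gather_shapes_ok` and its signature are hypothetical, and the rank and `dim` range checks are added assumptions.

```python
from typing import Sequence

def gather_shapes_ok(self_shape: Sequence[int], index_shape: Sequence[int], dim: int) -> bool:
    """Hypothetical helper: check the gather shape rule from the commit message.

    The rule: self.shape[d] >= index.shape[d] must hold for every d != dim.
    Along `dim` itself the index may be larger than the source.
    """
    ndim = len(self_shape)
    # Assumption: both shapes must have the same number of dimensions.
    if len(index_shape) != ndim:
        return False
    # Assumption: dim may be negative, counted from the end.
    if not -ndim <= dim < ndim:
        return False
    dim %= ndim
    return all(s >= i for d, (s, i) in enumerate(zip(self_shape, index_shape)) if d != dim)

# The index is longer than the source along dim=1, which the rule allows.
print(gather_shapes_ok((3, 4), (2, 10), dim=1))
# The index is longer than the source along dim=0 while gathering on dim=1.
print(gather_shapes_ok((3, 4), (5, 4), dim=1))
```

Under the criterion the commit replaces (comparing every `d`, including `dim`), the first call would be rejected even though each gathered element is still addressed by a valid index along `dim`.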


4433 Commits