412 Commits

Author SHA1 Message Date
chenyu
4a65010de8 remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
2024-08-04 16:06:38 -04:00
chenyu
b392b8edc3 increase atol and rtol test_gemm_fp16 (#5866)
* increase atol and rtol test_gemm_fp16

made it pass with NOOPT, which has larger accumulated error

* revert that
2024-08-01 19:09:58 -04:00
chenyu
defd89e8e0 unify negative shape creation to raise ValueError (#5817)
[run_process_replay]
2024-07-30 13:42:59 -04:00
P4ssenger
6742a4789a Add check for negative dimension in view (#5790)
* add check for negative dimension in view

* add negative dim tests

* move check to tensor level

* fix error message

* move check to view create

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
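A minimal sketch of the behavior the two entries above describe: a negative dimension is now rejected at view creation with a ValueError. `Tensor.empty` is just one convenient way to hit that path; the exact error message is not specified by the entries.

```python
from tinygrad import Tensor

try:
    Tensor.empty(2, -3)  # negative dimension rejected at view creation
except ValueError as e:
    print(e)
```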
samm393
573e0f9a48 remove float division from idiv in python_alu (#5777)
* removes float division from idiv in python_alu

* add test

* cleaner logic

* pass clang unsigned literals correctly

* suffix ULL instead of U

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
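A hedged sketch of truncating integer division done without float division; the `idiv` helper below is an illustrative stand-in for the python_alu op, not tinygrad's exact code.

```python
def idiv(a: int, b: int) -> int:
    # C-style division truncates toward zero; Python's // floors instead,
    # so compute on magnitudes and reapply the sign
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

assert idiv(7, 2) == 3
assert idiv(-7, 2) == -3  # Python's -7 // 2 would give -4
```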
George Hotz
053550c3f3 remove MERGE opt, cleanup wmma upcast (#5669)
* remove MERGE opt, cleanup wmma upcast

* upcast first

* fix broken vectorize folding rule
2024-07-23 20:43:42 -07:00
George Hotz
e3f00ac77d Fix cuda tc emu test (#5663)
* fix acc folding for NV tensor cores

* fix correctness of reduce_before_expand

* fix test emulated CUDA tensor cores

* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
George Hotz
386fb5e7f8 folding without UNMUL (#5628)
* folding without UNMUL

* fix failures, index_collapse

* import ReduceOps

* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
George Hotz
0ad87021e2 move acc to end (#5568)
* move acc to end

* confirmed pictures are the same

* relax that

* Update test_ops.py
2024-07-19 03:06:52 -07:00
chenyu
6e405b0a2b add 0d tensor to trunc/floor/ceil/round tests (#5512)
the existing trunc test passes backward, but its backward is incorrect in general. added tests that would fail
2024-07-16 16:48:25 -04:00
Tobias Fischer
87a2ef2bc2 Add Interpolate Function (#5482)
* add interpolate function

* fixed linter issue

* reduced sizes in test

---------

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
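Assuming the new interpolate follows a torch-style signature (a target `size` plus a `mode` string), usage might look like the sketch below; the exact parameter names are an assumption.

```python
from tinygrad import Tensor

t = Tensor([[[1.0, 2.0, 3.0, 4.0]]])           # (N=1, C=1, W=4)
out = t.interpolate(size=(8,), mode="linear")  # upsample the last axis
print(out.shape)                               # expected (1, 1, 8)
```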
Tobias Fischer
e219103677 Add Pad to Pooling (#5488) 2024-07-14 21:50:20 -07:00
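A small sketch of padded pooling, assuming the torch-style pooling signature where `padding` pads before the window slides and stride defaults to the kernel size.

```python
from tinygrad import Tensor

t = Tensor.ones(1, 1, 4, 4)
# (4 + 2*1 - 2) // 2 + 1 = 3 under torch-style pooling arithmetic
print(t.max_pool2d(kernel_size=2, padding=1).shape)  # (1, 1, 3, 3)
```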
Tobias Fischer
5849130cbb gather negative dim fix (#5486) 2024-07-14 20:20:53 -04:00
chenyu
00813a92a0 update Tensor.eye api to match torch (#5433)
* update Tensor.eye api to match torch

input is n for nrows and optional m for ncols

* space

* fix onnx
2024-07-12 20:25:12 -04:00
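Per the entry above, Tensor.eye takes n rows and an optional m columns, matching torch:

```python
from tinygrad import Tensor

print(Tensor.eye(3).numpy())     # 3x3 identity
print(Tensor.eye(2, 4).numpy())  # 2 rows, 4 columns, like torch.eye(2, 4)
```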
chenyu
64986f949c more transcend math tests in ci (#5368)
* more transcend math tests in ci

test large inputs to trig functions that hit a different reduction algo, and test TRANSCENDENTAL=2 for all backends

* no CUDACPU

* try that
2024-07-10 21:19:09 -04:00
chenyu
0f0940225a fix Tensor.all and Tensor.any for PTX (#5335)
support boolean acc and boolean phi, and rewrite boolean max to uint8 max
2024-07-08 18:15:04 -04:00
chenyu
6856f915d6 Tensor.any and Tensor.all (#5320)
does not work in PTX yet due to how boolean tensors are handled
2024-07-07 14:36:00 -04:00
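A small sketch of the two new reductions; the `axis` keyword is assumed to follow tinygrad's usual reduce signature.

```python
from tinygrad import Tensor

t = Tensor([[True, False], [True, True]])
print(t.any().numpy())        # True: at least one element is truthy
print(t.all().numpy())        # False: not every element is truthy
print(t.all(axis=1).numpy())  # per-row: [False, True]
```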
chenyu
2029cb7047 support passing None to Tensor.clip (#5319)
passing None for no upper bound or no lower bound
2024-07-07 13:04:22 -04:00
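With this change, passing None means "no bound on that side":

```python
from tinygrad import Tensor

t = Tensor([-2.0, 0.5, 3.0])
print(t.clip(0, None).numpy())  # lower bound only: [0., 0.5, 3.]
print(t.clip(None, 1).numpy())  # upper bound only: [-2., 0.5, 1.]
```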
chenyu
c1e330f302 Tensor.int and Tensor.bool (#5317) 2024-07-07 11:52:58 -04:00
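A quick sketch of the two casts; that `.int()` targets the default int32 is an assumption.

```python
from tinygrad import Tensor, dtypes

t = Tensor([0.0, 1.5, -2.0])
print(t.int().numpy())                # [0, 1, -2]: float->int truncates
print(t.bool().numpy())               # [False, True, True]: nonzero is True
assert t.int().dtype == dtypes.int32  # assumed default target dtype
```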
George Hotz
e53b164e1a small changes from lowerer (#5266) 2024-07-02 15:03:54 -07:00
George Hotz
3df47bc21e OpenELM + repeat_interleave (#5234)
* start writing openelm

* progress...hit bug

* repeat_interleave support

* gqa

* add rotary embedding

* spp

* i think it runs correctly

* broken

* output is good now

* cleanups

* no io_uring on android
2024-06-30 15:18:39 -07:00
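The repeat_interleave added here presumably mirrors torch's element-wise repetition; a minimal sketch:

```python
from tinygrad import Tensor

t = Tensor([1, 2, 3])
print(t.repeat_interleave(2).numpy())  # [1, 1, 2, 2, 3, 3]: each element repeated
```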
hikettei
ad1ca7da64 [Feature] Added BinaryOps.AND/BinaryOps.OR (#5223)
* [Feature] Added BinaryOps.AND/BinaryOps.OR

* Add: __rand__, __ror__
2024-06-29 17:20:25 -07:00
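With BinaryOps.AND/OR plus the reflected __rand__/__ror__, boolean tensors compose with & and | from either side:

```python
from tinygrad import Tensor

a, b = Tensor([True, False]), Tensor([True, True])
print((a & b).numpy())     # [True, False]
print((a | b).numpy())     # [True, True]
print((True & a).numpy())  # __rand__: a Python bool on the left also works
```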
chenyu
ee0c6dfc15 build Tensor._tri with movements only (#5110)
* build Tensor._tri with movements only

doesn't need arange; saves a kernel in the attention mask

* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu
20fabd8a5b update Tensor.triu and Tensor.tril (#5109)
renamed the arg to `diagonal` to match the torch api, and added documentation and examples
2024-06-22 21:59:50 -04:00
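The `diagonal` argument in action, matching torch's convention:

```python
from tinygrad import Tensor

t = Tensor.arange(9).reshape(3, 3)
print(t.triu(diagonal=1).numpy())  # zeros on and below the main diagonal
print(t.tril().numpy())            # keeps the lower triangle (diagonal=0)
```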
George Hotz
9f875123b6 small changes from lowerer. [run_process_replay] [no_assert] (#5102) 2024-06-22 11:09:35 -07:00
chenyu
166a2b19b5 fix reduce axis of 0d tensors (#5089)
`x.sum(())` is fine, and `x.sum((1,))` should throw IndexError
2024-06-21 13:51:40 -04:00
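A sketch of the fixed boundary case on a 0-d tensor:

```python
from tinygrad import Tensor

t = Tensor(3.0)           # 0-d tensor
print(t.sum(()).numpy())  # fine: empty axis tuple, returns 3.0
try:
    t.sum((1,))           # no axis 1 on a 0-d tensor
except IndexError as e:
    print(e)
```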
chenyu
36b4a492a1 explicitly check getitem indices can have at most one ellipsis (#5087)
* explicitly check getitem indices can have at most one ellipsis

previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```

this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```

* oh we have that already

* test that

* test these
2024-06-21 12:33:18 -04:00
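The check in action:

```python
from tinygrad import Tensor

t = Tensor.arange(8).reshape(2, 2, 2)
print(t[..., 0].shape)  # (2, 2): one ellipsis is fine
try:
    t[..., ..., 0]
except IndexError as e:
    print(e)            # an index can only have a single ellipsis ('...')
```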
chenyu
f6d6760f71 don't cast tuple to list before creating Tensor (#5071)
the Tensor constructor now supports creating from a tuple
2024-06-20 13:32:56 -04:00
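Minimal demonstration of the tuple path:

```python
from tinygrad import Tensor

print(Tensor((1, 2, 3)).numpy())  # tuple input, no list() conversion needed
```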
chenyu
50700171ef minor cleanup to reshape arg handling (#5070)
moved None handling to be with argfix, and only resolve -1 if there's a -1
2024-06-20 10:27:27 -04:00
chenyu
f4355d0f1b check Tensor.permute input arg is a valid permutation (#5069)
also added support for negative axes
2024-06-20 10:01:28 -04:00
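A sketch of the new permute behavior; the entry doesn't name the exception type for an invalid permutation, so only the accepting cases are shown running.

```python
from tinygrad import Tensor

t = Tensor.ones(2, 3, 4)
print(t.permute(2, 0, 1).shape)   # (4, 2, 3)
print(t.permute(-1, 0, 1).shape)  # negative axes resolve to the same order
# t.permute(0, 0, 1) is now rejected: axis 0 repeats, not a valid permutation
```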
chenyu
e8f39fcaaa check arg to Tensor.flip can appear only once (#5068)
* check arg to Tensor.flip can appear only once

raise RuntimeError if there are multiple

* fix test
2024-06-20 09:33:42 -04:00
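The repeated-axis check described above:

```python
from tinygrad import Tensor

t = Tensor.arange(4).reshape(2, 2)
print(t.flip(0).numpy())  # rows reversed
try:
    t.flip((0, 0))        # axis 0 appears twice
except RuntimeError as e:
    print(e)
```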
chenyu
620fa6e5a2 check Tensor.reshape can have at most one -1 (#5026)
raise RuntimeError to match torch; on master it throws weird errors from shapetracker
2024-06-18 08:17:12 -04:00
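The single -1 rule in action:

```python
from tinygrad import Tensor

t = Tensor.ones(2, 6)
print(t.reshape(3, -1).shape)  # (3, 4): the single -1 is inferred
try:
    t.reshape(-1, -1)          # more than one -1
except RuntimeError as e:
    print(e)
```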
chenyu
c0139b05d8 python_alu sin(inf) is nan (#5020)
* python_alu sin(inf) is nan

without special handling, it throws ValueError: math domain error

* skip CUDACPU
2024-06-17 19:47:30 -04:00
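An illustrative guard for the special case, in plain Python; this is a sketch of the idea, not tinygrad's exact python_alu code.

```python
import math

# math.sin(math.inf) raises "ValueError: math domain error",
# so return nan for infinite inputs instead
def safe_sin(x: float) -> float:
    return math.nan if math.isinf(x) else math.sin(x)

print(safe_sin(math.inf))  # nan
```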
Ray
1ad3b25461 fix einsum output str (#4998)
* fix einsum output str

* new line to satisfy linter

* removed redundant cast (satisfy linter)
2024-06-17 12:18:14 -04:00
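For context, einsum with an explicit output string, whose handling this entry fixes:

```python
from tinygrad import Tensor

a, b = Tensor.ones(2, 3), Tensor.ones(3, 4)
print(Tensor.einsum("ij,jk->ki", a, b).shape)  # (4, 2): output axis order respected
```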
chenyu
67e8df4969 remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.

after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
geohotstan
90332eb529 Getitem pin None dimension (#4960)
* fix

* remove torch out of bounds test

* 1 more test case
2024-06-14 10:48:59 -04:00
chenyu
74586bc339 fix getitem with leading None (#4943)
I think all None handling can be unified, removing the calc_dim in advanced indexing
2024-06-13 11:23:40 -04:00
chenyu
fae08c4d48 fix Tensor.triu / Tensor.tril with boolean input (#4941)
`where(self, 0)` incorrectly upcasted the output. `where(self, False)` is correct but looks unnatural, so a cast was added at the end. The pattern matcher can fold the cast into the where branches
2024-06-12 20:16:13 -04:00
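The dtype-preserving behavior this fixes, as a sketch:

```python
from tinygrad import Tensor, dtypes

t = Tensor.ones(3, 3, dtype=dtypes.bool)
print(t.triu().dtype)  # dtypes.bool: no upcast from the where(self, 0) path
```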
chenyu
eb0f5b5660 failed test case for getitem with leading Nones (#4936)
* failed test case for getitem with leading Nones

torch matches numpy, so tinygrad is incorrect here.
another repro
```
t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])

t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())

t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```

* # noqa
2024-06-12 16:19:42 -04:00
chenyu
1326f29e24 fix Tensor.gather shape checking criteria (#4932)
it's fine if `self.shape[d] >= index.shape[d]` for all `d != dim`, not for all `d`
2024-06-12 13:10:14 -04:00
chenyu
798ea61377 widen test_ops [low, high] and more strict atol (#4906)
default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan).
dropped several explicit atol values that were unnecessarily larger than the default 1e-6.
tested on mac, tinybox red / green
2024-06-10 20:47:09 -04:00
chenyu
c8cd637236 test case for Tensor.var reducing over size = 1 axis (#4902)
backward failed when correction >= the number of elements being reduced
2024-06-10 12:11:39 -04:00
chenyu
a70e8a80d7 test_ops test cmp with special floats (#4826)
prepare to fix nan; it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
chenyu
3afc914617 CMPEQ -> CMPNE and make it safe to pad (#4818)
* CMPNE

* new dataset
2024-06-03 18:02:15 -04:00
chenyu
4921de1945 fix cumsum of 0-d tensor (#4781)
* fix cumsum of 0-d tensor

* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f failed test case for ellipsis in einsum (#4779)
from #4156
2024-05-30 11:14:42 -04:00
chenyu
7e90026eb0 pow cleanup part 2 (#4727)
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
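The fixed corner case, as a one-liner:

```python
from tinygrad import Tensor

print((Tensor([0.0]) ** 0.0).numpy())  # [1.], matching Python's 0.0 ** 0.0
```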
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
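Stack called as a method on the first tensor:

```python
from tinygrad import Tensor

a, b = Tensor([1, 2]), Tensor([3, 4])
print(a.stack(b).numpy())         # [[1, 2], [3, 4]]: stacked along a new dim 0
print(a.stack(b, dim=1).numpy())  # [[1, 3], [2, 4]]
```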
chenyu
47aba47f64 update Tensor.gather api (#4692)
* update Tensor.gather api

gather(self, dim, index) to match torch

* fix that
2024-05-22 21:54:06 -04:00
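The torch-matching argument order, dim first and then index:

```python
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
idx = Tensor([[0, 0], [1, 0]])
# out[i][j] = t[i][idx[i][j]] along dim 1, like torch.gather
print(t.gather(1, idx).numpy())  # [[1, 1], [4, 3]]
```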
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00