Commit Graph

4618 Commits

chenyu
03b367c014 handle float16 overflow in PYTHON (#5022)
* handle float16 overflow in PYTHON

use `truncate` when constructing a tensor from a list to make sure all values are packable (might be slow, but should be correct). add truncate_fp16 to cast overflowed values to inf/-inf (sketched after this entry).

* all valid fmt supports truncate
2024-06-17 21:12:52 -04:00
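
A minimal sketch of that overflow handling, assuming struct's half-precision format is used to detect unpackable values (the helper name is from the commit; the body is an assumption):

```python
import math, struct

def truncate_fp16(x: float) -> float:
  # values outside the float16 range can't be packed with struct's "e" format;
  # map them to +/-inf, matching IEEE half-precision overflow
  try:
    struct.pack("e", x)
    return x
  except OverflowError:
    return math.copysign(math.inf, x)
```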
chenyu
c0139b05d8 python_alu sin(inf) is nan (#5020)
* python_alu sin(inf) is nan

without special handling, it throws ValueError: math domain error

* skip CUDACPU
2024-06-17 19:47:30 -04:00
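
In plain python the special case looks like this (the guard is an assumed shape of the python_alu handling):

```python
import math

def safe_sin(x: float) -> float:
  # math.sin raises "ValueError: math domain error" on +/-inf,
  # so non-finite inputs are mapped to nan explicitly
  return math.sin(x) if not math.isinf(x) else math.nan

assert math.isnan(safe_sin(math.inf))
```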
chenyu
4296507021 Tensor.sum returns in acc_dtype if specified (#5012)
* Tensor.sum returns in acc_dtype if specified

* skip PYTHON for now

* revert that

* relax that
2024-06-17 16:35:52 -04:00
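
A usage sketch of the new behavior, assuming the tinygrad API at this commit:

```python
from tinygrad import Tensor, dtypes

t = Tensor([1.0, 2.0, 3.0], dtype=dtypes.float16)
# the result now stays in acc_dtype instead of being cast back to float16
out = t.sum(acc_dtype=dtypes.float32)
assert out.dtype == dtypes.float32
```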
Ray
1ad3b25461 fix einsum output str (#4998)
* fix einsum output str

* new line to satisfy linter

* removed redundant cast (satisfy linter)
2024-06-17 12:18:14 -04:00
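
For context, a small einsum call with an explicit output subscript, the part of the formula string this fix concerns (not the fixed code itself):

```python
from tinygrad import Tensor

x, y = Tensor.rand(2, 3), Tensor.rand(3, 4)
# the "->ki" output string controls the axis order of the result
out = Tensor.einsum("ij,jk->ki", x, y)
assert out.shape == (4, 2)
```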
nimlgen
794acefbf3 hcq update waits and signals in place (#4984)
* hcq update waits and signals in place

* start amd

* amd works

* prettier

* test

* normal messages

* linter

* linter 2
2024-06-17 17:19:07 +03:00
qazal
04feeb37e6 look for unsafe pad ops in multiview ShapeTrackers (#5002) 2024-06-17 00:28:12 +03:00
chenyu
72c9b22833 sort vars in jit when building expected input args (#4990)
* sort vars in jit when building expected input args

fixed symbolic jit bugs with two variables (ordering idea sketched after this entry).

* sort in clanggraph

* space

* one more
2024-06-16 15:55:51 -04:00
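
A hedged sketch of the ordering idea, not the actual JIT code; the Var class and its expr field stand in for tinygrad's symbolic variables:

```python
class Var:
  # hypothetical stand-in for a symbolic Variable with a name
  def __init__(self, expr: str): self.expr = expr

def expected_input_args(var_vals: dict) -> list:
  # sort by variable name so the argument order no longer depends on dict order
  return sorted(var_vals, key=lambda v: v.expr)

i, j = Var("i"), Var("j")
assert expected_input_args({j: 4, i: 3}) == [i, j]
```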
qazal
71aad183fd check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]

* put var back
2024-06-16 20:12:30 +03:00
chenyu
2b07847f2b matmul returns in acc_dtype if specified (#4994)
more flexible to not automatically downcast; this can fix bert mixed precision training
2024-06-16 12:56:15 -04:00
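
A usage sketch, assuming the tinygrad API at this commit:

```python
from tinygrad import Tensor, dtypes

a = Tensor.ones(4, 8, dtype=dtypes.float16)
b = Tensor.ones(8, 4, dtype=dtypes.float16)
# accumulate in float32 and keep the result there; no automatic downcast
out = a.matmul(b, acc_dtype=dtypes.float32)
assert out.dtype == dtypes.float32
```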
George Hotz
1d6f1a15e1 add lt and ge uop methods [run_process_replay] (#4995)
* add lt and ge uop methods [run_process_replay]

* more correct (should still run process replay)
2024-06-16 09:33:53 -07:00
George Hotz
dac96f177e ignore indexing in the flopcounter (#4993) 2024-06-16 08:59:55 -07:00
Timmy
01b26756d6 Multireduce Scheduler Tests (#4972)
* scheduler tests

* linters

* cleaning up tests

* fixing tests

* syntax

* fixing metal
2024-06-16 16:30:22 +03:00
chenyu
50bc14d186 re-enable test that loads torch pkl format (#4986) 2024-06-15 14:11:30 -04:00
uuuvn
033fb53f9e Incomplete/buggy rule breaks process replay on #4976 (#4978)
* Incomplete/buggy rule breaks process replay on #4976

* test passes

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-15 15:18:35 +03:00
qazal
d91f0ee85b add regression test for the neg folding pattern (#4979) 2024-06-15 15:08:28 +03:00
wozeparrot
8209cd3c55 easier llama3 + fetch subdir (#4938) 2024-06-14 13:47:27 -07:00
chenyu
64cda3c481 raise TypeError calling len() on a 0-d tensor (#4970)
matched numpy and torch
2024-06-14 16:34:27 -04:00
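
A quick check of the matched behavior:

```python
import numpy as np
from tinygrad import Tensor

# numpy raises "TypeError: len() of unsized object" for a 0-d array,
# and tinygrad now raises TypeError for a 0-d tensor as well
try: len(np.array(3.0))
except TypeError as e: print(e)
try: len(Tensor(3.0))
except TypeError as e: print(e)
```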
chenyu
67e8df4969 remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.

after this, the only numpy usages are (1) Tensor(np.ndarray), (2) constructing .numpy() output, (3) the numpy random buffer
2024-06-14 15:38:45 -04:00
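
A sketch of what such a helper can look like; the body is an assumption based on DType carrying a struct format char in its fmt field:

```python
from typing import Optional
import numpy as np

def _to_np_dtype(dtype) -> Optional[type]:
  # map a dtype to a numpy scalar type via its struct format char,
  # e.g. "f" -> np.float32, "e" -> np.float16
  return np.dtype(dtype.fmt).type if dtype.fmt is not None else None
```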
chenyu
dae1c8abe2 create Tensor from bytes without numpy (#4964) 2024-06-14 13:37:27 -04:00
chenyu
5eee974b2a construct Tensor from python list/tuple directly (#4947)
* construct Tensor from python list/tuple directly

no numpy. annoying that half memoryview is a 3.12 feature... (packing trick sketched after this entry)

* simpler, and test

* flat already

* simpler

* cute

* 10% faster

* 5%
2024-06-14 11:36:05 -04:00
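
The core trick, sketched: a flat python list packs into raw bytes with struct alone, no numpy round-trip (format chars assumed to match the dtype's fmt):

```python
import struct

data = [1.0, 2.0, 3.0]
# "f" is float32; float16 would use "e" (memoryview.cast("e") needs python 3.12)
raw = struct.pack(f"{len(data)}f", *data)
assert len(raw) == 4 * len(data)
```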
geohotstan
90332eb529 Getitem pin None dimension (#4960)
* fix

* remove torch out of bounds test

* 1 more test case
2024-06-14 10:48:59 -04:00
George Hotz
14189bca68 graph_dedup function [run_process_replay] (#4955) 2024-06-14 04:24:37 -07:00
George Hotz
63a8add2c2 move uops add logic to linearize (#4952)
* move logic to linearize

* idk how this should work

* empty
2024-06-14 03:52:37 -07:00
George Hotz
9823752397 make uops.add private (#4950)
* make uops.add private

* modernize all tests
2024-06-14 03:23:25 -07:00
Jhenner Tigreros
dc9e9e4363 Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887)
* Create UnaryOps.RECIP and BinaryOps.IDIV and change uses of BinaryOps.DIV (semantics sketched after this entry)

* Delete unused import

* Add cstyle renderer

* Fix formatting text

* Fix test error due to bad implementation of renderer

* Add PTX support

* Add RECIP to LLVMIR

* Remove BinaryOps.DIV from symbolic test

* Change some test and fix C floor division

* Change references to DIV for the RECIP or IDIV

* Add mimic idiv for symbolic test

* Restore floor

* Mimic idiv

* cast to int

* Fix some test and renderer

* Remove DIV for render nodes

* Resolve issue with div

* Add TestRenderer

* Fix test

* fix error

* Fix PAD test

* Fix div implementation

* Remove DIV

* Add upcast to rshift, due to use of MUL and RECIP on DIV

* Fix linter

* Remove complete BinaryOps.DIV

* Fix lint

* Fix some test

* Revert mul modification

* Fix tests

* Fix CLANG for uops

* Revert IDIV function

* Minor fix

* modify pattern matching rule to support nan

* Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP

* Remove const folding for IDIV and fix PTX

* Complete remove IDIV from extra

* Remove test_div from TestFloatUOps due to test on recip

* Fix linearizer

* fix

* Fix test_22

* Fix llvm

* Apply trunc function for llvmlit

* use floor instead of trunc

* Use correct type

* Generate new fuzz db

* Fix rshift, do not cast to float to support idiv

* Return upcast=false to rshift

* Add to unsafepad BinaryOps.IDIV

* Remove RECIP override for CUDA

* add atol / rtol for the test

* Remove cast to int on IDIV

* Regenerate sops

* delete sops.gz

* regenerate

* regenerate

* regenerate

* Reduce margins

* pass atol and rtol as parameters for _test_metrics

* regenerated dataset

* Regenerate

* Remove duplicated

* Revert changes on extra

* Remove changes extra and NOQA for test

* Remove E501

* Remove and change line

* Remove E501

* Fix atan2

* Revert import and E501

* Remove E501

* Add hrcp to half ops

* Remove 1 of hrcp

* Remove last DIV and add type check on uops for IDIV

* Fix new tests

* Fix tests and custom function

* Regenerate dataset

* Regenerate dataset

* Revert dataset

* Change generate dataset script

* Remove line

* Change IDIV, type checker validate if x,y and z are int

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-14 02:43:46 -07:00
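
The semantics being split, illustrated in plain python (an illustration of the op meanings, not the renderer code):

```python
import math

def recip_div(a: float, b: float) -> float:
  # float BinaryOps.DIV lowers to a * RECIP(b)
  return a * (1.0 / b)

def idiv(a: int, b: int) -> int:
  # BinaryOps.IDIV truncates toward zero like C, unlike python's flooring //
  return math.trunc(a / b)

assert recip_div(1.0, 4.0) == 0.25
assert idiv(-7, 2) == -3  # python's -7 // 2 would give -4
```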
SnakeOnex
f87ba6016a tqdm total=0 fix (#4939)
* fixes

* fixes

* removed auto loop closing

* one line shorter
2024-06-14 02:31:59 -07:00
chenyu
287d3c3b84 support list, tuple input in dtypes.from_py (#4945)
* support list, tuple input in dtypes.from_py

and used it to infer dtype from python lists and tuples in the Tensor constructor.

* fix tests
2024-06-13 13:38:06 -04:00
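
A hedged sketch of that inference; the promotion order bool < int < float is an assumption about the real dtypes.from_py logic:

```python
from tinygrad import dtypes

RANK = {dtypes.bool: 0, dtypes.default_int: 1, dtypes.default_float: 2}

def from_py_sketch(x):
  # recurse into lists/tuples and promote leaf dtypes upward
  if isinstance(x, (list, tuple)):
    leaves = [from_py_sketch(e) for e in x]
    return max(leaves, key=RANK.get) if leaves else dtypes.default_float
  if isinstance(x, bool): return dtypes.bool  # check before int: bool subclasses int
  if isinstance(x, int): return dtypes.default_int
  return dtypes.default_float

assert from_py_sketch([1, 2.0, (3,)]) == dtypes.default_float
```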
chenyu
7aecea4f56 support creating Tensor from python tuple (#4944)
added a small fuzzer to test that data with mixed tuples and lists of numbers matches numpy
2024-06-13 12:18:37 -04:00
chenyu
74586bc339 fix getitem with leading None (#4943)
i think all None handling can be unified, removing calc_dim in advanced indexing
2024-06-13 11:23:40 -04:00
nimlgen
fd071ba27e amd mockgpu correct timer resolution (#4942)
* amd mockgpu correct timer resolution

* test it
2024-06-13 10:07:34 +03:00
chenyu
fae08c4d48 fix Tensor.triu / Tensor.tril with boolean input (#4941)
`where(self, 0)` incorrectly upcasted the output. `where(self, False)` is correct but looks unnatural, so added a cast at the end. The pattern matcher can fold the cast into the where branches
2024-06-12 20:16:13 -04:00
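
The fixed behavior, as a quick check (assuming the post-fix semantics):

```python
from tinygrad import Tensor, dtypes

t = Tensor([[True, True], [False, True]])
# masked entries become False and the boolean dtype is preserved,
# rather than being upcast by the `where(self, 0)` form
assert t.triu().dtype == dtypes.bool
```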
chenyu
eb0f5b5660 failed test case for getitem with leading Nones (#4936)
* failed test case for getitem with leading Nones

torch matches numpy, so tinygrad is incorrect.
another repro
```
t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])

t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())

t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```

* # noqa
2024-06-12 16:19:42 -04:00
chenyu
a21ea165bc skip linearizer test_failure_22 on llvm (#4937)
getting flaky recently
2024-06-12 16:03:38 -04:00
Timmy
720c700a8a Multireduce-Kernels: Linearizer Changes and Tests (#4259)
* basic tests

* cleanup

* pylint

* ruff

* use define acc as a proxy for rendered reductions

* use define acc as a proxy for rendered reductions

* recursive reduceop rendering via ast_parse

* linters + cleanup

* fixing late buf loading

* plus linters

* removing extra line

* linters

* does this break ci?

* added tests and if add end change

* typo in add_ends

* linters

* removing comments

* allow endifs to be inserted before the end of the graph

* find add ENDIF before next BARRIER

* removing tests with manual ENDIF + linters

* specifically the next barrier after the store of the local result

* Revert "specifically the next barrier aftr the store of the local result"

This reverts commit b288a5c3ce.

* keeping up to date

* linters + merge changes

* cleaning up old bad decisions

* linters and opts

* merged linearizer tests

* fixing merge issues

* removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions)

* small diff fixes

* updating linearizer to work without uops.add( ... cachable)

* linters

* comment in multireduce tests

* skipping tests without locals

* full tests

* linters

* load_cache[key] fix for multiple accs

* linters

* assert only one reduceop

* fix loop_scope test to actually cause an issue

* self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique

* updated tests

* fixing merge

* removing debug prints

* complete merge fix

* linters

* diff cleanup

* adding tests in

* give each reduce its own local buffer

* gpu=1 changes

* store and load locals with upcasting

* modifying test?

* make multireduce_nested_local_upcast test match single reduce shapes

* removing todo

* cleaning up the diff

* unroll test

* unroll and upcast tests

* fix gpu

* seq and self.load_cache[key] cleaning

* linters

* padto works

* merge fixes

* fixes

* add skips for amd

* linters + seq

* cleaning & more tests

* softmax tests (the canonical multireduce; sketched after this entry)

* linters

* [run_process_replay]

* add new tests back

This reverts commit 19dec22e01.

* more hardcoded -1s

* fix ptx

* Fix name for loop in ptx

* cleaning up the diff

* cleaning up the uops diff

* nv ci is too slow

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-06-12 13:29:43 -04:00
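
For intuition, softmax is the canonical multireduce pattern these tests exercise: two reductions over the same axis that a multireduce kernel can now express together:

```python
from tinygrad import Tensor

x = Tensor.rand(4, 128)
m = x.max(axis=1, keepdim=True)        # reduction 1: running max
e = (x - m).exp()
out = e / e.sum(axis=1, keepdim=True)  # reduction 2 over the same axis
```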
Nicklas Boman
6e86472cd6 fix typing for test to run in py38 (#4930) 2024-06-12 13:22:30 -04:00
chenyu
1326f29e24 fix Tensor.gather shape checking criteria (#4932)
it's fine if `self.shape[d] >= index.shape[d]` holds for all `d != dim`; it need not hold for all `d`
2024-06-12 13:10:14 -04:00
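
The relaxed check as a standalone predicate (an illustration, not the tensor.py code):

```python
def gather_shape_ok(self_shape, index_shape, dim):
  # index may be longer than self along dim itself; only the other
  # dimensions must satisfy self.shape[d] >= index.shape[d]
  return all(s >= i for d, (s, i) in enumerate(zip(self_shape, index_shape)) if d != dim)

assert gather_shape_ok((3, 5), (3, 9), dim=1)      # larger along dim: allowed
assert not gather_shape_ok((3, 5), (4, 2), dim=1)  # larger along d != dim: rejected
```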
George Hotz
9a3c1e4a17 fix mul div failure (#4928) 2024-06-12 13:58:46 +02:00
George Hotz
11a03cbbf5 don't use uops.add while constructing (#4913)
* don't use uops.add while constructing

* rebase

* bugfixes

* have to use BFS

* prove it's late

* simpler uop symbolic test (why we did this)

* use dict, not set
2024-06-12 13:31:34 +02:00
chenyu
fdbb4305cb skip unsupported dtype in fuzz_linearizer (#4917)
resolves issues in #4887: the dataset was generated on ubuntu, but metal does not support double
2024-06-11 18:18:21 -04:00
chenyu
b886d250fb improve test_dropout_on_shard (#4912)
tested some basic properties; also minor formatting for a few Tensor.training setups
2024-06-11 11:36:02 -04:00
George Hotz
35e53c0809 add sharded arange test (#4908) 2024-06-11 10:58:33 +02:00
chenyu
798ea61377 widen test_ops [low, high] and more strict atol (#4906)
default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan).
dropped several explicit atol values that were unnecessarily larger than the default 1e-6.
tested on mac, tinybox red / green
2024-06-10 20:47:09 -04:00
chenyu
97b05f567e revert the .detach() in layernorm (#4904)
* revert the .detach() in layernorm

it's only correct in LayerNorm, where the input is the data, and not in GroupNorm and InstanceNorm, which reuse layernorm.
Added backward tests for weights, bias and input for these norms.

* bigger atol for llvm

* relax backward more
2024-06-10 18:02:05 -04:00
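
A sketch of why the .detach() was wrong there: GroupNorm and InstanceNorm call layernorm on real input data, so the normalization statistics must stay in the autograd graph (simplified formula, assuming the standard layernorm definition):

```python
from tinygrad import Tensor

def layernorm(x: Tensor, axis=-1, eps=1e-5) -> Tensor:
  y = x - x.mean(axis=axis, keepdim=True)  # no .detach(): gradients flow through the stats
  return y / ((y * y).mean(axis=axis, keepdim=True) + eps).sqrt()
```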
qazal
8b5bcf309a process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
chenyu
c8cd637236 test case for Tensor.var reducing over size = 1 axis (#4902)
backward failed when correction >= the reduced axis size n
2024-06-10 12:11:39 -04:00
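
Plain-python arithmetic showing why a size-1 axis is the corner case: with the default Bessel correction of 1, the divisor n - correction hits zero when n == 1:

```python
def var(xs, correction=1):
  n = len(xs)
  mean = sum(xs) / n
  # ZeroDivisionError here when correction >= n, e.g. a single-element axis
  return sum((x - mean) ** 2 for x in xs) / (n - correction)
```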
chenyu
b56ae5606c cosmetic changes to uop _match (#4897)
minor cleanup before fixing two level match
[run_process_replay]
2024-06-09 18:29:42 -04:00
SnakeOnex
b1db2d0094 tqdm replacement (#4846)
* tqdm replacement almost

* formatting

* formatting

* imports

* line len

* fix

* removed set description :(

* removed set description :(

* fix

* fix

* green check?

* rewrote as class, fixed several bugs

* types spacing

* removed imports

* fix

* iterable

* typing

* mypy disagreement

* imports

* more e2e tests vs tqdm

* removed seed setting

* robustness against time.sleep() flakiness

* flaky fix

* automatic bar closing when count==total

* cleanup

* clang error with tqdm

* tqdm back

* use os lib, print to stderr (fixes the clang bug where the bar was leaking into the generated c program; sketched after this entry)

* back to shutil

* unit_scale + unit_scale test

* custom unit to tests

* pretty

* clean

* removed flaky test

* less test iters

* empty line

* remove disable
2024-06-09 23:46:03 +02:00
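
A minimal sketch of the stderr trick from that thread; rendering the bar on stderr keeps it out of captured stdout, e.g. generated C source (an illustration, not the actual class):

```python
import sys, time

def progress(iterable, total=None, desc=""):
  # write the bar to stderr so piped/captured stdout stays clean
  total = len(iterable) if total is None else total
  for n, item in enumerate(iterable, 1):
    filled = int(20 * n / max(total, 1))
    sys.stderr.write(f"\r{desc} {n}/{total} |{'#' * filled}{'-' * (20 - filled)}|")
    sys.stderr.flush()
    yield item
  sys.stderr.write("\n")

for _ in progress(range(50), desc="demo"): time.sleep(0.01)
```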
qazal
1dde829e34 UOps.IF* to graph spec (#4894) 2024-06-09 07:00:12 -04:00
George Hotz
b9afb0d577 test uop as symbolic (#4870)
* start work

* more tests passing

* more tests passing

* more

* 34 failures

* expect the failures

* remove broken rule

* render is fine in just the test

* simplify and put in test
2024-06-09 12:15:11 +02:00
nimlgen
654a8b9ef7 retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00