Commit Graph

10490 Commits

chenyu
72c9b22833 sort vars in jit when building expected input args (#4990)
* sort vars in jit when building expected input args

fixed symbolic jit bugs with two variables.

* sort in clanggraph

* space

* one more
2024-06-16 15:55:51 -04:00
qazal
71aad183fd check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]

* put var back
2024-06-16 20:12:30 +03:00
chenyu
2b07847f2b matmul returns in acc_dtype if specified (#4994)
more flexible to not automatically downcast; this can fix bert mixed precision training
2024-06-16 12:56:15 -04:00
George Hotz
1d6f1a15e1 add lt and ge uop methods [run_process_replay] (#4995)
* add lt and ge uop methods [run_process_replay]

* more correct (should still run process replay)
2024-06-16 09:33:53 -07:00
uuuvn
1b3f27565a Boring UOps to UPat compiler [run_process_replay] (#4991)
* Boring UOps to UPat compiler

* ruff

* weirdness

* dtype fix

* Revert "weirdness"

This reverts commit 4bc213a157.

* weirdness

* end weirdness?

* a bunch more rules

* more patterns
2024-06-16 09:03:41 -07:00
George Hotz
dac96f177e ignore indexing in the flopcounter (#4993) 2024-06-16 08:59:55 -07:00
Timmy
01b26756d6 Multireduce Scheduler Tests (#4972)
* scheduler tests

* linters

* cleaning up tests

* fixing tests

* syntax

* fixing metal
2024-06-16 16:30:22 +03:00
chenyu
5eb8001514 minor cleanup in jit (#4989)
found a non-deterministic bug in jit with multiple variables, but first clean up some variable names.
[run_process_replay]
2024-06-15 23:43:17 -04:00
chenyu
44dfa37c70 use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
chenyu
20b50d8d64 doc: manual_seed (#4987)
there was a docstring, just not linked to the doc page. also updated the example to show re-seeding instead of an internal variable
2024-06-15 19:57:26 -04:00
wozeparrot
ce1ed374c9 more tinychat fixes (#4971) 2024-06-15 16:29:39 -07:00
chenyu
50bc14d186 re-enable test that loads torch pkl format (#4986) 2024-06-15 14:11:30 -04:00
qazal
ff8e9eefc3 hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]

* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06 Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title

* idk how this thing works btw

* test if it will work

* try 2

* Revert "idk how this thing works btw"

This reverts commit 580da51b07.

* Revert "try 2"

This reverts commit 7ff1e86d5d.

* test if it works

* meh

* Reapply "idk how this thing works btw"

This reverts commit dd33ad7c14.

* revert
2024-06-15 16:21:00 +03:00
uuuvn
033fb53f9e Incomplete/buggy rule breaks process replay on #4976 (#4978)
* Incomplete/buggy rule breaks process replay on #4976

* test passes

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-15 15:18:35 +03:00
qazal
d91f0ee85b add regression test for the neg folding pattern (#4979) 2024-06-15 15:08:28 +03:00
nimlgen
dfadf82e10 hcq optimize enqueue time (#4973)
* hcq optimize enqueue time

* linter
2024-06-15 10:47:25 +03:00
chenyu
5f7dd74655 docs: update wording for unflatten (#4974)
it was using `Expands`, the same wording as the torch doc, but we also have `expand`, so it's confusing
2024-06-14 23:12:41 -04:00
Cyril Roumégous
efbf4fca05 perf: graph_rewrite line reduction and make it a little bit faster [run_process_replay] (#4958) 2024-06-14 16:37:27 -07:00
wozeparrot
8209cd3c55 easier llama3 + fetch subdir (#4938) 2024-06-14 13:47:27 -07:00
chenyu
64cda3c481 raise TypeError calling len() on a 0-d tensor (#4970)
matched numpy and torch
2024-06-14 16:34:27 -04:00
chenyu
67e8df4969 remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.

after this, the only numpy usages are (1) Tensor(np.ndarray), (2) constructing .numpy() output, (3) the numpy random buffer
2024-06-14 15:38:45 -04:00
wozeparrot
62dc36d371 autogen _try_dlopen (#4949) 2024-06-14 12:12:18 -07:00
qazal
3e297d8216 delete Linearizer.const [run_process_replay] (#4967) 2024-06-14 21:51:37 +03:00
chenyu
118c9fe468 Tensor._fromcpu -> Tensor._fromnp (#4966)
and moved it to the constructor path for np.ndarray
2024-06-14 14:33:43 -04:00
wozeparrot
2a974ff257 fix: no readablestream await of, too new (#4965) 2024-06-14 11:22:19 -07:00
nimlgen
9436cd4551 hcq add memory_barrier (#4963)
* hcq add memory_barrier

* fix nv
2024-06-14 21:02:55 +03:00
chenyu
dae1c8abe2 create Tensor from bytes without numpy (#4964) 2024-06-14 13:37:27 -04:00
chenyu
5eee974b2a construct Tensor from python list/tuple directly (#4947)
* construct Tensor from python list/tuple directly

no numpy. annoying that half memoryview is a 3.12 feature...

* simpler, and test

* flat already

* simpler

* cute

* 10% faster

* 5%
2024-06-14 11:36:05 -04:00
geohotstan
90332eb529 Getitem pin None dimension (#4960)
* fix

* remove torch out of bounds test

* 1 more test case
2024-06-14 10:48:59 -04:00
qazal
2eeddf1a46 IF ends with STORE, RANGE ends with PHI [run_process_replay] (#4953) 2024-06-14 16:00:32 +03:00
George Hotz
d5a92b9b83 sort the axis in reduce op [run_process_replay] (#4956) 2024-06-14 05:16:05 -07:00
George Hotz
14189bca68 graph_dedup function [run_process_replay] (#4955) 2024-06-14 04:24:37 -07:00
George Hotz
63a8add2c2 move uops add logic to linearize (#4952)
* move logic to linearize

* idk how this should work

* empty
2024-06-14 03:52:37 -07:00
qazal
7e32b8c930 refactor generic UOps.END* insertion (#4951)
* merge loops children

* rename to scope_children

* refactor ends

* merge with ends [run_process_replay]
2024-06-14 13:42:41 +03:00
George Hotz
9823752397 make uops.add private (#4950)
* make uops.add private

* modernize all tests
2024-06-14 03:23:25 -07:00
Jhenner Tigreros
dc9e9e4363 Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887)
* Create UnaryOps.RECIP and BinaryOps.IDIV, changing uses of BinaryOps.DIV

* Delete unused import

* Add cstyle renderer

* Fix formatting text

* Fix test error due to bad implementation of renderer

* Add PTX support

* Add RECIP to LLVMIR

* Remove BinaryOps.DIV from symbolic test

* Change some tests and fix C floor division

* Change references to DIV to RECIP or IDIV

* Add mimic idiv for symbolic test

* Restore floor

* Mimic idiv

* cast to int

* Fix some tests and the renderer

* Remove DIV for render nodes

* Resolve issue with div

* Add TestRenderer

* Fix test

* fix error

* Fix PAD test

* Fix div implementation

* Remove DIV

* Add upcast to rshift, due to use of MUL and RECIP on DIV

* Fix linter

* Completely remove BinaryOps.DIV

* Fix lint

* Fix some tests

* Revert mul modification

* Fix tests

* Fix CLANG for uops

* Revert IDIV function

* Minor fix

* modify pattern matching rule to support nan

* Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP

* Remove const folding for IDIV and fix PTX

* Completely remove IDIV from extra

* Remove test_div from TestFloatUOps due to test on recip

* Fix linearizer

* fix

* Fix test_22

* Fix llvm

* Apply trunc function for llvmlite

* use floor instead of trunc

* Use correct type

* Generate new fuzz db

* Fix rshift, do not cast to float to support idiv

* Return upcast=false to rshift

* Add BinaryOps.IDIV to unsafepad

* Remove RECIP override for CUDA

* add atol / rtol for the test

* Remove cast to int on IDIV

* Regenerate sops

* delete sops.gz

* regenerate

* regenerate

* regenerate

* Reduce margins

* pass atol and rtol as parameters for _test_metrics

* regenerated dataset

* Regenerate

* Remove duplicated

* Revert changes on extra

* Remove changes extra and NOQA for test

* Remove E501

* Remove and change line

* Remove E501

* Fix atan2

* Revert import and E501

* Remove E501

* Add hrcp to half ops

* Remove 1 of hrcp

* Remove last DIV and add type check on uops for IDIV

* Fix new tests

* Fix tests and custom function

* Regenerate dataset

* Regenerate dataset

* Revert dataset

* Change generate dataset script

* Remove line

* Change IDIV: type checker validates that x, y and z are int

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-14 02:43:46 -07:00
SnakeOnex
f87ba6016a tqdm total=0 fix (#4939)
* fixes

* fixes

* removed auto loop closing

* one line shorter
2024-06-14 02:31:59 -07:00
nimlgen
225f792330 amd hdp flush regs are on seg2 (#4925) 2024-06-14 01:42:23 +03:00
nimlgen
4bfd1904f6 nv do not modify prg's qmd (#4948) 2024-06-14 01:15:40 +03:00
chenyu
845c10bc28 add Node to _broadcasted type annotation (#4946) 2024-06-13 14:10:56 -04:00
chenyu
287d3c3b84 support list, tuple input in dtypes.from_py (#4945)
* support list, tuple input in dtypes.from_py

and used it to infer dtype from python lists and tuples in the Tensor constructor.

* fix tests
2024-06-13 13:38:06 -04:00
chenyu
7aecea4f56 support creating Tensor from python tuple (#4944)
added a small fuzzer to test data with mixed tuples and lists of numbers, matched against numpy
2024-06-13 12:18:37 -04:00
chenyu
74586bc339 fix getitem with leading None (#4943)
i think all None handling can be unified, removing the calc_dim in advanced indexing
2024-06-13 11:23:40 -04:00
George Hotz
e63701fbd4 RDNA3 assembly support (#3637)
* amazing that i can use comgr for this

* compile empty kernel

* cleanups

* tiny_add compiles

* ugh

* more work

* put that in extra
2024-06-13 09:09:24 +02:00
nimlgen
fd071ba27e amd mockgpu correct timer resolution (#4942)
* amd mockgpu correct timer resolution

* test it
2024-06-13 10:07:34 +03:00
chenyu
fae08c4d48 fix Tensor.triu / Tensor.tril with boolean input (#4941)
`where(self, 0)` incorrectly upcast the output. `where(self, False)` is correct but looks unnatural, so added a cast at the end; the pattern matcher can fold the cast into the where branches
2024-06-12 20:16:13 -04:00
chenyu
cc90b3ef9f simpler Tensor.gather (#4940)
get rid of some confusing transpose and permute calls, and the if condition on dim. Saved a kernel for each dim != 0 case in tests by removing the dangling transpose at the end
2024-06-12 19:42:40 -04:00
George Hotz
fa00ef66fd Update README.md 2024-06-13 00:29:19 +02:00
chenyu
eb0f5b5660 failed test case for getitem with leading Nones (#4936)
* failed test case for getitem with leading Nones

torch matched numpy, so tinygrad is incorrect.
another repro:
```
import numpy as np
import torch
from tinygrad import Tensor

# index with leading Nones then an integer-array index; all three should agree
t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])

t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())

t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```

* # noqa
2024-06-12 16:19:42 -04:00