kormann
7c3b877216
rename uop [run_process_replay] (#5031)
* rename
* fix unittests
* rename vin
* fix test
* fix type [run_process_replay]
* rm pre-commit hook change
2024-06-18 21:34:05 +03:00
chenyu
dc942bf1f6
jit sampling function in test_randomness.test_multinomial (#5034)
* jit sampling function in test_randomness.test_multinomial
`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec
* skip that
2024-06-18 14:21:05 -04:00
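The speedup comes from wrapping the sampling step in TinyJit so repeated draws replay captured kernels instead of rebuilding the graph each call. A minimal sketch of the pattern (illustrative weights and shapes, not the actual test code):

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def sample(weights: Tensor) -> Tensor:
  # one draw per call; after warm-up, calls replay the captured kernels
  return weights.multinomial(num_samples=1).realize()

w = Tensor([[0.1, 0.3, 0.6]])
for _ in range(10): sample(w)
```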
Elias Wahl
f31ef11537
Better default hparams for large BS (#5030)
* better default hparams for large BS
* bf16 too
* use tuple
2024-06-18 11:13:06 -04:00
Francis Lam
8d33998e0d
[run_process_replay] linearizer: fix get_grouping_dims to respect global/local max (#4855)
* linearizer: fix get_grouping_dims to respect global/local max
* fix lidx variable index offset and unrestrict clang/llvm global len
* test reverse variable indexing when reverse_dims is true
* change the collapse axis to be the rightmost if reversed
2024-06-18 16:51:27 +03:00
joeshmoe0112358
7842559952
simplification of exp2 (#5023)
2024-06-18 06:51:16 -07:00
kormann
acc8f5e30e
print_tree for uops (#5028)
2024-06-18 06:36:14 -07:00
Junjun Dong
c8cd6e725c
Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977)
* feat: remove BinaryOps.SUB
* remove SUB in test_early_end_local
* regenerate dataset. remove SUB in test_linearizer_*
* reenable overflow tests
* simplify tensor.sub function by returning a+(-b) (see the sketch after this entry)
* remove whitespace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-18 09:06:13 -04:00
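The identity that makes the removal safe: a - b == a + (-b), so SUB lowers to the remaining ADD and NEG ops. A sketch of the tensor-level simplification mentioned above (the real Tensor.sub method has more to it):

```python
from tinygrad import Tensor

def sub(a: Tensor, b: Tensor) -> Tensor:
  # a - b == a + (-b): subtraction via ADD and NEG only
  return a + (-b)

print(sub(Tensor([3.0]), Tensor([1.0])).numpy())  # [2.]
```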
chenyu
620fa6e5a2
check Tensor.reshape can have at most one -1 (#5026)
raise RuntimeError to match torch; on master it throws weird errors from the shapetracker (see the sketch below)
2024-06-18 08:17:12 -04:00
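Roughly what the new check enforces (a sketch; the exact error message may differ):

```python
from tinygrad import Tensor

t = Tensor.ones(2, 3, 4)
print(t.reshape(6, -1).shape)       # (6, 4): a single -1 is inferred
try: t.reshape(-1, -1, 4)           # two -1s are ambiguous
except RuntimeError as e: print(e)  # now a clean RuntimeError, matching torch
```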
Elias Wahl
7bfa9101c0
Float in scaled dot product attention (#4985)
* Monkeypatch scaled-dot-product-attention
* Use dot instead of matmul
* new api
* imports
* least_upper_dtype
2024-06-18 08:16:41 -04:00
nimlgen
194a168630
hcq signal scheduler (#5016)
* faster hcq
* fix nv
* linter
* cleaner
* fix sync
* cleaner
* a bit cleaner
2024-06-18 14:02:21 +03:00
chenyu
e9c6a36894
remove CACHELEVEL=0 in llama3 benchmark (#5025)
2024-06-17 22:43:16 -04:00
chenyu
acaf9a490d
RECIP(-0.0) should be -inf (#5024)
* RECIP(-0.0) should be -inf
added test_dtype_alu for PYTHON backend
* catch that
* fix those two
2024-06-17 22:26:58 -04:00
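Plain Python raises ZeroDivisionError on 1.0 / -0.0, so the PYTHON backend needs an explicit special case to produce the IEEE-754 result. A sketch of the idea (not the exact python_alu code):

```python
import math

def recip(x: float) -> float:
  # IEEE-754: recip(-0.0) -> -inf, recip(+0.0) -> +inf
  if x == 0.0: return math.copysign(math.inf, x)
  return 1.0 / x

print(recip(-0.0))  # -inf
```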
GabrielZCode
66760ae558
graph display floats rounded (#5021)
Co-authored-by: gabrielsouza <gabriel.martins@perdcomp.com.br>
2024-06-17 18:22:55 -07:00
chenyu
03b367c014
handle float16 overflow in PYTHON (#5022)
* handle float16 overflow in PYTHON
use `truncate` when constructing a tensor from a list to make sure all values are packable (might be slow, but should be correct; see the sketch after this entry). add truncate_fp16 to cast overflowed values to inf/-inf.
* all valid fmt supports truncate
2024-06-17 21:12:52 -04:00
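The truncate idea can be sketched as a round-trip through a packed half-float: in-range values get rounded to fp16 precision, and out-of-range values, which make struct raise, map to +/-inf (close to, but not necessarily identical to, the actual helper):

```python
import math, struct

def truncate_fp16(x: float) -> float:
  try:
    # round-tripping through a packed half rounds the value to fp16
    return struct.unpack("@e", struct.pack("@e", x))[0]
  except OverflowError:
    # beyond the fp16 max (65504), map to +/-inf instead of raising
    return math.copysign(math.inf, x)

print(truncate_fp16(1e5))  # inf
```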
chenyu
c0139b05d8
python_alu sin(inf) is nan (#5020)
* python_alu sin(inf) is nan
without special handling, it throws ValueError: math domain error
* skip CUDACPU
2024-06-17 19:47:30 -04:00
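Python's math.sin raises for infinite inputs rather than returning nan, hence the special handling. A sketch:

```python
import math

def safe_sin(x: float) -> float:
  # math.sin(inf) raises "ValueError: math domain error"; return nan instead
  return math.nan if math.isinf(x) else math.sin(x)

print(safe_sin(math.inf))  # nan
```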
chenyu
4296507021
Tensor.sum returns in acc_dtype if specified (#5012)
* Tensor.sum returns in acc_dtype if specified
* skip PYTHON for now
* revert that
* relax that
2024-06-17 16:35:52 -04:00
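After this change, a given acc_dtype is also the output dtype instead of being cast back to the input dtype. A usage sketch:

```python
from tinygrad import Tensor, dtypes

x = Tensor([1, 2, 3], dtype=dtypes.float16)
# accumulate and return in float32; previously the result was cast back to float16
print(x.sum(acc_dtype=dtypes.float32).dtype)
```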
chenyu
013c73c3b3
minor refactor overflow handling in python backend (#5015)
made it clear that it's only handling int now. need to handle float inf next
2024-06-17 12:18:38 -04:00
Ray
1ad3b25461
fix einsum output str (#4998)
* fix einsum output str
* new line to satisfy linter
* removed redundant cast (satisfy linter)
2024-06-17 12:18:14 -04:00
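For context, the output side of an einsum formula determines the axis order of the result; a usage sketch (illustrative shapes):

```python
from tinygrad import Tensor

x, y = Tensor.ones(2, 3), Tensor.ones(3, 4)
# the "->ki" output subscripts transpose the usual matmul result
print(Tensor.einsum("ij,jk->ki", x, y).shape)  # (4, 2)
```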
nimlgen
794acefbf3
hcq update waits and signals in place (#4984)
* hcq update waits and signals in place
* start amd
* amd works
* prettier
* test
* normal messages
* linter
* linter 2
2024-06-17 17:19:07 +03:00
qazal
603a4a0ce1
process replay contributor docs (#5010)
2024-06-17 09:38:59 -04:00
qazal
026c59543c
allow keyword args in UOp.store [run_process_replay] (#5008)
* allow keyword args in UOp.store [run_process_replay]
* same for load
* typing can stay
2024-06-17 15:42:27 +03:00
uuuvn
f1de8cd8cf
Convert a bunch more rules [run_process_replay] (#5007)
* Convert a bunch more rules [run_process_replay]
* more rules, narrow down CMPLT rule
* smart linter cut two lines
* nope, the linter is dumb
* make dumb linter shut up
* revert two rules
* Revert "revert two rules"
This reverts commit 585688da17.
* fix
2024-06-17 15:16:31 +03:00
chenyu
c52352bd9a
fix yolov8 example (#5003)
it was creating a Tensor from a list of numpy arrays, which is no longer supported now that creating from a list does not go through numpy (see the sketch after this entry).
2024-06-16 20:47:29 -04:00
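The workaround pattern for such callers: collapse the list of arrays into a single ndarray first (a sketch with made-up shapes):

```python
import numpy as np
from tinygrad import Tensor

arrs = [np.ones(3, dtype=np.float32), np.zeros(3, dtype=np.float32)]
t = Tensor(np.stack(arrs))  # stack first; Tensor(list_of_ndarrays) is unsupported
```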
nimlgen
8bc0cbf67b
nv tiny cleanups (#5001)
* nv tiny cleanups
* gpfifo rework
* return type
2024-06-17 00:43:44 +03:00
qazal
04feeb37e6
look for unsafe pad ops in multiview ShapeTrackers (#5002)
2024-06-17 00:28:12 +03:00
George Hotz
bee8fc29ee
add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD
* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
72c9b22833
sort vars in jit when building expected input args (#4990)
* sort vars in jit when building expected input args
fixed symbolic jit bugs with two variables.
* sort in clanggraph
* space
* one more
2024-06-16 15:55:51 -04:00
qazal
71aad183fd
check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]
* put var back
2024-06-16 20:12:30 +03:00
chenyu
2b07847f2b
matmul returns in acc_dtype if specified (#4994)
more flexible to not automatically downcast; this can fix bert mixed-precision training (see the sketch below)
2024-06-16 12:56:15 -04:00
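Same idea as the Tensor.sum change above: a given acc_dtype is kept as the output dtype. A sketch:

```python
from tinygrad import Tensor, dtypes

a = Tensor.ones(4, 8, dtype=dtypes.float16)
b = Tensor.ones(8, 4, dtype=dtypes.float16)
# accumulate and return in float32; no automatic downcast to float16
print(a.matmul(b, acc_dtype=dtypes.float32).dtype)
```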
George Hotz
1d6f1a15e1
add lt and ge uop methods [run_process_replay] (#4995)
* add lt and ge uop methods [run_process_replay]
* more correct (should still run process replay)
2024-06-16 09:33:53 -07:00
uuuvn
1b3f27565a
Boring UOps to UPat compiler [run_process_replay] (#4991)
* Boring UOps to UPat compiler
* ruff
* weirdness
* dtype fix
* Revert "weirdness"
This reverts commit 4bc213a157.
* weirdness
* end weirdness?
* a bunch more rules
* more patterns
2024-06-16 09:03:41 -07:00
George Hotz
dac96f177e
ignore indexing in the flopcounter (#4993)
2024-06-16 08:59:55 -07:00
Timmy
01b26756d6
Multireduce Scheduler Tests (#4972)
* scheduler tests
* linters
* cleaning up tests
* fixing tests
* syntax
* fixing metal
2024-06-16 16:30:22 +03:00
chenyu
5eb8001514
minor cleanup in jit (#4989)
found a non-deterministic bug in jit with multiple variables. but first, clean up some variable names.
[run_process_replay]
2024-06-15 23:43:17 -04:00
chenyu
44dfa37c70
use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
chenyu
20b50d8d64
doc: manual_seed (#4987)
there was a docstring, just not linked to the doc page. also updated the example to show re-seeding instead of an internal variable (see the sketch below)
2024-06-15 19:57:26 -04:00
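The documented pattern is re-seeding to reproduce a stream (a sketch):

```python
from tinygrad import Tensor

Tensor.manual_seed(42)
a = Tensor.rand(2).numpy()
Tensor.manual_seed(42)  # re-seed: the next draws repeat the same stream
b = Tensor.rand(2).numpy()
assert (a == b).all()
```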
wozeparrot
ce1ed374c9
more tinychat fixes (#4971)
2024-06-15 16:29:39 -07:00
chenyu
50bc14d186
re-enable test that loads torch pkl format (#4986)
2024-06-15 14:11:30 -04:00
qazal
ff8e9eefc3
hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]
* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06
Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title
* idk how this thing works btw
* test if it will work
* try 2
* Revert "idk how this thing works btw"
This reverts commit 580da51b07.
* Revert "try 2"
This reverts commit 7ff1e86d5d.
* test if it works
* meh
* Reapply "idk how this thing works btw"
This reverts commit dd33ad7c14.
* revert
2024-06-15 16:21:00 +03:00
uuuvn
033fb53f9e
Incomplete/buggy rule breaks process replay on #4976 (#4978)
* Incomplete/buggy rule breaks process replay on #4976
* test passes
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-15 15:18:35 +03:00
qazal
d91f0ee85b
add regression test for the neg folding pattern (#4979)
2024-06-15 15:08:28 +03:00
nimlgen
dfadf82e10
hcq optimize enqueue time (#4973)
* hcq optimize enqueue time
* linter
2024-06-15 10:47:25 +03:00
chenyu
5f7dd74655
docs: update wording for unflatten (#4974)
it was using `Expands`, the same wording as the torch doc, but we also have expand, so it was confusing
2024-06-14 23:12:41 -04:00
Cyril Roumégous
efbf4fca05
perf: graph_rewrite line reduction and make it a little bit faster [run_process_replay] (#4958)
2024-06-14 16:37:27 -07:00
wozeparrot
8209cd3c55
easier llama3 + fetch subdir (#4938)
2024-06-14 13:47:27 -07:00
chenyu
64cda3c481
raise TypeError calling len() on a 0-d tensor (#4970)
matched numpy and torch
2024-06-14 16:34:27 -04:00
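The new behavior, matching numpy and torch (a sketch; the exact message may differ):

```python
from tinygrad import Tensor

t = Tensor(3.0)  # 0-d tensor, shape ()
try: len(t)
except TypeError as e: print(e)  # len() of a 0-d tensor is now a TypeError
```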
chenyu
67e8df4969
remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py (see the sketch after this entry).
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
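The replacement helper maps a tinygrad dtype to its numpy equivalent only where numpy interop is actually needed; roughly (a sketch keyed off the dtype's struct format char):

```python
import numpy as np

def _to_np_dtype(dtype) -> type | None:
  # dtype.fmt is the struct format char, e.g. 'f' for float32, 'e' for float16
  return np.dtype(dtype.fmt).type if dtype.fmt is not None else None
```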
wozeparrot
62dc36d371
autogen _try_dlopen (#4949)
2024-06-14 12:12:18 -07:00
qazal
3e297d8216
delete Linearizer.const [run_process_replay] (#4967)
2024-06-14 21:51:37 +03:00