chenyu
2cadf21684
include "mkdocs" in setup docs (#5798)
2024-07-29 15:54:52 -04:00
chenyu
471b188d79
fix mypy errors in latest mypy (#5794)
...
* fix mypy errors in latest mypy
mypy has stricter partial and api arg checks now
* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
samm393
573e0f9a48
remove float division from idiv in python_alu (#5777)
...
* removes float division from idiv in python_alu
* add test
* cleaner logic
* pass clang unsigned literals correctly
* suffix ULL instead of U
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
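The float-free idiv in the entry above can be sketched as truncating division built from pure integer ops (a minimal illustrative stand-in, not python_alu's actual code):

```python
def idiv(a: int, b: int) -> int:
    # C-style truncating division without any float ops:
    # divide the magnitudes with floor division, then restore the sign.
    if b == 0:
        raise ZeroDivisionError("integer division by zero")
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

assert idiv(7, 2) == 3
assert idiv(-7, 2) == -3   # Python's -7 // 2 would give -4 (floor, not truncate)
assert idiv(7, -2) == -3
```

The point of avoiding `int(a / b)` is that float division loses precision once the operands exceed 2**53.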
samm393
2c94316bd2
ull literal support and test (#5789)
...
* ull literal support and test
* missing .numpy()
2024-07-29 11:50:49 -04:00
nimlgen
71e1472290
hcq more types (#5791)
...
* hcq more types
* linter
* pylint
* docs: bind
2024-07-29 18:03:23 +03:00
P4ssenger
9c80f9adf9
fix bug in assert message (#5787)
2024-07-29 15:46:23 +03:00
nimlgen
ab3839a80a
cleanup nv/cuda compilers (#5767)
...
* cleanup nv/cuda compilers
* destroy prog
* small test
* fix test
* nv ptx rewrite key
* jitlink free
* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu
76840fd65a
minor ops cleanup [run_process_replay] (#5786)
2024-07-29 02:30:38 -04:00
chenyu
e7a14f398e
more uop_symbolic tests for divmod pairs (#5785)
2024-07-28 21:27:06 -04:00
George Hotz
76d191ab94
move consts to end of add (#5783)
...
* move consts to end of add
* better
* fix infinite loop
2024-07-28 17:38:57 -07:00
George Hotz
5b84a7db1a
hotfix: ptx threads match cuda threads
2024-07-28 16:53:24 -07:00
chenyu
460b120d62
apply more .alu syntactic sugar [run_process_replay] (#5782)
2024-07-28 19:43:48 -04:00
George Hotz
0392123e6e
TC=2 still sets tensor cores (and TC=3 support for locals) (#5780)
...
* TC=2 still sets tensor cores
* add TC=3 support for using locals
* bugfix
* lines + TC=3 tests
* CUDA can use threads, fix fuzz linearizer
2024-07-28 16:16:53 -07:00
chenyu
71a64d8252
UOps.MUL bound when one is negative (#5781)
...
* UOps.MUL bound when one is negative
also one more distribute_mul rule
* don't always expand
2024-07-28 19:02:47 -04:00
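Bounding a product when an operand can be negative is classic interval arithmetic: the extremes are always attained at corner products. A small sketch of the idea (illustrative only, not tinygrad's `_min_max` code):

```python
def mul_bounds(lo1: int, hi1: int, lo2: int, hi2: int) -> tuple:
    # for x in [lo1, hi1] and y in [lo2, hi2], the min and max of x*y
    # are among the four corner products, even when bounds are negative.
    corners = [lo1 * lo2, lo1 * hi2, hi1 * lo2, hi1 * hi2]
    return min(corners), max(corners)

assert mul_bounds(-3, 2, 4, 5) == (-15, 10)   # negative lower bound flips the min
assert mul_bounds(-2, -1, -3, -1) == (1, 6)   # two negatives give a positive range
```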
qazal
b775db6b60
high-level benchmark timing diff (#5776)
...
* high level timings
benchmark times
fix defs
* use the name map
* skip last task
2024-07-28 23:42:57 +03:00
chenyu
600a39771d
fix Tensor.arange if (stop-start) and step have different signs (#5775)
2024-07-28 14:34:10 -04:00
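The intended semantics of that arange fix: when (stop-start) and step have opposite signs the result should be empty, so the element count must clamp at zero. A hypothetical sketch of the length computation:

```python
import math

def arange_len(start: float, stop: float, step: float) -> int:
    # numpy-style arange length; the max(0, ...) clamp is what makes
    # opposite-sign (stop-start) and step yield an empty range.
    return max(0, math.ceil((stop - start) / step))

assert arange_len(0, 10, 2) == 5    # 0, 2, 4, 6, 8
assert arange_len(10, 0, -3) == 4   # 10, 7, 4, 1
assert arange_len(0, 10, -1) == 0   # signs differ -> empty
```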
David González Martínez
d0fd84e617
feat: allow passing gradient to .backward() to compute vjp (#5771)
...
* feat: allow passing gradient to .backward() to compute vjp
* fix
* refactor
* fix trailing whitespace
2024-07-28 11:13:18 -07:00
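Passing a gradient (cotangent) to .backward() amounts to a vector-Jacobian product. A minimal dense sketch of what a vjp computes, independent of tinygrad's Tensor API (names here are hypothetical):

```python
def vjp(jacobian: list, v: list) -> list:
    # v^T @ J: weight each row of the Jacobian by the cotangent v and
    # sum down the rows, giving the gradient with respect to the inputs.
    rows, cols = len(jacobian), len(jacobian[0])
    return [sum(v[i] * jacobian[i][j] for i in range(rows)) for j in range(cols)]

# y = (2*x0, 3*x1) has Jacobian [[2, 0], [0, 3]]; backward with
# gradient [1, 1] yields dL/dx = [2, 3].
assert vjp([[2, 0], [0, 3]], [1, 1]) == [2, 3]
```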
qazal
e0e7293b0a
make process replay unique in retries [run_process_replay] (#5773)
2024-07-28 20:44:15 +03:00
nimlgen
ea27ec4cd0
nv switch classlist_v2 to classlist (#5763)
...
* nv switch classlist_v2 to classlist
* support in mockgpu
* fix mockgpu
2024-07-28 20:24:42 +03:00
nimlgen
73fda023d3
amd better comments for ENABLE_SGPR_DISPATCH_PTR (#5768)
...
* amd better comments for ENABLE_SGPR_DISPATCH_PTR
* fix linter
2024-07-28 16:23:38 +03:00
qazal
95dda8dadf
more unmatching vectorize/gep asserts [run_process_replay] (#5760)
...
* merge vectorize/gep rules [run_process_replay]
* assert dtypes
* src=
* float2=(float4.x,float4.y)
2024-07-28 15:08:54 +08:00
chenyu
bfbd7c5461
more generic UOp mul mod folding (#5765)
2024-07-27 20:20:35 -04:00
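One identity of the kind mul-mod folding exploits (illustrative; not necessarily the PR's exact rewrite rule): for m, c > 0, `(x*c) % (m*c)` reduces to `(x % m) * c`, letting a symbolic mod be rewritten in terms of a smaller one.

```python
def fold_mul_mod(x: int, m: int, c: int) -> int:
    # folded form of (x*c) % (m*c), valid for m, c > 0
    return (x % m) * c

# brute-force check of the identity, including negative x
for x in range(-20, 20):
    for m in (2, 3, 5):
        for c in (1, 2, 4):
            assert (x * c) % (m * c) == fold_mul_mod(x, m, c)
```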
chenyu
80c6475757
update test_uop_symbolic to test UOp min and max (#5764)
...
covers #5750, #5748, #5741
2024-07-27 19:53:21 -04:00
nimlgen
1903542c2d
nv/cuda compilers touchup (#5759)
...
* nv/cuda compilers touchup
* fix cuda check + move nv disasm
* remove includes
* fix nvrtc_check
2024-07-28 00:15:28 +03:00
chenyu
3c79faaf77
remove redundant UOps max folding [run_process_replay] (#5762)
...
all covered by generic max folding
2024-07-27 16:46:51 -04:00
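The generic max folding that subsumes the removed rules works off value bounds: if the bounds already decide the comparison, the max disappears. A hypothetical sketch:

```python
def fold_max(vmin: int, vmax: int, c: int):
    # x is known to lie in [vmin, vmax]:
    # max(x, c) folds to the constant when x can never exceed c,
    # and to x itself when x can never fall below c.
    if vmax <= c: return "const"
    if vmin >= c: return "x"
    return None  # bounds overlap c: cannot fold

assert fold_max(0, 5, 10) == "const"
assert fold_max(3, 9, 2) == "x"
assert fold_max(0, 5, 3) is None
```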
chenyu
05748e5a84
fix vmax of UOp.RANGE off by 1 (#5750)
...
with this, can remove several redundant max folding rules, do it separately to check kernel diff
2024-07-27 16:30:46 -04:00
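The off-by-one is the classic half-open range: a loop index from RANGE(start, end) takes the values start .. end-1, so its vmax is end-1, not end. Sketched (illustrative, not the UOp code):

```python
def range_min_max(start: int, end: int) -> tuple:
    # half-open range [start, end): the largest value the index takes is end-1
    return start, end - 1

assert range_min_max(0, 8) == (0, 7)
assert max(range(0, 8)) == 7   # matches Python's own half-open range
```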
nimlgen
fff19b961b
docs: user runtime docs (#5756)
2024-07-27 23:21:54 +03:00
nimlgen
5d53fa491b
amd autogened kfd ioctls (#5757)
...
* amd autogened kio
* unused import
* linter
2024-07-27 22:49:48 +03:00
nimlgen
ed1d784077
test profiler timer sync across devs (#5751)
...
* test profiler timer sync across devs
* more correct
* typo
2024-07-27 16:47:37 +03:00
qazal
e5fb08acbc
simpler expand UOps acc [run_process_replay] (#5754)
2024-07-27 15:20:56 +03:00
gswangg
de66d93859
PTX render vec CONST (#5729)
...
* dedupe PTX vec CONST render
* fix linter errors
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-27 13:40:19 +03:00
qazal
890e11ce11
fix UOps.STORE folding returning NOp [run_process_replay] (#5753)
2024-07-27 13:32:54 +03:00
qazal
3e49d86c01
process replay diffs 3 things now (#5731)
...
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
qazal
57b4a8e98d
assert process replay asserts (#5737)
...
* assert process replay asserts
* one ci job is fine
* test: Revert "separate process replay main loop (#5734)"
This reverts commit 94d578396f.
* mac sed needs that
* Revert "test: Revert "separate process replay main loop (#5734)""
This reverts commit e4ad7684d5.
* disable process replay capture
* save time
* amd is tiny
* send to /dev/null
2024-07-27 12:07:50 +03:00
George Hotz
f8972ace38
test flops (and allow wide ALU in UOps) [run_process_replay] (#5749)
...
* flops test in external_test_speed_theoretical.py
* test speed theo
* min SZMAX
* allow wide ALU for things that support it
* needed for mypy
2024-07-26 21:07:28 -07:00
George Hotz
2fde2d2914
hotfix: external_test_speed_theoretical works on 24GB
2024-07-26 18:41:52 -07:00
chenyu
b75d1e8793
UOp._min_max for IDIV (#5748)
2024-07-26 21:40:16 -04:00
George Hotz
829262a5ee
add external_test_speed_theoretical
2024-07-26 17:45:22 -07:00
chenyu
5f168e7499
remove the optimization in AndNode.substitute (#5747)
...
was used in the old linearizer but no longer needed. still need substitute because some fuzz tests call sym_infer on AndNode
2024-07-26 20:08:07 -04:00
kormann
c50e354936
NOp clean up any_len passing [run_process_replay] (#5743)
...
* clean allow_any_len
* min
2024-07-26 17:00:31 -07:00
George Hotz
db1d093b29
reenable LLaMA-3 8B BEAM on NV (#5746)
2024-07-26 16:56:41 -07:00
chenyu
c6b2d96474
minor uop uopgraph cleanups (#5745)
2024-07-26 19:23:48 -04:00
chenyu
3686b6726a
move GraphException to jit.py (#5744)
...
same place where GraphRunner is defined
2024-07-26 19:01:12 -04:00
kormann
a5ede535ef
NOp field name [run_process_replay] (#5742)
...
* rm def name
* add field name
2024-07-26 18:45:59 -04:00
chenyu
0d7d4dd731
UOp._min_max for MUL and MOD (#5741)
2024-07-26 18:38:10 -04:00
George Hotz
c50e374bb6
multiple locals + get_kernel_modifier + fix valid (#5739)
...
* multiple locals + get_kernel_modifier + fix valid
* fix test pattern matcher
2024-07-26 15:10:10 -07:00
nimlgen
f6c0e17a2c
optimize symbolic-related updates in graphs (#5727)
...
* try
* faster
* cleaner
* better?
* better?
* cleaner
* fixes
* unused
* mypy
* fix clang
* remove comment
* better var names
* rename
* fix cuda
* rename
2024-07-27 00:57:59 +03:00
chenyu
dc7483ee6f
UOp simple div folding (#5740)
...
made UOp.divides return the Optional[quotient] and used it for simple div folding
2024-07-26 17:14:32 -04:00
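A divides helper that returns the Optional[quotient] lets the caller test divisibility and reuse the quotient in one step. An integer-level sketch (the real UOp.divides operates on UOps, not plain ints):

```python
from typing import Optional

def divides(num: int, divisor: int) -> Optional[int]:
    # return the exact quotient when divisor evenly divides num, else None
    if divisor != 0 and num % divisor == 0:
        return num // divisor
    return None

# simple div folding: (x*6)//3 can fold to x*2 because divides(6, 3) == 2
assert divides(6, 3) == 2
assert divides(7, 3) is None
```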
chenyu
671259417f
reuse UOp __repr__ for NOp (#5738)
2024-07-26 16:59:55 -04:00
kormann
b0c1dba299
named UOp class "NOP" [run_process_replay] (#5728)
...
* NOP
* fix const + simplify compile
* rm VAR for NOOP
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-26 13:25:53 -07:00