Commit Graph

10490 Commits

wozeparrot
8845a5dbfd feat: begin immediate (#5539) 2024-07-17 16:11:21 -07:00
George Hotz
a6e70f8a71 clean up expand function [run_process_replay] (#5538)
* clean up expand function [run_process_replay]

* lil cleaner

* add a type
2024-07-17 15:02:00 -07:00
qazal
61ee02e93d start multireduce lowerer work (var/std) (#5537)
* multireduce no-opts works

* passed test_var_multireduce

* cleanup

* double reduce

* extra check for range_group

* more checking for range_groups

* cleaning up debug prints

* cleanup diff

* linters

* revert kernel changes

* these are uops toposort

---------

Co-authored-by: timmy <timmy0x@proton.me>
2024-07-17 23:43:46 +03:00
qazal
67ea4af01f depth first recurse_reduceops (#5536)
* early recurse

p2

* yeah, the cache shouldn't be there
2024-07-17 23:27:53 +03:00
Francis Lam
c4eb30a04c test/test_linearizer_failures: add a new beautiful_mnist one (#5531)
* test/test_linearizer_failures: add a new beautiful_mnist one

this one is from a DEPTH=2 fuzz_linearizer search

* add GPU to test_failure_40

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00
qazal
0259d76183 use Context only in replaying Kernel [run_process_replay] (#5535) 2024-07-18 03:46:14 +08:00
George Hotz
1a68854766 PatternMatcher add (#5532)
* PatternMatcher add [run_process_replay]

* f4 dynamic

* test_failure_36 is fixed

* fix PTX
2024-07-17 12:44:42 -07:00
qazal
d3c137d478 utility for computing reduceop output_shape (#5534)
* refactor to reduce_st

* update lazy
2024-07-17 22:40:07 +03:00
qazal
0a7872a62f use exec_alu in uops flop counting (#5511)
* use exec_alu for uops flop counting

* deal with sint
2024-07-17 22:39:27 +03:00
qazal
a7706e05f9 option to [skip_process_replay] (#5533) 2024-07-17 22:30:46 +03:00
chenyu
4193095f67 fix handcode_opt.py with DEBUG=2 (#5530)
only one ast per kernel now
2024-07-17 14:50:47 -04:00
chenyu
466555cd17 touchup Tensor.interpolate (#5525)
* touchup Tensor.interpolate and Tensor.lerp

rewrite lerp to save one sub and thus FLOPs.
use Tensor.lerp for interpolate, plus some minor cleanups

* revert lerp change
2024-07-17 13:35:57 -04:00
George Hotz
1242b302fa expand UOps with rewrite rules (#5501)
* expand UOps with rewrite rules [run_process_replay]

* progress

* much closer

* close, way less bugs

* bunch of expander tests

* fix contract

* ops tests pass

* fix barrier

* mostly passing

* bitcast in expanded ops

* support more expand merges

* all tests pass maybe

* fix empty EXPAND

* fix LIN fuzzing

* add ALL_SAME assert

* all same

* all same work

* raise CompileError

* pass fuzz linearizer

* revert whitespace

* fix nv tensor core test

* fix mypy

* bug fix

* fuzzer passes

* put tests back

* expand arg to idx
2024-07-17 10:17:50 -07:00
George Hotz
158221b36b expand tests from uop_expander [run_process_replay] (#5524)
* expand tests from uop_expander

* more changes from the branch
2024-07-17 09:22:36 -07:00
George Hotz
42c25cc961 fix fixup_ast (#5523)
* fix fixup_ast

* these lin failures are fixed
2024-07-17 08:52:21 -07:00
qazal
fbe0233be3 infra for multi reduce asts (#5522)
* add reduce_info

* _recurse_reduceops base

* derive output shape

* refactor

* delete reduce_for_op

* save lines

* more line saving
2024-07-17 17:23:46 +03:00
nimlgen
dcd462860f elf loader (#5508)
* elf loader

* cleanup

* cleaner

* cleaner

* fixes

* revert this

* fix div 0

* fix nv

* amd fix

* fix mockgpu

* amd better?

* restore relocs for <12.4

* linter

* this is fixed now

* revert this

* process cdefines as function

* cleaner

* align

* save lines

* revert this change
2024-07-17 17:09:34 +03:00
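For context on the elf loader work above: any such loader starts by validating the ELF identification bytes. This is an illustrative sketch of that first step only (not tinygrad's actual loader code), following the ELF specification's `e_ident` layout:

```python
# Minimal ELF ident parser (illustrative; not tinygrad's implementation).
def parse_elf_ident(data: bytes) -> tuple[int, int]:
    # All ELF files begin with the magic bytes 0x7f 'E' 'L' 'F'.
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = data[4]  # 1 = ELFCLASS32, 2 = ELFCLASS64
    ei_data = data[5]   # 1 = little-endian, 2 = big-endian
    return ei_class, ei_data
```

A 64-bit little-endian object, as produced by typical GPU toolchains, would report `(2, 1)`.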
nimlgen
661da32aff nv do not map regions twice (#5521) 2024-07-17 11:20:02 +03:00
Francis Lam
2d53abb04a test/external/fuzz_linearizer: fix for new AST changes (#5519)
* test/external/fuzz_linearizer: fix for new AST changes

also add beautiful_mnist failures

* add CLANG and LLVM to test_failure_35 failed_platforms

* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
Tobias Fischer
85d4ca7caa FID Inception Model (#5516)
* added model impl

* minor cleanups

* extracted weights loading into from_pretrained

* reorganized model for better weight loading

* removed lru cache for state dict loading
2024-07-16 23:12:03 -04:00
chenyu
4ad83d032e remove Kernel.lazyops [run_process_replay] (#5517)
always use Kernel.ast.lazyops
2024-07-16 19:47:42 -04:00
wozeparrot
1c1d6d3a4a feat: show caller when tracemeta >= 2 (#5514) 2024-07-16 15:06:02 -07:00
chenyu
5aad043522 cleanup fixup_ast local shape long line [run_process_replay] (#5513) 2024-07-16 17:29:38 -04:00
chenyu
6e405b0a2b add 0d tensor to trunc/floor/ceil/round tests (#5512)
the existing trunc test passes backward, but its backward is incorrect in general; added tests that would fail
2024-07-16 16:48:25 -04:00
chenyu
0afcbfae84 docs: add Tensor.interpolate to doc page (#5510) 2024-07-16 14:17:19 -04:00
Tobias Fischer
87a2ef2bc2 Add Interpolate Function (#5482)
* add interpolate function

* fixed linter issue

* reduced sizes in test

---------

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
gswangg
203161c75d refactor VECTORIZE/GEP rules (#5507) 2024-07-16 09:41:23 -07:00
qazal
173064c69c (re)start multireduce in codegen/* (#5391)
* test_var_multireduce

* run verify_lazyop

* test_var_multireduce

* assert lazyop

* add test_indexing_multireduce

* arange fuses (crude)

* note: extra reshape

* start readble

* test_arange_simple

* test_arange_expanded

* test_indexing_multireduce

* cleanups

* skip ptx

* skip nv and amd ci

* skip arange expanded too

* GPU=1 is slow too in CI
2024-07-16 14:20:48 +03:00
chenyu
07ff4b7d24 test_failure_33 ast that has UOps.UNMUL after linearize (#5504)
* test_failure_33 ast that has UOps.UNMUL after linearize

* smaller
2024-07-15 22:54:23 -04:00
chenyu
1ccd987e6a simpler tc permaxis in fixup_ast.fix_st [run_process_replay] (#5502) 2024-07-15 21:35:32 -04:00
George Hotz
9d4c3c553c prepare expand to support multiexpand [run_process_replay] (#5503) 2024-07-15 18:21:24 -07:00
chenyu
fd43d33b7d shave some lines from transcend math [run_process_replay] (#5500)
* shave some lines from transcend math [run_process_replay]

* put input_dtype back
2024-07-15 21:02:24 -04:00
chenyu
63990705b5 test kernel opts case for 4 local and 4 groups (#5499)
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Alessandro Benetti
13e200b437 add strict mkdocs check (#5497) 2024-07-15 14:21:37 -07:00
nimlgen
8dfd11c1d8 docs: hcq add types (#5495)
* docs: hcq add types

* linter
2024-07-15 22:14:48 +03:00
George Hotz
aab1e8c6dc uniform init to match torch (#5494) 2024-07-15 12:07:44 -07:00
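As background for the "uniform init to match torch" commit: PyTorch draws Linear weights from a uniform distribution bounded by 1/sqrt(fan_in). This is a hedged sketch of that scheme, not the commit's actual diff:

```python
import math
import random

# Torch-style uniform init sketch: weights ~ U(-k, k), k = 1 / sqrt(fan_in).
def linear_weight_init(in_features: int, out_features: int) -> list[list[float]]:
    k = 1.0 / math.sqrt(in_features)
    return [[random.uniform(-k, k) for _ in range(in_features)]
            for _ in range(out_features)]
```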
George Hotz
338b7590b9 hotfix: docs for BatchNorm 2024-07-15 12:04:17 -07:00
nimlgen
c9ec7ce070 start hcq docs (#5411)
* start hcq docs

* more hcq docs

* docs

* docs

* linter

* correct args

* linter

* ts returns int
2024-07-15 21:31:11 +03:00
Edward Wang
9a7d5a148e move colorize_float to helpers.py (#5490)
* add colorize_float to helpers.py

* update references
2024-07-15 11:29:03 -07:00
P4ssenger
a347d91e0e remove outdated thread local aliases (#5493) 2024-07-15 11:28:11 -07:00
qazal
ac08f0eb00 reshape rawbufs in test_linearizer (#5492)
* reshape rawbufs in test_linearizer

* fix helper_linearizer_ast
2024-07-15 19:14:38 +03:00
qazal
ae4cb7994e run process replay with DEBUG=0 (#5491)
* process replay with DEBUG=0

* graceful shutdown

* use and
2024-07-15 16:30:57 +03:00
Tobias Fischer
e219103677 Add Pad to Pooling (#5488) 2024-07-14 21:50:20 -07:00
chenyu
eef43c9f49 include dims in kernel/nv invalid err msg (#5487) 2024-07-14 22:51:30 -04:00
chenyu
c80801c266 len(full_shape)-ki.upcasted -> first_upcasted (#5485)
[run_process_replay]
2024-07-14 20:21:18 -04:00
Tobias Fischer
5849130cbb gather negative dim fix (#5486) 2024-07-14 20:20:53 -04:00
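The "gather negative dim fix" above addresses a standard tensor-API concern: a negative `dim` indexes from the end. A common normalization (illustrative only, not tinygrad's exact code) looks like:

```python
# Map a possibly negative dim into [0, ndim), e.g. dim=-1 -> ndim - 1.
def canonicalize_dim(dim: int, ndim: int) -> int:
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for {ndim}-D tensor")
    return dim % ndim
```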
qazal
3c378efcb6 process replay docs improvements (#5481)
* minor cleanups

* docs and logs

* shorter

* comma

* s/print/logging.info [run_process_replay]

* use logging.warn

* process name is noise

* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
chenyu
613a1dbeed render lidx starting with 0 (#5478)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing numbering continued from the pre-limited global dims, which skip numbers when there are more than 3 global dims

* don't need start_dim

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-14 16:34:04 -04:00
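The before/after snippets in the commit above boil down to giving local indices their own counter instead of continuing the global numbering. A minimal sketch of the renumbering (hypothetical helper, derived only from the shown output):

```python
# Globals and locals each count from 0 independently.
def name_dims(n_global: int, n_local: int) -> list[str]:
    return ([f"gidx{i}" for i in range(n_global)] +
            [f"lidx{i}" for i in range(n_local)])
```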
qazal
671779f280 limit process replay diff to ~20% of kernels (#5480)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing numbering continued from the pre-limited global dims, which skip numbers when there are more than 3 global dims

* don't need start_dim

* add changed

* env var

* more early exit

* simpler?

* Revert "Merge branch 'lidx0' into process_replay_limit"

This reverts commit cbadcfa5e9, reversing
changes made to fc9bf37ee7.

* minor cleanup

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
chenyu
f8a47608cc test dtype.min and dtype.max (#5479)
compared with np.iinfo for integer dtypes
2024-07-14 15:31:37 -04:00
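The dtype.min/max test above checks tinygrad's bounds against NumPy's `np.iinfo`. As a hedged illustration of what such a comparison verifies, integer bounds follow directly from the bit width:

```python
# Integer dtype bounds from bit width (matches what np.iinfo reports).
def int_bounds(bits: int, signed: bool) -> tuple[int, int]:
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1
```

For example, `int_bounds(8, True)` gives the familiar int8 range of -128 to 127.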