tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-19 02:44:40 -05:00

Author	SHA1	Message	Date
chenyu	12e6771209	failed test case for unrolled half4 (#5552 )	2024-07-18 13:05:52 -04:00
George Hotz	d1a7279605	indexing fold with casted bool (#5551 ) * cast bool is where * universal transform is wrong	2024-07-18 10:02:29 -07:00
kormann	2c4add6844	pretty print lazy op per default (#5505 ) * pretty lop * min diff * walrus * fix * min diff * simplify * pretty helper function * ws * pretty uop upat * tests * stricter tests * test passes * ws * stronger upat test * delete print_tree * min diff * stricter exp test * fix merge * stronger uops eval test * +readable and deep upat test * +readable and deep upat test * sort inv fix * fix * revert allowed_len	2024-07-18 09:34:08 -07:00
qazal	0ad1672d5f	fuse indexing (LazyOp creation) (#5506 ) * bring FUSE_AS_ONE_KERNEL back * operands need reshape? * fused but arange didnt fold * something deeply wrong * yay, fused * derive broadcasts * s/input/reduce_input * _fixup_ones proved a point * this is what it takes * down to 3 required reshapes: 1. output_shape 2. the second reduce merge dims 3. remove dims for above reshape * start real reshapes * resolve shape in the edges pre lazyop * outputs are the same shape * rewrite1: just the reduce * more correct * fuse_as_one_kernel * closer * this passes * dont rerun info * dont need these * not needed	2024-07-18 14:09:17 +03:00
George Hotz	fa7e734b49	MetaOps.KERNEL (#5543 )	2024-07-17 19:41:23 -07:00
George Hotz	d3b098299d	add failing regression test for image (#5540 ) * add failing regression test for image * tg type * simpler test * don't realize image to image casts caused issue * simple pad	2024-07-17 17:27:18 -07:00
qazal	61ee02e93d	start multireduce lowerer work (var/std) (#5537 ) * multireduce no-opts works * passed test_var_multireduce * cleanup * double reduce * extra check for range_group * more checking for range_groups * cleaning up debug prints * cleanup diff * linters * revert kernel changes * these are uops toposort --------- Co-authored-by: timmy <timmy0x@proton.me>	2024-07-17 23:43:46 +03:00
Francis Lam	c4eb30a04c	test/test_linearizer_failures: add a new beautiful_mnist one (#5531 ) * test/test_linearizer_failures: add a new beautiful_mnist one this one is from a DEPTH=2 fuzz_linearizer search * add GPU to test_failure_40 --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-17 16:27:04 -04:00
qazal	0259d76183	use Context only in replaying Kernel [run_process_replay] (#5535 )	2024-07-18 03:46:14 +08:00
George Hotz	1a68854766	PatternMatcher add (#5532 ) * PatternMatcher add [run_process_replay] * f4 dynamic * test_failure_36 is fixed * fix PTX	2024-07-17 12:44:42 -07:00
qazal	a7706e05f9	option to [skip_process_replay] (#5533 )	2024-07-17 22:30:46 +03:00
George Hotz	1242b302fa	expand UOps with rewrite rules (#5501 ) * expand UOps with rewrite rules [run_process_replay] * progress * much closer * close, way less bugs * bunch of expander tests * fix contract * ops tests pass * fix barrier * mostly passing * bitcast in expanded ops * support more expand merges * all tests pass maybe * fix empty EXPAND * fix LIN fuzzing * add ALL_SAME assert * all same * all same work * raise CompileError * pass fuzz linearizer * revert whitespace * fix nv tensor core test * fix mypy * bug fix * fuzzer passes * put tests back * expand arg to idx	2024-07-17 10:17:50 -07:00
George Hotz	158221b36b	expand tests from uop_expander [run_process_replay] (#5524 ) * expand tests from uop_expander * more changes from the branch	2024-07-17 09:22:36 -07:00
George Hotz	42c25cc961	fix fixup_ast (#5523 ) * fix fixup_ast * these lin failures are fixed	2024-07-17 08:52:21 -07:00
nimlgen	dcd462860f	elf loader (#5508 ) * elf loader * cleanup * cleaner * cleaner * fixes * revert this * fix div 0 * fix nv * amd fix * fix mockgpu * amd better? * restore relocs for <12.4 * linter * this is fixed now * revert this * process cdefines as function * cleaner * align * save lines * revert this change	2024-07-17 17:09:34 +03:00
Francis Lam	2d53abb04a	test/external/fuzz_linearizer: fix for new AST changes (#5519 ) * test/external/fuzz_linearizer: fix for new AST changes also add beautiful_mnist failures * add CLANG and LLVM to test_failure_35 failed_platforms * fix test_linearizer_failure names	2024-07-17 00:08:07 -04:00
chenyu	6e405b0a2b	add 0d tensor to trunc/floor/ceil/round tests (#5512 ) existing trunc test passes backward but its backward is incorrect in general. added tests that would fail	2024-07-16 16:48:25 -04:00
Tobias Fischer	87a2ef2bc2	Add Interpolate Function (#5482 ) * add interpolate function * fixed linter issue * reduced sizes in test --------- Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2024-07-16 09:44:01 -07:00
qazal	173064c69c	(re)start multireduce in codegen/* (#5391 ) * test_var_multireduce * run verify_lazyop * test_var_multireduce * assert lazyop * add test_indexing_multireduce * arange fuses (crude) * note: extra reshape * start readble * test_arange_simple * test_arange_expanded * test_indexing_multireduce * cleanups * skip ptx * skip nv and amd ci * skip arange expanded too * GPU=1 is slow too in CI	2024-07-16 14:20:48 +03:00
chenyu	07ff4b7d24	test_failure_33 ast that has UOps.UNMUL after linearize (#5504 ) * test_failure_33 ast that has UOps.UNMUL after linearize * smaller	2024-07-15 22:54:23 -04:00
chenyu	63990705b5	test kernel opts case for 4 local and 4 groups (#5499 ) make sure local grouped dim is correct	2024-07-15 20:09:38 -04:00
Edward Wang	9a7d5a148e	move colorize_float to helpers.py (#5490 ) * add colorize_float to helpers.py * update references	2024-07-15 11:29:03 -07:00
qazal	ac08f0eb00	reshape rawbufs in test_linearizer (#5492 ) * reshape rawbufs in test_linearizer * fix helper_linearizer_ast	2024-07-15 19:14:38 +03:00
qazal	ae4cb7994e	run process replay with DEBUG=0 (#5491 ) * process replay with DEBUG=0 * graceful shutdown * use and	2024-07-15 16:30:57 +03:00
Tobias Fischer	e219103677	Add Pad to Pooling (#5488 )	2024-07-14 21:50:20 -07:00
Tobias Fischer	5849130cbb	gather negative dim fix (#5486 )	2024-07-14 20:20:53 -04:00
qazal	3c378efcb6	process replay docs improvements (#5481 ) * minor cleanups * docs and logs * shorter * comma * s/print/logging.info [run_process_replay] * use logging.warn * process name is noise * revert lowerer change [run_process_replay]	2024-07-15 00:09:28 +03:00
chenyu	613a1dbeed	render lidx starting with 0 (#5478 ) * render lidx starting with 0 changed from ``` int gidx0 = gid.x; /* 4096 / int lidx4 = lid.x; / 8 / int gidx1 = gid.y; / 7 / int lidx5 = lid.y; / 8 / int gidx2 = gid.z; / 7 / int lidx6 = lid.z; / 2 / ``` to ``` int gidx0 = gid.x; / 4096 / int lidx0 = lid.x; / 8 / int gidx1 = gid.y; / 7 / int lidx1 = lid.y; / 8 / int gidx2 = gid.z; / 7 / int lidx2 = lid.z; / 2 / ``` the existing one started from pre-limited global dims which skip number if there are more than 3 global dims don't need start_dim --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-07-14 16:34:04 -04:00
qazal	671779f280	limit process replay diff to ~20% of kernels (#5480 ) * render lidx starting with 0 changed from ``` int gidx0 = gid.x; /* 4096 / int lidx4 = lid.x; / 8 / int gidx1 = gid.y; / 7 / int lidx5 = lid.y; / 8 / int gidx2 = gid.z; / 7 / int lidx6 = lid.z; / 2 / ``` to ``` int gidx0 = gid.x; / 4096 / int lidx0 = lid.x; / 8 / int gidx1 = gid.y; / 7 / int lidx1 = lid.y; / 8 / int gidx2 = gid.z; / 7 / int lidx2 = lid.z; / 2 / ``` the existing one started from pre-limited global dims which skip number if there are more than 3 global dims don't need start_dim * add changed * env var * more early exit * simpler? * Revert "Merge branch 'lidx0' into process_replay_limit" This reverts commit `cbadcfa5e9`, reversing changes made to `fc9bf37ee7`. * minor cleanup --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-14 23:10:08 +03:00
chenyu	f8a47608cc	test dtype.min and dtype.max (#5479 ) compared with np.iinfo for integer dtype	2024-07-14 15:31:37 -04:00
George Hotz	a9f5a764dc	make BatchNorm work for 2D and 3D (#5477 ) * make BatchNorm work for 2D and 3D * beautiful mnist shouldn't use BatchNorm2d	2024-07-14 11:39:58 -07:00
chenyu	e41ab66653	use is to compare types (#5476 ) new rule in latest ruff	2024-07-14 14:26:41 -04:00
nimlgen	61822d1a14	nv fix timeline signal rollover on copy queue (#5473 ) * hotfix: nv rollover to 32bits * test both queues	2024-07-14 16:06:12 +03:00
nimlgen	8835d6c49a	cleanup nv/amd program (#5449 ) * cleanup nv/amd program * fix amd * a bit cleaner * ugh, typo * linter * fix nv * tiny thing	2024-07-14 14:08:35 +03:00
qazal	0b3a34e3b1	vectorize folding [run_process_replay] (#5470 ) * test_gep_vec_fold * remove that * fix process replay * lint	2024-07-14 09:41:48 +03:00
chenyu	28972418c4	s/get_linearizer/get_kernel [run_process_replay] (#5467 )	2024-07-13 20:32:22 -04:00
Francis Lata	0345577032	UNet3D dataloader shared memory fix (#5465 ) * create separate SharedMemory between inputs and labels * update path check for shared mem * clean up unit test for dataset	2024-07-13 20:26:00 -04:00
George Hotz	942c58be90	BEAM_COMPARE=2 validates the correctness of BEAM kernels (#5458 ) * beam compare 2 * found issue maybe * correct, not fail * full rand * less numpy * extra simplify doesn't fix it * reorder * no numpy * check in reverse * test new tensor behavior * better error msg	2024-07-13 13:53:43 -07:00
qazal	487ceff825	hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist (#5456 )	2024-07-13 21:15:40 +03:00
qazal	40ec9410f9	simpler process replay (#5452 ) * remove check_process_replay * that can go to the top * add assert back * [run_process_replay] * checkout code [run_process_replay] * temp [run_process_replay] * revert temp [run_process_replay] * ahh this is why [run_process_replay] * revert temp [run_process_replay]	2024-07-13 19:55:06 +03:00
qazal	23b907efbb	restore process replay runs by their id (#5453 )	2024-07-13 19:32:34 +03:00
George Hotz	e638b0084f	smaller multitensor resnet test (#5450 ) * minor improvments to matcher speed [run_process_replay] * oh, put that back * make fake images smaller for resnet test	2024-07-13 07:31:28 -07:00
qazal	bb1a9ebf78	run process replay in parallel (#5443 )	2024-07-13 11:29:36 +03:00
chenyu	3ebf569f04	relax fuzz transend math threshold a bit (#5442 ) * relax fuzz transend math threshold a bit * fuzz more * fuzz 50k	2024-07-13 03:31:21 -04:00
chenyu	e398734890	fuzz test transcend math (#5383 ) * fuzz test transcend math found something wrong with float64 sin reduction ``` from tinygrad import Tensor, dtypes import numpy as np print(Tensor([39800.0], dtype=dtypes.float64).sin().numpy()) print(Tensor([39800.0], dtype=dtypes.float32).sin().numpy()) print(Tensor([39800.0], dtype=dtypes.float16).sin().numpy()) print(np.sin(np.array([39800.0], dtype=np.float64))) print(np.sin(np.array([39800.0], dtype=np.float32))) print(np.sin(np.array([39800.0], dtype=np.float16))) ``` ``` CLANG=1 python test.py [0.92785633] [0.7428573] [-0.7705] [0.74285722] [0.7428572] [-0.7705] ``` * fix test * abs * skip	2024-07-13 01:54:52 -04:00
hikettei	3a7262d923	[Patch] Fixed an invaild value of fp64 xlog(DBL_MIN) (#5441 ) * [Patch] Removed weird NaN Handling in xlog2 resulting in different output around 1e-203 * Patch: compare the value of xlog(x) using y, allowing x <= 1e-200 * mypy * fuzzer tests for log2 * fix tests: use approximate dbl_min, fp64 fails at nv * update: gradually increment the scale (if y is not inf)	2024-07-13 01:11:53 -04:00
wozeparrot	90f0e2fc49	db in wal mode (#5388 )	2024-07-12 20:43:36 -07:00
chenyu	4df63da190	clean up rest of the loadop [run_process_replay] (#5440 ) to metaop and filter_sink	2024-07-12 23:38:51 -04:00
hikettei	0795139f30	Fix TRANSCENDENTAL=2 fp64 sin (#5385 ) * fixes on transcendental: fix for fp64 payne hanek, refactor for fp16 sin * revert the changes on test * refactor on xsin: removed cody_waite_reduction, always use payne_hanek * Revert "refactor on xsin: removed cody_waite_reduction, always use payne_hanek" This reverts commit `2fd401f251`. * still need cody_waite_reduction for the very smaller range * test: added a regression test for transcendental sin * test: found the worse case ulp 3.5 only in numpy * give the input as a valid dtype --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-12 23:15:04 -04:00
George Hotz	fb3011ac61	improve matcher speed [run_process_replay] (#5438 ) * improve matcher speed [run_process_replay] * don't use arg set in ptx	2024-07-12 20:02:19 -07:00

... 8 9 10 11 12 ...

2555 Commits