chenyu
fd43d33b7d
shave some lines from transcend math [run_process_replay] ( #5500 )
...
* shave some lines from transcend math [run_process_replay]
* put input_dtype back
2024-07-15 21:02:24 -04:00
chenyu
63990705b5
test kernel opts case for 4 local and 4 groups ( #5499 )
...
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Alessandro Benetti
13e200b437
add strict mkdocs check ( #5497 )
2024-07-15 14:21:37 -07:00
nimlgen
8dfd11c1d8
docs: hcq add types ( #5495 )
...
* docs: hcq add types
* linter
2024-07-15 22:14:48 +03:00
George Hotz
aab1e8c6dc
uniform init to match torch ( #5494 )
2024-07-15 12:07:44 -07:00
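The torch behavior being matched here is presumably nn.Linear's default init: weight and bias drawn from U(-k, k) with k = 1/sqrt(in_features). A minimal sketch of that rule using tinygrad's Tensor.uniform (the shapes are made up for illustration):
```
from tinygrad import Tensor

# torch draws nn.Linear weight/bias from U(-k, k), k = 1/sqrt(in_features)
in_features, out_features = 64, 32
bound = in_features ** -0.5
weight = Tensor.uniform(out_features, in_features, low=-bound, high=bound)
bias = Tensor.uniform(out_features, low=-bound, high=bound)
```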
George Hotz
338b7590b9
hotfix: docs for BatchNorm
2024-07-15 12:04:17 -07:00
nimlgen
c9ec7ce070
start hcq docs ( #5411 )
...
* start hcq docs
* more hcq docs
* docs
* docs
* linter
* correct args
* linter
* ts returns int
2024-07-15 21:31:11 +03:00
Edward Wang
9a7d5a148e
move colorize_float to helpers.py ( #5490 )
...
* add colorize_float to helpers.py
* update references
2024-07-15 11:29:03 -07:00
P4ssenger
a347d91e0e
remove outdated thread local aliases ( #5493 )
2024-07-15 11:28:11 -07:00
qazal
ac08f0eb00
reshape rawbufs in test_linearizer ( #5492 )
...
* reshape rawbufs in test_linearizer
* fix helper_linearizer_ast
2024-07-15 19:14:38 +03:00
qazal
ae4cb7994e
run process replay with DEBUG=0 ( #5491 )
...
* process replay with DEBUG=0
* graceful shutdown
* use and
2024-07-15 16:30:57 +03:00
Tobias Fischer
e219103677
Add Pad to Pooling ( #5488 )
2024-07-14 21:50:20 -07:00
chenyu
eef43c9f49
include dims in kernel/nv invalid err msg ( #5487 )
2024-07-14 22:51:30 -04:00
chenyu
c80801c266
len(full_shape)-ki.upcasted -> first_upcasted ( #5485 )
...
[run_process_replay]
2024-07-14 20:21:18 -04:00
Tobias Fischer
5849130cbb
gather negative dim fix ( #5486 )
2024-07-14 20:20:53 -04:00
qazal
3c378efcb6
process replay docs improvements ( #5481 )
...
* minor cleanups
* docs and logs
* shorter
* comma
* s/print/logging.info [run_process_replay]
* use logging.warn
* process name is noise
* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
chenyu
613a1dbeed
render lidx starting with 0 ( #5478 )
...
* render lidx starting with 0
changed from
```
int gidx0 = gid.x; /* 4096 */
int lidx4 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx5 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx6 = lid.z; /* 2 */
```
to
```
int gidx0 = gid.x; /* 4096 */
int lidx0 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx1 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx2 = lid.z; /* 2 */
```
the existing numbering started from the pre-limited global dims, which skip numbers when there are more than 3 global dims
* don't need start_dim
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-14 16:34:04 -04:00
qazal
671779f280
limit process replay diff to ~20% of kernels ( #5480 )
...
* render lidx starting with 0
changed from
```
int gidx0 = gid.x; /* 4096 */
int lidx4 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx5 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx6 = lid.z; /* 2 */
```
to
```
int gidx0 = gid.x; /* 4096 */
int lidx0 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx1 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx2 = lid.z; /* 2 */
```
the existing numbering started from the pre-limited global dims, which skip numbers when there are more than 3 global dims
* don't need start_dim
* add changed
* env var
* more early exit
* simpler?
* Revert "Merge branch 'lidx0' into process_replay_limit"
This reverts commit cbadcfa5e9, reversing changes made to fc9bf37ee7.
* minor cleanup
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
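A sketch of the thresholding idea behind this change, with made-up names (the actual tinygrad process-replay script differs): tolerate a small fraction of changed kernels, fail only past the limit.
```
MAX_DIFF_PCT = 20

def check_diffs(kernel_changed):
    # kernel_changed[i] is True if replaying kernel i produced different code
    changed = sum(kernel_changed)
    if changed * 100 > MAX_DIFF_PCT * len(kernel_changed):
        raise AssertionError(f"{changed}/{len(kernel_changed)} kernels changed")

check_diffs([True] + [False] * 9)  # 10% changed: tolerated, not a failure
```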
chenyu
f8a47608cc
test dtype.min and dtype.max ( #5479 )
...
compared with np.iinfo for integer dtypes
2024-07-14 15:31:37 -04:00
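The check is presumably along these lines, assuming tinygrad's dtypes.min/dtypes.max helpers (which is what the title suggests):
```
import numpy as np
from tinygrad import dtypes

# integer dtype bounds should agree with numpy's iinfo
assert dtypes.min(dtypes.int32) == np.iinfo(np.int32).min == -2**31
assert dtypes.max(dtypes.int32) == np.iinfo(np.int32).max == 2**31 - 1
assert dtypes.max(dtypes.uint8) == np.iinfo(np.uint8).max == 255
```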
George Hotz
a9f5a764dc
make BatchNorm work for 2D and 3D ( #5477 )
...
* make BatchNorm work for 2D and 3D
* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
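Presumably this makes one BatchNorm class handle both 4D (BatchNorm2d-style) and 5D (BatchNorm3d-style) inputs. A usage sketch, assuming the class is exported as tinygrad.nn.BatchNorm:
```
from tinygrad import Tensor
from tinygrad.nn import BatchNorm

bn = BatchNorm(16)                        # 16 channels, shared across input ranks
out2d = bn(Tensor.randn(4, 16, 8, 8))     # NCHW, the old BatchNorm2d case
out3d = bn(Tensor.randn(4, 16, 4, 8, 8))  # NCDHW, the 3D case
print(out2d.shape, out3d.shape)
```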
chenyu
e41ab66653
use is to compare types ( #5476 )
...
new rule in latest ruff
2024-07-14 14:26:41 -04:00
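The ruff rule in question is presumably E721, which flags equality comparisons between types; the mechanical fix:
```
x = 1
assert type(x) == int  # flagged by the new rule (E721-style type comparison)
assert type(x) is int  # preferred: types are singletons, so identity is exact
```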
George Hotz
aade18d20c
beautiful_mnist in torch
2024-07-14 11:09:58 -07:00
nimlgen
604fb60143
docs: fix link to jit in env_vars ( #5474 )
2024-07-14 16:08:16 +03:00
nimlgen
61822d1a14
nv fix timeline signal rollover on copy queue ( #5473 )
...
* hotfix: nv rollover to 32 bits
* test both queues
2024-07-14 16:06:12 +03:00
nimlgen
8835d6c49a
cleanup nv/amd program ( #5449 )
...
* cleanup nv/amd program
* fix amd
* a bit cleaner
* ugh, typo
* linter
* fix nv
* tiny thing
2024-07-14 14:08:35 +03:00
qazal
0b3a34e3b1
vectorize folding [run_process_replay] ( #5470 )
...
* test_gep_vec_fold
* remove that
* fix process replay
* lint
2024-07-14 09:41:48 +03:00
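The fold being tested (test_gep_vec_fold) is presumably the classic one: taking lane i of a freshly VECTORIZEd value is just its i-th source. In pseudo-UOp terms (illustrative tuples, not tinygrad's actual UOp API):
```
def gep(src, i):
    # fold GEP(VECTORIZE(a, b, c, d), i) -> the i-th source
    if src[0] == "VECTORIZE": return src[1][i]
    return ("GEP", src, i)

vec = ("VECTORIZE", ("a", "b", "c", "d"))
assert gep(vec, 1) == "b"
```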
George Hotz
cdf63e41bf
mnist mlx example uses compile to be fair to tinyjit
2024-07-13 18:14:45 -07:00
George Hotz
8940530290
add mlx beautiful_mnist example
2024-07-13 17:55:47 -07:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] ( #5467 )
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix ( #5465 )
...
* create separate SharedMemory between inputs and labels
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
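A sketch of the fix's shape, using the stdlib API the bullets refer to (segment names, dtypes, and sizes here are hypothetical):
```
from multiprocessing import shared_memory
import numpy as np

# separate segments for inputs (X) and labels (Y) instead of one shared one
X_shm = shared_memory.SharedMemory(name="unet3d_X", create=True, size=4 * 128**3)
Y_shm = shared_memory.SharedMemory(name="unet3d_Y", create=True, size=128**3)
X = np.ndarray((128, 128, 128), dtype=np.float32, buffer=X_shm.buf)
Y = np.ndarray((128, 128, 128), dtype=np.uint8, buffer=Y_shm.buf)
# real code must close() and unlink() these when the dataloader shuts down
```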
Carson Powers
ef578b4de8
new UOp style patterns [run_process_replay] ( #5444 )
...
* express permute srcs in uop
* loop folding / sum collapse pats -> uop style
* UNMUL, const, phi on DEFINE_ACC pats -> uop style
* fix: cvar not const
* DEFINE_ACC w/o inputs, VECTORIZE-PHI-GEP pats -> uop style
* fix VECTORIZE-PHI-GEP pat
* contractor, reducer, float4 pats -> uop style
* arange folding .where
* one more
* revert permute expression in UOp
2024-07-13 17:21:08 -07:00
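The "uop style" these bullets move toward is, roughly, building patterns with the same operator sugar as real expressions rather than spelling out node constructors. A toy illustration of the idea (tinygrad's actual PatternMatcher machinery is more involved):
```
class Var:
    def __init__(self, name): self.name = name
    def __sub__(self, other): return ("SUB", self, other)

x = Var("x")
# the sugar form reads like the expression it matches, and builds the same node
assert (x - x) == ("SUB", x, x)
```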
George Hotz
942c58be90
BEAM_COMPARE=2 validates the correctness of BEAM kernels ( #5458 )
...
* beam compare 2
* found issue maybe
* correct, not fail
* full rand
* less numpy
* extra simplify doesn't fix it
* reorder
* no numpy
* check in reverse
* test new tensor behavior
* better error msg
2024-07-13 13:53:43 -07:00
nimlgen
6943ea5f29
nv remove copy_from_cpu command ( #5459 )
2024-07-13 23:08:49 +03:00
nimlgen
67f70cef02
amd better allocation error messages ( #5462 )
...
* amd better allocation error messages
* a bit better
2024-07-13 22:55:09 +03:00
wozeparrot
2427f149a3
threefry as pattern matcher ( #5371 )
2024-07-13 11:59:03 -07:00
qazal
487ceff825
hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist ( #5456 )
2024-07-13 21:15:40 +03:00
chenyu
de6ab56458
clean up transcend math with uop syntactic sugar [run_process_replay] ( #5455 )
...
* clean up transcend math with uop syntactic sugar [run_process_replay]
* that?
* maybe
2024-07-13 14:00:14 -04:00
qazal
40ec9410f9
simpler process replay ( #5452 )
...
* remove check_process_replay
* that can go to the top
* add assert back
* [run_process_replay]
* checkout code [run_process_replay]
* temp [run_process_replay]
* revert temp [run_process_replay]
* ahh this is why [run_process_replay]
* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
chenyu
d2933d3548
simplify transcend math [run_process_replay] ( #5454 )
...
there were some (x - x) terms in dfadd2_f2_f2_f2, dfmul2_f2_f2_f2, and dfdiv2_f2_f2_f2 that the pattern matcher removed
2024-07-13 12:43:31 -04:00
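The fold responsible is the usual algebraic identity; a toy version of what such a pattern-matcher rule does (not the actual tinygrad rule):
```
def fold_sub_self(expr):
    # toy nodes: ("SUB", lhs, rhs) or a leaf; rewrite x - x -> 0, bottom-up
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(fold_sub_self(e) for e in expr[1:])
        if expr[0] == "SUB" and expr[1] == expr[2]: return 0
    return expr

assert fold_sub_self(("SUB", "x", "x")) == 0
assert fold_sub_self(("ADD", ("SUB", "y", "y"), "z")) == ("ADD", 0, "z")
```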
qazal
23b907efbb
restore process replay runs by their id ( #5453 )
2024-07-13 19:32:34 +03:00
qazal
b8c9298164
verify_lazyop in for WMMA and group_for_reduces ( #5448 )
...
* try passing no tc and group for reduces
* minor
* use op.arg
* group_for_reduces
2024-07-13 18:06:19 +03:00
George Hotz
955e1179fb
move compile tests and merge ( #5451 )
...
* move compile tests and merge
* revert enet move, bump download cache
* oh, try setting clang
2024-07-13 08:04:46 -07:00
George Hotz
e638b0084f
smaller multitensor resnet test ( #5450 )
...
* minor improvements to matcher speed [run_process_replay]
* oh, put that back
* make fake images smaller for resnet test
2024-07-13 07:31:28 -07:00
Simone Margaritelli
03c3b14cc2
docs: added JIT description to docs/env_vars.md ( #5445 )
...
* docs: added JIT description to docs/env_vars.md
* docs: rephrased JIT=2 in env_vars.md
2024-07-13 07:07:11 -07:00
qazal
bb1a9ebf78
run process replay in parallel ( #5443 )
2024-07-13 11:29:36 +03:00
chenyu
3ebf569f04
relax fuzz transcend math threshold a bit ( #5442 )
...
* relax fuzz transcend math threshold a bit
* fuzz more
* fuzz 50k
2024-07-13 03:31:21 -04:00
chenyu
e398734890
fuzz test transcend math ( #5383 )
...
* fuzz test transcend math
found something wrong with float64 sin argument reduction
```
from tinygrad import Tensor, dtypes
import numpy as np
print(Tensor([39800.0], dtype=dtypes.float64).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float32).sin().numpy())
print(Tensor([39800.0], dtype=dtypes.float16).sin().numpy())
print(np.sin(np.array([39800.0], dtype=np.float64)))
print(np.sin(np.array([39800.0], dtype=np.float32)))
print(np.sin(np.array([39800.0], dtype=np.float16)))
```
```
CLANG=1 python test.py
[0.92785633]
[0.7428573]
[-0.7705]
[0.74285722]
[0.7428572]
[-0.7705]
```
* fix test
* abs
* skip
2024-07-13 01:54:52 -04:00
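The fuzzing approach presumably boils down to comparing tinygrad's approximation against numpy over random inputs; a minimal sketch (the range and tolerance here are made up, the real test is the fuzzer added by this PR):
```
import numpy as np
from tinygrad import Tensor, dtypes

# draw random large arguments and compare against numpy's reference sin
xs = np.random.uniform(-1e5, 1e5, size=1000)
out = Tensor(xs, dtype=dtypes.float64).sin().numpy()
np.testing.assert_allclose(out, np.sin(xs), atol=1e-9)
```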
hikettei
3a7262d923
[Patch] Fixed an invalid value of fp64 xlog(DBL_MIN) ( #5441 )
...
* [Patch] Removed weird NaN handling in xlog2 that resulted in different output around 1e-203
* Patch: compare the value of xlog(x) using y, allowing x <= 1e-200
* mypy
* fuzzer tests for log2
* fix tests: use approximate dbl_min, fp64 fails at nv
* update: gradually increment the scale (if y is not inf)
2024-07-13 01:11:53 -04:00
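A sketch of the check the fuzzer bullets describe, near the bottom of the fp64 range (the exact thresholds in the PR differ; DBL_MIN is about 2.2e-308, and the breakage was reported around 1e-203):
```
import numpy as np
from tinygrad import Tensor, dtypes

for x in (1e-200, 1e-250, 1e-300):  # tiny but representable fp64 inputs
    out = Tensor([x], dtype=dtypes.float64).log2().numpy()[0]
    np.testing.assert_allclose(out, np.log2(np.float64(x)))
```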
wozeparrot
90f0e2fc49
db in wal mode ( #5388 )
2024-07-12 20:43:36 -07:00
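WAL here is sqlite's write-ahead-log journal mode; the change is presumably a one-line pragma on the cache db connection, along these lines:
```
import sqlite3

conn = sqlite3.connect("cache.db")
# write-ahead logging: readers and the single writer no longer block each other,
# which matters when parallel processes share the compile cache
conn.execute("PRAGMA journal_mode=wal")
```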
George Hotz
414aa6ee98
minor improvements to matcher speed [run_process_replay] ( #5439 )
...
* minor improvements to matcher speed [run_process_replay]
* oh, put that back
2024-07-12 20:41:41 -07:00