tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-01 18:25:04 -05:00

Author	SHA1	Message	Date
George Hotz	cdf63e41bf	mnist mlx example uses compile to be fair to tinyjit	2024-07-13 18:14:45 -07:00
George Hotz	8940530290	add mlx beautiful_mnist example	2024-07-13 17:55:47 -07:00
chenyu	28972418c4	s/get_linearizer/get_kernel [run_process_replay] (#5467 )	2024-07-13 20:32:22 -04:00
Francis Lata	0345577032	UNet3D dataloader shared memory fix (#5465 ) * create separate SharedMemory between inputs and labels * update path check for shared mem * clean up unit test for dataset	2024-07-13 20:26:00 -04:00
Carson Powers	ef578b4de8	new UOp style patterns [run_process_replay] (#5444 ) * express permute srcs in uop * loop folding / sum collapse pats -> uop style * UNMUL, const, phi on DEFINE_ACC pats -> uop style * fix: cvar not const * DEFINE_ACC w/o inputs, VECTORIZE-PHI-GEP pats -> uop style * fix VECTORIZE-PHI-GEP pat * contractor, reducer, float4 pats -> uop style * arange folding .where * one more * revert permute expression in UOp	2024-07-13 17:21:08 -07:00
George Hotz	942c58be90	BEAM_COMPARE=2 validates the correctness of BEAM kernels (#5458 ) * beam compare 2 * found issue maybe * correct, not fail * full rand * less numpy * extra simplify doesn't fix it * reorder * no numpy * check in reverse * test new tensor behavior * better error msg	2024-07-13 13:53:43 -07:00
nimlgen	6943ea5f29	nv remove copy_from_cpu command (#5459 )	2024-07-13 23:08:49 +03:00
nimlgen	67f70cef02	amd better allocation error messages (#5462 ) * amd better allocation error messages * a bit better	2024-07-13 22:55:09 +03:00
wozeparrot	2427f149a3	threefry as pattern matcher (#5371 )	2024-07-13 11:59:03 -07:00
qazal	487ceff825	hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist (#5456 )	2024-07-13 21:15:40 +03:00
chenyu	de6ab56458	clean up transcend math with uop syntactic sugar [run_process_replay] (#5455 ) * clean up transcend math with uop syntactic sugar [run_process_replay] * that? * maybe	2024-07-13 14:00:14 -04:00
qazal	40ec9410f9	simpler process replay (#5452 ) * remove check_process_replay * that can go to the top * add assert back * [run_process_replay] * checkout code [run_process_replay] * temp [run_process_replay] * revert temp [run_process_replay] * ahh this is why [run_process_replay] * revert temp [run_process_replay]	2024-07-13 19:55:06 +03:00
chenyu	d2933d3548	simplify transcend math [run_process_replay] (#5454 ) there are some (x - x) in dfadd2_f2_f2_f2, dfmul2_f2_f2_f2, dfdiv2_f2_f2_f2 that were removed by pattern matcher	2024-07-13 12:43:31 -04:00
qazal	23b907efbb	restore process replay runs by their id (#5453 )	2024-07-13 19:32:34 +03:00
qazal	b8c9298164	verify_lazyop in for WMMA and group_for_reduces (#5448 ) * try passing no tc and group for reduces * minor * use op.arg * group_for_reduces	2024-07-13 18:06:19 +03:00
George Hotz	955e1179fb	move compile tests and merge (#5451 ) * move compile tests and merge * revert enet move, bump download cache * oh, try setting clang	2024-07-13 08:04:46 -07:00
George Hotz	e638b0084f	smaller multitensor resnet test (#5450 ) * minor improvements to matcher speed [run_process_replay] * oh, put that back * make fake images smaller for resnet test	2024-07-13 07:31:28 -07:00
Simone Margaritelli	03c3b14cc2	docs: added JIT description to docs/env_vars.md (#5445 ) * docs: added JIT description to docs/env_vars.md * docs: rephrased JIT=2 in env_vars.md	2024-07-13 07:07:11 -07:00
qazal	bb1a9ebf78	run process replay in parallel (#5443 )	2024-07-13 11:29:36 +03:00
chenyu	3ebf569f04	relax fuzz transcend math threshold a bit (#5442 ) * relax fuzz transcend math threshold a bit * fuzz more * fuzz 50k	2024-07-13 03:31:21 -04:00
chenyu	e398734890	fuzz test transcend math (#5383 ) * fuzz test transcend math found something wrong with float64 sin reduction ``` from tinygrad import Tensor, dtypes import numpy as np print(Tensor([39800.0], dtype=dtypes.float64).sin().numpy()) print(Tensor([39800.0], dtype=dtypes.float32).sin().numpy()) print(Tensor([39800.0], dtype=dtypes.float16).sin().numpy()) print(np.sin(np.array([39800.0], dtype=np.float64))) print(np.sin(np.array([39800.0], dtype=np.float32))) print(np.sin(np.array([39800.0], dtype=np.float16))) ``` ``` CLANG=1 python test.py [0.92785633] [0.7428573] [-0.7705] [0.74285722] [0.7428572] [-0.7705] ``` * fix test * abs * skip	2024-07-13 01:54:52 -04:00
hikettei	3a7262d923	[Patch] Fixed an invalid value of fp64 xlog(DBL_MIN) (#5441 ) * [Patch] Removed weird NaN Handling in xlog2 resulting in different output around 1e-203 * Patch: compare the value of xlog(x) using y, allowing x <= 1e-200 * mypy * fuzzer tests for log2 * fix tests: use approximate dbl_min, fp64 fails at nv * update: gradually increment the scale (if y is not inf)	2024-07-13 01:11:53 -04:00
wozeparrot	90f0e2fc49	db in wal mode (#5388 )	2024-07-12 20:43:36 -07:00
George Hotz	414aa6ee98	minor improvements to matcher speed [run_process_replay] (#5439 ) * minor improvements to matcher speed [run_process_replay] * oh, put that back	2024-07-12 20:41:41 -07:00
chenyu	4df63da190	clean up rest of the loadop [run_process_replay] (#5440 ) to metaop and filter_sink	2024-07-12 23:38:51 -04:00
hikettei	0795139f30	Fix TRANSCENDENTAL=2 fp64 sin (#5385 ) * fixes on transcendental: fix for fp64 payne hanek, refactor for fp16 sin * revert the changes on test * refactor on xsin: removed cody_waite_reduction, always use payne_hanek * Revert "refactor on xsin: removed cody_waite_reduction, always use payne_hanek" This reverts commit `2fd401f251`. * still need cody_waite_reduction for the very smaller range * test: added a regression test for transcendental sin * test: found the worse case ulp 3.5 only in numpy * give the input as a valid dtype --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-12 23:15:04 -04:00
George Hotz	fb3011ac61	improve matcher speed [run_process_replay] (#5438 ) * improve matcher speed [run_process_replay] * don't use arg set in ptx	2024-07-12 20:02:19 -07:00
George Hotz	03c2dc8bd7	lowerer is kernel [run_process_replay] (#5437 )	2024-07-12 18:50:55 -07:00
George Hotz	b8342fb085	independent lowerer [run_process_replay] (#5434 ) * independent lowerer [run_process_replay] * don't relinearize PTX * fix ptx * Revert "fix ptx" This reverts commit `f4e8e059c0`. * Revert "don't relinearize PTX" This reverts commit `f6c12c506c`. * parents is fine, no need for linearization * remove loop local idxs * recover stupid loop_idxs	2024-07-12 18:08:43 -07:00
chenyu	9a187e6102	fix handcode_opt script (#5435 ) * fix handcode_opt script * run in ci * real run in ci * HALF=0	2024-07-12 20:52:28 -04:00
wozeparrot	b80fd7d23c	allow benchmarking forward only (#5436 )	2024-07-12 17:37:49 -07:00
chenyu	00813a92a0	update Tensor.eye api to match torch (#5433 ) * update Tensor.eye api to match torch input is n for nrows and optional m for ncols * space * fix onnx	2024-07-12 20:25:12 -04:00
George Hotz	cddfd8e25d	bugfix: group for reduce should check all dimensions (#5431 )	2024-07-12 17:02:40 -07:00
George Hotz	fbaf040baf	compute full_shape from LazyOp [run_process_replay] (#5429 ) * compute full_shape from LazyOp * put KernelInfo in the sink * wrong but pass	2024-07-12 16:47:08 -07:00
George Hotz	870dc8c350	s/Linearizer/Lowerer [run_process_replay] (#5428 )	2024-07-12 15:54:07 -07:00
George Hotz	6707c778d0	scheduleitem is not Tuple [run_process_replay] (#5425 ) * scheduleitem is not Tuple [run_process_replay] * fix tests * fix op + fuzzers * fix mop test	2024-07-12 15:13:19 -07:00
chenyu	4cd1de038a	smaller reshape_and_permute arg in shift_to (#5426 ) adding tuples directly [run_process_replay]	2024-07-12 17:46:48 -04:00
George Hotz	94599c0637	fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424 ) * fixup ast in kernel to be MetaOps.SINK [run_process_replay] * fix tests * fix more tests	2024-07-12 14:01:03 -07:00
George Hotz	b055ece550	hotfix: bump to cache gpuocelot	2024-07-12 13:54:14 -07:00
chenyu	d37056f3b1	pass Renderer.global_max / local_max into get_grouped_dims (#5423 ) [run_process_replay]	2024-07-12 16:49:27 -04:00
George Hotz	4aefb1595d	MetaOps.SINK [run_process_replay] (#5422 ) * s/loadops/metaops [run_process_replay] * add metaops.sink [run_process_replay]	2024-07-12 13:37:30 -07:00
George Hotz	f6ef283e6a	s/loadops/metaops [run_process_replay] (#5421 )	2024-07-12 13:26:50 -07:00
nimlgen	f4944ced09	tiny amd cleanups (#5420 )	2024-07-12 22:54:42 +03:00
chenyu	b17e4adb3a	add `-c advice.detachedHead=false` to process replay git checkout (#5419 ) remove the noisy `Note: switching to 'origin/master'. You are in 'detached HEAD' state. You can look around, make experimental changes...` in log	2024-07-12 15:13:26 -04:00
wozeparrot	d1cbd6bb95	unify handcode_resnet_opt and handcode_bert_opt (#5418 )	2024-07-12 12:05:01 -07:00
chenyu	a0dbe20dbd	skip some redundant and slow tests in ci (#5416 )	2024-07-12 14:43:13 -04:00
chenyu	76125c07be	make some grouped_dim test work (#5415 ) next need to support max size per dim, splitting and correct way to do reverse or arbitrary permute global dims	2024-07-12 14:22:50 -04:00
wozeparrot	b7cc75a9df	usage summary in handcode opt (#5414 )	2024-07-12 11:21:18 -07:00
uuuvn	3cb94a0a15	Rename tinygrad/runtime/driver to support (#5413 )	2024-07-12 11:06:42 -07:00
nimlgen	6604d2b2c3	amd/nv respect visible devs (#5409 ) * nv/amd respect visible devices * linter * sort amd gpus * env docs	2024-07-12 20:02:12 +03:00


10633 Commits
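
The float16 result in the log quoted in commit e398734890 (-0.7705 from both tinygrad and NumPy) can be explained by input rounding rather than by the sine routine: 39800.0 is not representable in half precision. The following is a minimal sketch of that check using only the Python standard library, not tinygrad or NumPy; the helper name `to_float16` is made up for illustration.

```python
import math
import struct

def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    # The struct format "e" stores a 16-bit float; packing then unpacking rounds x.
    return struct.unpack("<e", struct.pack("<e", x))[0]

x = 39800.0
x16 = to_float16(x)  # neighbouring half-precision values are 32 apart in [32768, 65536)

sin_exact_input = math.sin(x)      # sine of the value as written
sin_rounded_input = math.sin(x16)  # sine of what a float16 tensor actually holds

print(x16)
print(sin_exact_input)
print(sin_rounded_input)
```

Under this reading, the float16 rows of that log agree with each other because both libraries take the sine of the rounded input, and the outlier is the tinygrad float64 value (0.92785633 against NumPy's 0.74285722), which is the case the commit message flags as wrong.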