Commit Graph

10633 Commits

Author SHA1 Message Date
chenyu
51432bfbff add rand_like test case with device specified (#7663)
in the single-device or copied multi-device case, the device is applied. but for the sharded case the device is currently silently ignored. maybe, similar to rand, we just don't allow a tuple device in rand_like (sketched below)
2024-11-13 09:32:55 -05:00
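A minimal sketch of the behavior described above, assuming the rand/rand_like keyword API shown here (device name and shape are illustrative):

```python
from tinygrad import Tensor

# single-device case: the device kwarg is honored (assumed API per the commit)
t = Tensor.rand(4, 4, device="CLANG")
r = Tensor.rand_like(t, device="CLANG")
assert r.device == "CLANG"

# for a sharded tensor (tuple device, e.g. created via t.shard(...)), the commit notes
# the device argument is currently silently ignored, and may end up disallowed like in rand
```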
Reza Rezvan
23363dee55 Add: failing tests for uint8 min() (#7669)
* add failing tests for uint8 `min()`

* mark as expected failure
2024-11-13 22:12:53 +08:00
qazal
29508504ea uop style prefer small dtype + cleanups [pr] (#7671)
* just this

* space

* typing 2
2024-11-13 21:32:34 +08:00
qazal
e84d089ef1 delete ReduceOps, only use REDUCE_AXIS (#7667) 2024-11-13 19:04:27 +08:00
qazal
217c006103 buffer access on UOp [pr] (#7665)
* add .buffer access on uop

* rename to buf_uop

* start smaller

* ptr != buffer!!
2024-11-13 17:04:19 +08:00
qazal
5da149d23c uop can have base [pr] (#7666) 2024-11-13 16:53:49 +08:00
qazal
ca99c67d78 refactors from the delete lazy diff [pr] (#7664)
* dedup parent shapetrackers [pr]

* arg -> dtype

* move to ops

* arg
2024-11-13 16:23:53 +08:00
chenyu
e6cfaaa496 metal benchmark JIT=2 -> JIT=1 (#7661) 2024-11-12 22:55:27 -05:00
chenyu
4c5f7ddf1f flux set model path in args (#7660)
in addition to the default download through fetch, add an arg to pass the model path directly
2024-11-12 22:11:40 -05:00
chenyu
08706c2ea4 more readable rand [pr] (#7659)
no walrus inside walrus
2024-11-12 19:02:27 -05:00
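A hypothetical illustration of the style point (not the actual rand code): a nested walrus is legal but hard to scan, and splitting it reads better.

```python
def fetch_batch(): return [1, 2, 3]   # stand-in helper for illustration

# walrus inside walrus: legal, but hard to read
if (n := len(data := fetch_batch())) > 0:
    print(data, n)

# the same logic with the assignments split apart
data = fetch_batch()
if (n := len(data)) > 0:
    print(data, n)
```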
chenyu
1884f021e3 add conv3x3 to speed_v_theoretical (#7658)
* add conv3x3 to speed_v_theoretical

* show test duration
2024-11-12 16:41:56 -05:00
ignaciosica
54c0abcb2b cleaner code_for_op order [pr] (#7653)
* cleaner code_for_op order

* maintain unary-bin-tern order

* might as well reorder for cuda and amd
2024-11-12 15:13:56 -05:00
chenyu
962dafb467 use randn in speed_v_theoretical instead of rand (#7656)
* use randn in speed_v_theoretical instead of rand

this made green gemv 20% faster... but why?

* update threshold
2024-11-12 15:00:32 -05:00
chenyu
397a2e6eb6 no special case for int32 in truncate [pr] (#7657)
this masked an issue: idx is not data, and should never need truncate
2024-11-12 14:52:14 -05:00
chenyu
6159790ab8 add gemv to speed_v_theoretical (#7654)
* add gemv to speed_v_theoretical

getting ~300 GB/s if we just count the memory of the inputs and the output (see the back-of-the-envelope below)

* better green numbers

* flip
2024-11-12 11:19:35 -05:00
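A back-of-the-envelope for the ~300 GB/s figure, counting only the bytes of the inputs and the output; the matrix size, dtype, and kernel time here are assumptions for illustration, not the benchmark's actual numbers.

```python
# gemv y = A @ x: memory traffic is roughly A + x + y
N = 8192                     # assumed square matrix dimension
dtype_bytes = 2              # assumed half precision
bytes_moved = (N * N + N + N) * dtype_bytes
seconds = 0.45e-3            # assumed measured kernel time
print(f"{bytes_moved / seconds / 1e9:.0f} GB/s")   # ~298 GB/s
```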
qazal
e07d2d0966 skip TestBeamSearch.test_large_ast (#7652) 2024-11-12 20:52:22 +08:00
qazal
0f02573830 save lines in assign tracking [pr] (#7651) 2024-11-12 20:49:13 +08:00
qazal
fbad4900bf move groups to uop [pr] (#7640)
* override group post chase [pr]

* key reduceop on ubuf

* fix type
2024-11-12 20:09:13 +08:00
George Hotz
4f1f823021 add tiny test for randomness + remove ulong buffers (#7648)
* add tiny test for randomness

* Tensor._device_seeds is a Tuple

* no tuple, just a 2 element tensor

* no more longs

* fix tests, and maybe ocelot works now

* NV still doesn't work. cleanup rules

* test + two more rules
2024-11-12 12:45:52 +08:00
chenyu
c06a5a9c72 Tensor.linspace raises for dtype.bool (#7649)
also fixed an assert when passing str dtype to randint
2024-11-11 23:05:14 -05:00
geohotstan
5eef59d732 add Tensor.linspace (#7609)
* add linspace

* shave off tests and forgot to add to docs crap

* WHOOPS

* better tests
2024-11-12 10:29:36 +08:00
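A usage sketch, assuming a torch-like `Tensor.linspace(start, stop, steps)` signature:

```python
from tinygrad import Tensor

t = Tensor.linspace(0, 1, 5)
print(t.numpy())   # expected: [0.   0.25 0.5  0.75 1.  ]
# per commit c06a5a9c72 above, passing dtype=dtypes.bool raises
```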
chenyu
99f29e50b2 update speed_v_theoretical numbers (#7647)
better AMD numbers after setting the compute profile
2024-11-11 20:05:13 -05:00
chenyu
035e39f900 remove copied is_dtype_supported from onnx [pr] (#7646) 2024-11-11 19:20:32 -05:00
Ahmed Harmouche
9c63c3d8ab These casts should only happen if these are supported (#7644) 2024-11-12 07:56:50 +08:00
chenyu
a88a15c7e8 setup perflevel in red CI (#7645)
runs the v4.1 bert setup:
```
rocm-smi --setprofile compute
rocm-smi --setmclk 3
rocm-smi --setperflevel high
```
2024-11-11 18:44:55 -05:00
chenyu
773d5b60bf beam benchmark tests (#7638)
* beam benchmark tests

* lower AMD number somehow

* less flaky
2024-11-11 18:11:18 -05:00
chenyu
bfab03288d fix HALF=1 in test_speed_v_torch (#7642)
* fix HALF=1 in test_speed_v_torch

"operation cache defeats" adds 1 to all arg, which were centered around 0. adding 1 makes big matmul and matvec go inf.

fixed by subtract 1 after and bumpped tolerance for half input

* bigger tol for BIG=2, update CI too

* bigger tol
2024-11-11 14:29:37 -05:00
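An illustration (using numpy, with an assumed reduction length) of why shifting half-precision inputs from being centered at 0 to centered at 1 overflows a long reduction:

```python
import numpy as np

n = 1 << 17                                        # assumed reduction length
centered = np.random.randn(n).astype(np.float16)   # args centered around 0
shifted = centered + np.float16(1)                 # what "adds 1 to all args" does

print(centered.sum(dtype=np.float16))   # stays small: positive and negative terms cancel
print(shifted.sum(dtype=np.float16))    # sum is ~131072, past float16 max (65504) -> inf
```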
nimlgen
4d81b7952a qcom match texture/sampler descriptors to OpenCL (#7622)
* qcom ioctl compare more regs

* bug fix
2024-11-11 21:56:51 +03:00
qazal
0b66a0d688 only lookup buf_uops in fuse.py [pr] (#7641) 2024-11-11 19:14:30 +02:00
qazal
08b9f055f2 don't need outputs in fuse.py [pr] (#7639) 2024-11-11 18:35:31 +02:00
George Hotz
b4cb6b89f9 hotfix: CI mac uses python 3.11 2024-11-11 23:42:35 +08:00
George Hotz
9648372ee6 hotfix: mac uses python 3.12 2024-11-11 23:23:48 +08:00
George Hotz
aaa8059aec python 3.10 is minimum [pr] (#7636) 2024-11-11 23:05:50 +08:00
Kinvert
6a0ed46b1c adding viz to env_vars docs (#7630) 2024-11-11 21:28:27 +08:00
George Hotz
d40673505f new cloud is cloudy [pr] (#7631)
* new cloud is cloudy [pr]

* waste lines to add security

* safety, with speed and less lines

* timing and del

* lines

* cleanups

* restore CloudSession

* bump to 3.10

* quotes

* renderer security
2024-11-11 20:18:04 +08:00
qazal
766a680588 swizzle parents with graph rewrite (#7625)
* delete st_fixup

* refactor

* minimal diff
2024-11-11 16:50:38 +08:00
qazal
fec977b966 calling view on graph edges is fine [pr] (#7632) 2024-11-11 16:35:18 +08:00
George Hotz
bbc64bf305 x|(x&y) -> x (#7629)
* x|(x&y) -> x

* fix tests
2024-11-11 10:00:18 +08:00
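A quick brute-force check of the absorption law this rewrite relies on:

```python
import random

# x | (x & y) == x holds bitwise for any x, y (absorption law)
for _ in range(10_000):
    x, y = random.getrandbits(64), random.getrandbits(64)
    assert (x | (x & y)) == x
print("x | (x & y) == x holds on all sampled pairs")
```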
uuuvn
94a484542b Hook memoryview via class instead of a function (#7627) 2024-11-11 09:07:06 +08:00
qazal
a8da84cce0 recursive swizzle with just graph_rewrite [pr] (#7626) 2024-11-10 20:14:21 +02:00
qazal
7275cfb9d8 cleanup swizzle upats [pr] (#7624) 2024-11-10 17:05:27 +02:00
qazal
092a441748 test swizzle post permute (#7623)
* test swizzle post permute

* add st_fixup assert
2024-11-10 16:18:22 +02:00
George Hotz
745316493c hotfix: add test_simple_conv2d_bias 2024-11-10 18:36:42 +08:00
George Hotz
44c1fd5661 add optional llvm opt [pr] (#7619) 2024-11-10 13:26:49 +08:00
George Hotz
0a411b4f68 replace llvm with new llvm (#7616)
* replace llvm with new llvm

* fix test_linearizer

* minor fixups

* fix alloca

* don't use alloca

* fix DEFINE_ACC

* lines

* comments and lines

* a little tighter
2024-11-10 11:28:52 +08:00
qazal
b61266eb97 late fusion spec for big graph [pr] (#7613) 2024-11-09 23:43:11 +08:00
qazal
9d6b03d691 early assert swizzle in kernel [pr] (#7610)
* early assert swizzle in kernel [pr]

* better

* note changes

* TestIndexing 2
2024-11-09 21:54:43 +08:00
chenyu
8ca422e21a script to compare kernel opt with BEAM (#7604)
interesting that on m1 max hcopt wins over BEAM 2 about 20% of the time
2024-11-08 17:40:28 -05:00
chenyu
573f145dcf METAL raise RuntimeError with no compiler and bad src (#7603)
fixed BEAM when src is invalid on METAL. it currently only accepts RuntimeError in `_time_program`
2024-11-08 17:09:12 -05:00
chenyu
74b4d1c1e1 rewrite idx again in real_strides after uop_given_valid (#7600)
uop_given_valid does not guarantee the output is flat. fixed one last real_strides test.
2024-11-08 14:30:32 -05:00