tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-19 02:44:40 -05:00

Author	SHA1	Message	Date
qazal	217c006103	buffer access on UOp [pr] (#7665 ) * add .buffer access on uop * rename to buf_uop * start smaller * ptr != buffer!!	2024-11-13 17:04:19 +08:00
qazal	5da149d23c	uop can have base [pr] (#7666 )	2024-11-13 16:53:49 +08:00
qazal	ca99c67d78	refactors from the delete lazy diff [pr] (#7664 ) * dedup parent shapetrackers [pr] * arg -> dtype * move to ops * arg	2024-11-13 16:23:53 +08:00
chenyu	e6cfaaa496	metal benchmark JIT=2 -> JIT=1 (#7661 )	2024-11-12 22:55:27 -05:00
chenyu	4c5f7ddf1f	flux set model path in args (#7660 ) in addition to default downloading through fetch, add an arg to pass model path directly	2024-11-12 22:11:40 -05:00
chenyu	08706c2ea4	more readable rand [pr] (#7659 ) no walrus inside walrus	2024-11-12 19:02:27 -05:00
chenyu	1884f021e3	add conv3x3 to speed_v_theoretical (#7658 ) * add conv3x3 to speed_v_theoretical * show test duration	2024-11-12 16:41:56 -05:00
ignaciosica	54c0abcb2b	cleaner code_for_op order [pr] (#7653 ) * cleaner code_for_op order * maintain unary-bin-tern order * might as well reorder for cuda and amd	2024-11-12 15:13:56 -05:00
chenyu	962dafb467	use randn in speed_v_theoretical instead of rand (#7656 ) * use randn in speed_v_theoretical instead of rand this made green gemv 20% faster... but why? * update threshold	2024-11-12 15:00:32 -05:00
chenyu	397a2e6eb6	no special case for int32 in truncate [pr] (#7657 ) this masked an issue that idx is not data, and should never need truncate	2024-11-12 14:52:14 -05:00
chenyu	6159790ab8	add gemv to speed_v_theoretical (#7654 ) * add gemv to speed_v_theoretical getting ~300GB/s if we just count the memory of inputs and output * better green numbers * flip	2024-11-12 11:19:35 -05:00
qazal	e07d2d0966	skip TestBeamSearch.test_large_ast (#7652 )	2024-11-12 20:52:22 +08:00
qazal	0f02573830	save lines in assign tracking [pr] (#7651 )	2024-11-12 20:49:13 +08:00
qazal	fbad4900bf	move groups to uop [pr] (#7640 ) * override group post chase [pr] * key reduceop on ubuf * fix type	2024-11-12 20:09:13 +08:00
George Hotz	4f1f823021	add tiny test for randomness + remove ulong buffers (#7648 ) * add tiny test for randomness * Tensor._device_seeds is a Tuple * no tuple, just a 2 element tensor * no more longs * fix tests, and maybe ocelot works now * NV still doesn't work. cleanup rules * test + two more rules	2024-11-12 12:45:52 +08:00
chenyu	c06a5a9c72	Tensor.linspace raises for dtype.bool (#7649 ) also fixed an assert when passing str dtype to randint	2024-11-11 23:05:14 -05:00
geohotstan	5eef59d732	add Tensor.linspace (#7609 ) * add linspace * shave off tests and forgot to add to docs crap * WHOOPS * better tests	2024-11-12 10:29:36 +08:00
chenyu	99f29e50b2	update speed_v_theoretical numbers (#7647 ) better amd after set compute profile	2024-11-11 20:05:13 -05:00
chenyu	035e39f900	remove copied is_dtype_supported from onnx [pr] (#7646 )	2024-11-11 19:20:32 -05:00
Ahmed Harmouche	9c63c3d8ab	These casts should only happen if these are supported (#7644 )	2024-11-12 07:56:50 +08:00
chenyu	a88a15c7e8	setup perflevel in red CI (#7645 ) runs v4.1 bert setup: `rocm-smi --setprofile compute`, `rocm-smi --setmclk 3`, `rocm-smi --setperflevel high`	2024-11-11 18:44:55 -05:00
chenyu	773d5b60bf	beam benchmark tests (#7638 ) * beam benchmark tests * lower AMD number somehow * less flaky	2024-11-11 18:11:18 -05:00
chenyu	bfab03288d	fix HALF=1 in test_speed_v_torch (#7642 ) * fix HALF=1 in test_speed_v_torch "operation cache defeats" adds 1 to all args, which were centered around 0. adding 1 makes big matmul and matvec go inf. fixed by subtracting 1 after and bumped tolerance for half input * bigger tol for BIG=2, update CI too * bigger tol	2024-11-11 14:29:37 -05:00
nimlgen	4d81b7952a	qcom match texture/sampler descriptors to OpenCL (#7622 ) * qcom ioctl compare more regs * bug fix	2024-11-11 21:56:51 +03:00
qazal	0b66a0d688	only lookup buf_uops in fuse.py [pr] (#7641 )	2024-11-11 19:14:30 +02:00
qazal	08b9f055f2	don't need outputs in fuse.py [pr] (#7639 )	2024-11-11 18:35:31 +02:00
George Hotz	b4cb6b89f9	hotfix: CI mac uses python 3.11	2024-11-11 23:42:35 +08:00
George Hotz	9648372ee6	hotfix: mac uses python 3.12	2024-11-11 23:23:48 +08:00
George Hotz	aaa8059aec	python 3.10 is minimum [pr] (#7636 )	2024-11-11 23:05:50 +08:00
Kinvert	6a0ed46b1c	adding viz to env_vars docs (#7630 )	2024-11-11 21:28:27 +08:00
George Hotz	d40673505f	new cloud is cloudy [pr] (#7631 ) * new cloud is cloudy [pr] * waste lines to add security * safety, with speed and less lines * timing and del * lines * cleanups * restore CloudSession * bump to 3.10 * quotes * renderer security	2024-11-11 20:18:04 +08:00
qazal	766a680588	swizzle parents with graph rewrite (#7625 ) * delete st_fixup * refactor * minimal diff	2024-11-11 16:50:38 +08:00
qazal	fec977b966	calling view on graph edges is fine [pr] (#7632 )	2024-11-11 16:35:18 +08:00
George Hotz	bbc64bf305	x|(x&y) -> x (#7629 ) * x|(x&y) -> x * fix tests	2024-11-11 10:00:18 +08:00
uuuvn	94a484542b	Hook memoryview via class instead of a function (#7627 )	2024-11-11 09:07:06 +08:00
qazal	a8da84cce0	recursive swizzle with just graph_rewrite [pr] (#7626 )	2024-11-10 20:14:21 +02:00
qazal	7275cfb9d8	cleanup swizzle upats [pr] (#7624 )	2024-11-10 17:05:27 +02:00
qazal	092a441748	test swizzle post permute (#7623 ) * test swizzle post permute * add st_fixup assert	2024-11-10 16:18:22 +02:00
George Hotz	745316493c	hotfix: add test_simple_conv2d_bias	2024-11-10 18:36:42 +08:00
George Hotz	44c1fd5661	add optional llvm opt [pr] (#7619 )	2024-11-10 13:26:49 +08:00
George Hotz	0a411b4f68	replace llvm with new llvm (#7616 ) * replace llvm with new llvm * fix test_linearizer * minor fixups * fix alloca * don't use alloca * fix DEFINE_ACC * lines * comments and lines * a little tighter	2024-11-10 11:28:52 +08:00
qazal	b61266eb97	late fusion spec for big graph [pr] (#7613 )	2024-11-09 23:43:11 +08:00
qazal	9d6b03d691	early assert swizzle in kernel [pr] (#7610 ) * early assert swizzle in kernel [pr] * better * note changes * TestIndexing 2	2024-11-09 21:54:43 +08:00
chenyu	8ca422e21a	script to compare kernel opt with BEAM (#7604 ) interesting that on m1 max hcopt wins BEAM 2 about 20% of the time	2024-11-08 17:40:28 -05:00
chenyu	573f145dcf	METAL raise RuntimeError with no compiler and bad src (#7603 ) fixed BEAM if src is invalid on METAL. it currently only accepts RuntimeError in `_time_program`	2024-11-08 17:09:12 -05:00
chenyu	74b4d1c1e1	rewrite idx again in real_strides after uop_given_valid (#7600 ) uop_given_valid does not guarantee output to be flat. fixed one last real_strides test.	2024-11-08 14:30:32 -05:00
chenyu	c6189e38c1	simplify_valid in real_strides (#7599 ) improved one more real_strides. after finishing the last one will think about always applying these in to_indexed_uops	2024-11-08 10:45:22 -05:00
George Hotz	d8691a4f03	lil touchups (#7597 )	2024-11-08 22:31:43 +08:00
Ahmed Harmouche	e35226e698	Remove Ops.ALU (#7595 )	2024-11-08 19:52:14 +08:00
Harald Schäfer	e7cbc29f48	openpilot benchmark: add cast from numpy to benchmark (#7593 ) * openpilot benchmark: add cast from numpy to benchmark * whitespace * comment	2024-11-08 19:31:00 +08:00

7979 Commits