George Hotz
d1f6701eb7
hotfix: lower amd threshold + improve block reorder test
2025-04-22 20:44:29 +01:00
nimlgen
db51133537
rename HWInterface -> FileIOInterface ( #9989 )
* rename HWInterface -> FileIOInterface
* ugh
2025-04-22 22:18:57 +03:00
George Hotz
c1539b0319
putting add first orders loads as expected ( #9991 )
2025-04-22 20:12:05 +01:00
nimlgen
bd580d8ea4
hcq: use mmio interface in nv ( #9986 )
* hcq: start mmio interface
* allow double cast
* revert
* faster?
* simpler, not needed anymore
* dd
* types
* fix
2025-04-22 21:58:12 +03:00
George Hotz
feee6986c9
faster block reorder ( #9990 )
* faster block reorder [pr]
* that shouldn't change order
* key just in sorted
* ind
2025-04-22 19:18:57 +01:00
qazal
6cb2d18c03
refactor schedule linearize to defaultdict [pr] ( #9984 )
* refactor schedule linearize to defaultdict [pr]
* skip that
* don't need .get
2025-04-23 00:00:23 +08:00
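A minimal sketch of the `dict` -> `defaultdict` pattern behind the "don't need .get" note in #9984; the names below are illustrative, not the scheduler's actual variables.

```python
from collections import defaultdict

# illustrative edges; in the scheduler these would be UOp parent/child pairs
edges = [("a", "b"), ("a", "c"), ("b", "c")]

# before: plain dict, every update needs .get/.setdefault with a default value
children: dict[str, list[str]] = {}
for parent, child in edges:
  children.setdefault(parent, []).append(child)

# after: defaultdict(list) creates the empty list on first access, no .get needed
children2: defaultdict[str, list[str]] = defaultdict(list)
for parent, child in edges:
  children2[parent].append(child)

assert children == dict(children2)
```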
chenyu
9e5e371999
make DISABLE_COMPILER_CACHE a ContextVar [pr] ( #9983 )
2025-04-22 10:32:54 -04:00
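A hedged sketch of how a ContextVar-backed flag like #9983's DISABLE_COMPILER_CACHE is typically toggled in tinygrad, assuming it lives in tinygrad.helpers next to flags like DEBUG (the import location is an assumption, not confirmed by this log).

```python
# assumption: DISABLE_COMPILER_CACHE is exported from tinygrad.helpers like DEBUG is
from tinygrad.helpers import Context, DISABLE_COMPILER_CACHE

print(DISABLE_COMPILER_CACHE.value)  # ContextVars still read their default from the env

# scope the flag to a block instead of mutating a global or the environment
with Context(DISABLE_COMPILER_CACHE=1):
  pass  # compiles inside this block would skip the compiler cache
```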
qazal
bbc324f5dc
remove CAST_AFTER_EXPAND ( #9980 )
2025-04-22 21:06:11 +08:00
George Hotz
c519b553db
non recursive toposort is 2x+ faster ( #9979 )
* non recursive toposort is 2x+ faster
* don't change the order
2025-04-22 13:59:38 +01:00
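A minimal sketch of the recursive-to-iterative rewrite that #9979 describes: an explicit-stack DFS that emits the same post-order as a recursive toposort, shown here on a toy DAG rather than tinygrad's UOp graph.

```python
# iterative post-order toposort: no recursion limit, less Python call overhead
def toposort(sinks, parents):
  visited, order = set(), []
  stack = [(s, False) for s in reversed(sinks)]
  while stack:
    node, children_done = stack.pop()
    if children_done:
      order.append(node)                # all dependencies emitted, emit node
    elif node not in visited:
      visited.add(node)
      stack.append((node, True))        # revisit after its dependencies
      stack.extend((p, False) for p in reversed(parents(node)))
  return order

deps = {"c": ["a", "b"], "b": ["a"], "a": []}  # toy DAG
print(toposort(["c"], lambda n: deps[n]))      # ['a', 'b', 'c']
```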
qazal
0d9014d021
place create_ast last, type_verify in the end (once) [pr] ( #9977 )
2025-04-22 20:15:23 +08:00
chenyu
fb89d9a584
retinanet eval combine output on GPUS[0] ( #9966 )
eval 35 sec -> 20 sec. It was spending 13 seconds assembling the output tensor on the CPU backend. GPUS[0] seems to have enough memory; otherwise we can lower EVAL_BS (see the sketch below).
2025-04-22 07:43:51 -04:00
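A hedged sketch of the kind of change #9966 describes: concatenating the per-batch eval outputs on GPUS[0] instead of the CPU backend before reading them back; the device list and shapes are placeholders, not the benchmark's real code.

```python
from tinygrad import Tensor, Device

GPUS = [f"{Device.DEFAULT}:{i}" for i in range(2)]                   # placeholder device list
outputs = [Tensor.rand(8, 4, device=GPUS[i % 2]) for i in range(4)]  # fake eval batches

# before (slow): assemble the full output tensor on the CPU backend
# combined = outputs[0].to("CPU").cat(*[o.to("CPU") for o in outputs[1:]], dim=0)

# after: keep the assembly on GPUS[0]; only the final read goes to the host
combined = outputs[0].to(GPUS[0]).cat(*[o.to(GPUS[0]) for o in outputs[1:]], dim=0)
print(combined.numpy().shape)  # (32, 4)
```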
qazal
7b55846e08
prep STORE UOp creation for multi output [pr] ( #9975 )
* prep STORE UOp creation for multi output [pr]
* test_multioutput_ast
2025-04-22 19:34:52 +08:00
George Hotz
e358e0a0c6
move metadata set to tensor [pr] ( #9976 )
* move metadata set to tensor [pr]
* only track that in tensor.py
2025-04-22 12:30:35 +01:00
qazal
f6271515fe
refactor UOp.st [pr] ( #9973 )
2025-04-22 18:46:56 +08:00
George Hotz
f5dc70c624
microbenchmarks + micro speed ups ( #9972 )
* microbenchmarks
* forgot the ubenchs
* clean up type verify
2025-04-22 11:30:46 +01:00
qazal
1cf4e24ca5
fix kernelize usage with pm_gradient ( #9953 )
* fix kernelize usage with pm_gradient
* remove that
2025-04-22 17:26:05 +08:00
deftdawg
32bbff942c
amd: add nbio 7.2.0 for some rdna2 ( #9964 )
* Update of #9700, which fixes #9665, but for the Steam Deck, which was erroring on NBIO 7.2.0
* unrelated change
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-04-22 12:10:48 +03:00
Ignacio Sica
0e79aee706
use_tensor_cores bugfix ( #9969 )
2025-04-21 22:58:17 -03:00
chenyu
5294c32279
dev scripts for retinanet ( #9968 )
also BASE_DIR -> BASEDIR for consistency, and move wandb up a bit for more accurate timing
2025-04-21 17:54:56 -04:00
nimlgen
4340197132
am: download fw from web ( #9956 )
* am: download fw from web
* tested
* link works
* default to web
* this is default
* not used
2025-04-21 23:26:33 +03:00
nimlgen
7244ca863c
am: fix double read of sdma fw ( #9965 )
2025-04-21 23:04:34 +03:00
uuuvn
b35f94b6ec
Don't hardcode default CLOUDDEV ( #9935 )
2025-04-21 18:46:55 +01:00
Francis Lata
defa1e77f6
get the proper dataset count ( #9962 )
2025-04-21 12:11:37 -04:00
qazal
36ed3c3253
fix kernelize with VIEW children ( #9961 )
2025-04-21 23:38:46 +08:00
uuuvn
757533cbe6
Less verbose cloud multiprocessing start ( #9960 )
The "set name before starting" part used to be required for #9935, when CLOUDDEV was a global variable; now it is just a readability improvement.
2025-04-21 16:19:54 +01:00
Francis Lata
d7e247f329
RetinaNet INITMLPERF support ( #9950 )
* fixes to make fake data work
* fix eval beam
* fix merge issue
2025-04-21 10:32:05 -04:00
kamilisjon
014f870733
rm ( #9959 )
Co-authored-by: KamilisJonkus <kamilis.jonkus@agmis.com>
2025-04-21 15:23:45 +01:00
chenyu
f68c7041c4
doc fix is_floating_point dtype.float -> dtypes.float ( #9958 )
2025-04-21 09:23:59 -04:00
akhuntsaria
2d423e6737
fix assertion message for supported device in export_model ( #9957 )
2025-04-21 09:23:44 -04:00
ttomsa
783a191925
rm mul from _masked_setitem ( #9951 )
2025-04-21 06:41:50 -04:00
nimlgen
46469f00a2
am: tiny changes in psp load ( #9952 )
2025-04-21 11:52:02 +03:00
qazal
0bee225a58
Tensor.kernelize docs ( #9946 )
* Tensor.kernelize docs
* syntax
* test_kernelize_bw
* Tensor.kernelize docstring
* pruning
* tiny details
* details 2
* becomes_map terminology
* more changes to becomes
2025-04-21 16:34:03 +08:00
Francis Lata
ea4cb2c715
small cleanups ( #9947 )
2025-04-20 20:33:20 -04:00
qazal
e8910540f6
Kernelize can be called multiple times on a Tensor ( #9949 )
* Kernelize can be called multiple times on a Tensor
* add (failing) test_kernelize_bw
2025-04-21 06:28:47 +08:00
qazal
1d90be2cff
match kernelize API in process replay ( #9948 )
2025-04-21 05:23:41 +08:00
qazal
343a5eb588
dedup assigns in grouper VIZ name function [pr] ( #9942 )
2025-04-20 21:42:25 +08:00
qazal
e20ef7196a
Tensor.kernelize ( #9845 )
* add kernelize
* remove that
* kernelize returns self
* update abstractions2.py
* kernelize in test_schedule
* temp: assert BUFFER_VIEW's existence
* ASSIGN must have a buffer or subbuffer target
* assert and shrink
* fix
* padded setitem
* var
* toposort once
* extra
* base_buffer
* end with BUFFER_VIEW
* setitem for disk
* test_setitem_becomes_subbuffer
* mul slice test
* torch backend fix 1
* non-deterministic
* keep subbuffer
2025-04-20 20:53:49 +08:00
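A small usage sketch of the Tensor.kernelize API from #9845 (documented further in #9946), based on the bullets above: it returns self, so it chains, and it only groups the pending graph into kernels; nothing runs until realization.

```python
from tinygrad import Tensor

a, b = Tensor.rand(16, 16), Tensor.rand(16, 16)
out = (a @ b).relu()

# kernelize returns self: the lazy graph behind `out` is grouped into kernels
# here, but no kernel has executed yet
out = out.kernelize()

# realization (numpy/realize) actually runs the scheduled kernels
print(out.numpy().shape)  # (16, 16)
```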
qazal
dd16087f62
fold double ASSIGN to same target ( #9941 )
2025-04-20 19:06:38 +08:00
qazal
9a9aba4cd5
setitem tests (some failing) from kernelize ( #9940 )
2025-04-20 18:47:55 +08:00
chenyu
6c30948df6
hand_coded_optimizations returns list[Opt] [pr] ( #9938 )
new API looks like `k.apply_opts(hand_coded_optimizations(k))`; see the sketch below
2025-04-19 20:26:59 -04:00
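A hedged sketch of the call pattern #9938 spells out; only `k.apply_opts(hand_coded_optimizations(k))` comes from the commit, and the import paths below are assumptions for illustration.

```python
# assumption: Kernel and hand_coded_optimizations live at these paths
from tinygrad.codegen.kernel import Kernel
from tinygrad.codegen.heuristic import hand_coded_optimizations

def optimize(k: Kernel) -> Kernel:
  # old API: the heuristic was a Kernel method that applied opts itself
  # new API: it returns a list[Opt] that the caller applies explicitly
  k.apply_opts(hand_coded_optimizations(k))
  return k
```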
chenyu
720f20865b
remove required_optimizations ( #9848 )
2025-04-19 16:51:16 -04:00
qazal
218e01833d
update scheduler section for abstractions2.py [pr] ( #9927 )
2025-04-19 12:09:14 +03:00
chenyu
3fdba48fc7
update bert green and README ( #9934 )
submission candidate
2025-04-18 21:21:28 -04:00
George Hotz
b359125ebf
rewrite the linearizer ( #9885 )
* random speedups [pr]
* speeding up linearizer
* test_gemm passes
* progress
* test_gemm passes
* working
* simpler
* blockstart unneeded
* simpler
* bugfix
* work
* don't compare
* faster
* progress
* cleanups
* work
* cleanups
* working
* reorder
* name is dumb
* fix tests
* lin2 works
* clean ctx
* mostly bottom up
* passes
* same speed now
* new lin is faster
* dedup
* lines and tuples
* track that
* lin
* revert that
* tests should pass
* merge siblings
* cleaner expression
* only lin2
* finally, some speed
* simpler
* fix unmergables with blockends
2025-04-18 22:35:40 +01:00
Ignacio Sica
023b1c28a2
test_tensor_cores_padded refactor ( #9724 )
* set pad to 3 for amd padded tc test
* change pad for amd regardless of CI
* test tc padded uops and correctness separately
* add test_tensor_cores_padded_uops test to ci
* remove redundant check for amd device
* cleanup
2025-04-18 17:05:54 -03:00
Ignacio Sica
afff82ba0f
fix ptx linearizer bug [pr] ( #9926 )
* fix ptx bug
* align 16
* revert align because it breaks pr
* smallest diff that fixes ptx bug
2025-04-18 13:48:43 -03:00
chenyu
617b45748f
fuse embedding for bert on red ( #9925 )
also updated the BEAM param and use the AMD driver for the actual run. 535ms step
2025-04-18 07:20:25 -04:00
qazal
b58decac0c
fix diamond assigns before mapping tensors UOps to assigns ( #9855 )
* keep tensor_map until diamond assign fixup
* ctx
2025-04-18 14:17:43 +03:00
qazal
a37d921917
get name from SINK in process replay ( #9924 )
* get name from SINK in process replay
* space
2025-04-18 13:51:11 +03:00
George Hotz
aa98aff4cd
don't use ops name, just keep sink ( #9922 )
* don't use ops name, just keep sink
* fix test
* endif sink
2025-04-18 08:59:18 +01:00