tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 14:58:46 -05:00

Author	SHA1	Message	Date
George Hotz	cc1087d2ec	move simplify into views_to_indexed_uops (#9999) * move simplify into views_to_indexed_uops * cache that	2025-04-23 13:50:27 +01:00
chenyu	c39128133c	retinanet green scripts (#9996) also removed realize in data_get and used empty for fake data. slightly bigger lr. https://wandb.ai/chenyuxyz/MLPerf-RetinaNet/runs/8skid0e8?nw=nwuserchenyuxyz	2025-04-23 08:28:03 -04:00
George Hotz	a4a5f2d54a	faster block order [pr] (#9998) * faster block reorder [pr] * ahh, that's even faster	2025-04-23 13:11:30 +01:00
chenyu	61bfd23881	update mlperf-logging version (#9995)	2025-04-22 19:32:39 -04:00
pkotzbach	dbbd755cba	FP8s truncate (#9937) * truncate fp8 * fix * maybe like that? * fix linters * ruff * move from extra and add ml_types to tests * minor changes * str to dtypes and nan support --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-22 19:12:49 -04:00
qazal	58180caad3	schedule linearize small cleanups [pr] (#9994)	2025-04-23 05:42:29 +08:00
qazal	f4ec57baff	new schedule linearizer enqueues KERNEL UOps [pr] (#9993) * new schedule linearizer enqueues kernels [pr] * no defaultdict * diff * minor	2025-04-23 05:17:58 +08:00
George Hotz	d1f6701eb7	hotfix: lower amd threshold + improve block reorder test	2025-04-22 20:44:29 +01:00
nimlgen	db51133537	rename HWInterface -> FileIOInterface (#9989) * rename HWInterface -> FileIOInterface * ugh	2025-04-22 22:18:57 +03:00
George Hotz	c1539b0319	putting add first orders loads as expected (#9991)	2025-04-22 20:12:05 +01:00
nimlgen	bd580d8ea4	hcq: use mmio interface in nv (#9986) * hcq: start mmio interface * allow double cast * revert * faster? * simpler, not needed more now * dd * types * fix	2025-04-22 21:58:12 +03:00
George Hotz	feee6986c9	faster block reorder (#9990) * faster block reorder [pr] * that shouldn't change order * key just in sorted * ind	2025-04-22 19:18:57 +01:00
qazal	6cb2d18c03	refactor schedule linearize to defaultdict [pr] (#9984) * refactor schedule linearize to defaultdict [pr] * skip that * don't need .get	2025-04-23 00:00:23 +08:00
chenyu	9e5e371999	make DISABLE_COMPILER_CACHE a ContextVar [pr] (#9983)	2025-04-22 10:32:54 -04:00
qazal	bbc324f5dc	remove CAST_AFTER_EXPAND (#9980)	2025-04-22 21:06:11 +08:00
George Hotz	c519b553db	non recursive toposort is 2x+ faster (#9979) * non recursive toposort is 2x+ faster * don't change the order	2025-04-22 13:59:38 +01:00
qazal	0d9014d021	place create_ast last, type_verify in the end (once) [pr] (#9977)	2025-04-22 20:15:23 +08:00
chenyu	fb89d9a584	retinanet eval combine output on GPUS[0] (#9966) eval 35 sec -> 20 sec. it was spending 13 seconds assembling output tensor on CPU backend. GPUS[0] seems to have enough memory, otherwise we can lower EVAL_BS	2025-04-22 07:43:51 -04:00
qazal	7b55846e08	prep STORE UOp creation for multi output [pr] (#9975) * prep STORE UOp creation for multi output [pr] * test_multioutput_ast	2025-04-22 19:34:52 +08:00
George Hotz	e358e0a0c6	move metadata set to tensor [pr] (#9976) * move metadata set to tensor [pr] * only track that in tensor.py	2025-04-22 12:30:35 +01:00
qazal	f6271515fe	refactor UOp.st [pr] (#9973)	2025-04-22 18:46:56 +08:00
George Hotz	f5dc70c624	microbenchmarks + micro speed ups (#9972) * microbenchmarks * forgot the ubenchs * clean up type verify	2025-04-22 11:30:46 +01:00
qazal	1cf4e24ca5	fix kernelize usage with pm_gradient (#9953) * fix kernelize usage with pm_gradient * remove that	2025-04-22 17:26:05 +08:00
deftdawg	32bbff942c	amd: add nbio 7.2.0 for some rdna2 (#9964) * - Updated of #9700 which fixes #9665 but for the Steam Deck which was erroring on NBIO 7.2.0 * unrelated change --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-04-22 12:10:48 +03:00
Ignacio Sica	0e79aee706	use_tensor_cores bugfix (#9969)	2025-04-21 22:58:17 -03:00
chenyu	5294c32279	dev scripts for retinanet (#9968) also BASE_DIR -> BASEDIR for consistency, and move wandb up a bit for more accurate timing	2025-04-21 17:54:56 -04:00
nimlgen	4340197132	am: download fw from web (#9956) * am: download fw from web * tested * link works * default to web * this is default * not used	2025-04-21 23:26:33 +03:00
nimlgen	7244ca863c	am: fix double read of sdma fw (#9965)	2025-04-21 23:04:34 +03:00
uuuvn	b35f94b6ec	Don't hardcode default CLOUDDEV (#9935)	2025-04-21 18:46:55 +01:00
Francis Lata	defa1e77f6	get the proper dataset count (#9962)	2025-04-21 12:11:37 -04:00
qazal	36ed3c3253	fix kernelize with VIEW children (#9961)	2025-04-21 23:38:46 +08:00
uuuvn	757533cbe6	Less verbose cloud multiprocessing start (#9960) The set name before starting part used to be required for #9935 when CLOUDDEV was a global variable, now just readability improvement	2025-04-21 16:19:54 +01:00
Francis Lata	d7e247f329	RetinaNet INITMLPERF support (#9950) * fixes to make fake data work * fix eval beam * fix merge issue	2025-04-21 10:32:05 -04:00
kamilisjon	014f870733	rm (#9959) Co-authored-by: KamilisJonkus <kamilis.jonkus@agmis.com>	2025-04-21 15:23:45 +01:00
chenyu	f68c7041c4	doc fix is_floating_point dtype.float -> dtypes.float (#9958)	2025-04-21 09:23:59 -04:00
akhuntsaria	2d423e6737	fix assertion message for supported device in export_model (#9957)	2025-04-21 09:23:44 -04:00
ttomsa	783a191925	rm mul from _masked_setitem (#9951)	2025-04-21 06:41:50 -04:00
nimlgen	46469f00a2	am: tiny changes in psp load (#9952)	2025-04-21 11:52:02 +03:00
qazal	0bee225a58	Tensor.kernelize docs (#9946) * Tensor.kernelize docs * syntax * test_kernelize_bw * Tensor.kernelize docstring * pruning * tiny details * details 2 * becomes_map terminology * more changes to becomes	2025-04-21 16:34:03 +08:00
Francis Lata	ea4cb2c715	small cleanups (#9947)	2025-04-20 20:33:20 -04:00
qazal	e8910540f6	Kernelize can be called multiple times on a Tensor (#9949) * Kernelize can be called multiple times on a Tensor * add (failing) test_kernelize_bw	2025-04-21 06:28:47 +08:00
qazal	1d90be2cff	match kernelize API in process replay (#9948)	2025-04-21 05:23:41 +08:00
qazal	343a5eb588	dedup assigns in grouper VIZ name function [pr] (#9942)	2025-04-20 21:42:25 +08:00
qazal	e20ef7196a	Tensor.kernelize (#9845) * add kernelize * remove that * kernelize returns self * update abstractions2.py * kernelize in test_schedule * temp: assert BUFFER_VIEW's existence * ASSIGN must have a buffer or subbuffer target * assert and shrink * fix * padded setitem * var * toposort once * extra * base_buffer * end with BUFFER_VIEW * setitem for disk * test_setitem_becomes_subbuffer * mul slice test * torch backend fix 1 * non-deterministic * keep subbuffer	2025-04-20 20:53:49 +08:00
qazal	dd16087f62	fold double ASSIGN to same target (#9941)	2025-04-20 19:06:38 +08:00
qazal	9a9aba4cd5	setitem tests (some failing) from kernelize (#9940)	2025-04-20 18:47:55 +08:00
chenyu	6c30948df6	hand_coded_optimizations returns list[Opt] [pr] (#9938) new api looks like `k.apply_opts(hand_coded_optimizations(k))`	2025-04-19 20:26:59 -04:00
chenyu	720f20865b	remove required_optimizations (#9848)	2025-04-19 16:51:16 -04:00
qazal	218e01833d	update scheduler section for abstractions2.py [pr] (#9927)	2025-04-19 12:09:14 +03:00
chenyu	3fdba48fc7	update bert green and README (#9934) submission candidate	2025-04-18 21:21:28 -04:00
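
Note on c519b553db (non recursive toposort): the commit title names the technique but the log does not show the code. As a minimal sketch of what a non-recursive topological sort can look like in general, and not tinygrad's actual implementation, the following replaces recursion with an explicit stack. The function name `toposort`, the `deps` mapping, and the example graph are assumptions made for illustration only.

```python
from typing import Hashable, Iterable, Mapping, Sequence


def toposort(roots: Iterable[Hashable],
             deps: Mapping[Hashable, Sequence[Hashable]]) -> list:
    """Return every node reachable from `roots`, dependencies first.

    Hypothetical helper: `deps[node]` lists the nodes that `node` depends on.
    The graph is assumed to be acyclic; cycles are not detected.
    """
    order = []       # finished nodes, each after all of its dependencies
    visited = set()  # nodes that have already been expanded
    for root in roots:
        if root in visited:
            continue
        # Each stack entry is (node, expanded). A node is emitted only when
        # it is popped the second time, after all its dependencies.
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                order.append(node)
                continue
            if node in visited:
                continue
            visited.add(node)
            stack.append((node, True))
            # Push in reverse so dependencies are visited in listed order,
            # matching what a recursive depth-first walk would produce.
            for dep in reversed(deps.get(node, ())):
                if dep not in visited:
                    stack.append((dep, False))
    return order


if __name__ == "__main__":
    example = {"d": ["b", "c"], "b": ["a"], "c": ["a"]}
    print(toposort(["d"], example))  # ['a', 'b', 'c', 'd']
```

An explicit stack keeps the walk independent of Python's recursion limit, so a long dependency chain does not raise `RecursionError`. The sketch makes no claim about the speedup reported in the commit title.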


10490 Commits