tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 07:18:40 -05:00

Author	SHA1	Message	Date
deftdawg	32bbff942c	amd: add nbio 7.2.0 for some rdna2 (#9964 ) * - Update of #9700, which fixes #9665, but for the Steam Deck, which was erroring on NBIO 7.2.0 * unrelated change --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-04-22 12:10:48 +03:00
Ignacio Sica	0e79aee706	use_tensor_cores bugfix (#9969 )	2025-04-21 22:58:17 -03:00
chenyu	5294c32279	dev scripts for retinanet (#9968 ) also BASE_DIR -> BASEDIR for consistency, and move wandb up a bit for more accurate timing	2025-04-21 17:54:56 -04:00
nimlgen	4340197132	am: download fw from web (#9956 ) * am: download fw from web * tested * link works * default to web * this is default * not used	2025-04-21 23:26:33 +03:00
nimlgen	7244ca863c	am: fix double read of sdma fw (#9965 )	2025-04-21 23:04:34 +03:00
uuuvn	b35f94b6ec	Don't hardcode default CLOUDDEV (#9935 )	2025-04-21 18:46:55 +01:00
Francis Lata	defa1e77f6	get the proper dataset count (#9962 )	2025-04-21 12:11:37 -04:00
qazal	36ed3c3253	fix kernelize with VIEW children (#9961 )	2025-04-21 23:38:46 +08:00
uuuvn	757533cbe6	Less verbose cloud multiprocessing start (#9960 ) The set name before starting part used to be required for #9935 when CLOUDDEV was a global variable, now just readability improvement	2025-04-21 16:19:54 +01:00
Francis Lata	d7e247f329	RetinaNet INITMLPERF support (#9950 ) * fixes to make fake data work * fix eval beam * fix merge issue	2025-04-21 10:32:05 -04:00
kamilisjon	014f870733	rm (#9959 ) Co-authored-by: KamilisJonkus <kamilis.jonkus@agmis.com>	2025-04-21 15:23:45 +01:00
chenyu	f68c7041c4	doc fix is_floating_point dtype.float -> dtypes.float (#9958 )	2025-04-21 09:23:59 -04:00
akhuntsaria	2d423e6737	fix assertion message for supported device in export_model (#9957 )	2025-04-21 09:23:44 -04:00
ttomsa	783a191925	rm mul from _masked_setitem (#9951 )	2025-04-21 06:41:50 -04:00
nimlgen	46469f00a2	am: tiny changes in psp load (#9952 )	2025-04-21 11:52:02 +03:00
qazal	0bee225a58	Tensor.kernelize docs (#9946 ) * Tensor.kernelize docs * syntax * test_kernelize_bw * Tensor.kernelize docstring * pruning * tiny details * details 2 * becomes_map terminology * more changes to becomes	2025-04-21 16:34:03 +08:00
Francis Lata	ea4cb2c715	small cleanups (#9947 )	2025-04-20 20:33:20 -04:00
qazal	e8910540f6	Kernelize can be called multiple times on a Tensor (#9949 ) * Kernelize can be called multiple times on a Tensor * add (failing) test_kernelize_bw	2025-04-21 06:28:47 +08:00
qazal	1d90be2cff	match kernelize API in process replay (#9948 )	2025-04-21 05:23:41 +08:00
qazal	343a5eb588	dedup assigns in grouper VIZ name function [pr] (#9942 )	2025-04-20 21:42:25 +08:00
qazal	e20ef7196a	Tensor.kernelize (#9845 ) * add kernelize * remove that * kernelize returns self * update abstractions2.py * kernelize in test_schedule * temp: assert BUFFER_VIEW's existence * ASSIGN must have a buffer or subbuffer target * assert and shrink * fix * padded setitem * var * toposort once * extra * base_buffer * end with BUFFER_VIEW * setitem for disk * test_setitem_becomes_subbuffer * mul slice test * torch backend fix 1 * non-deterministic * keep subbuffer	2025-04-20 20:53:49 +08:00
qazal	dd16087f62	fold double ASSIGN to same target (#9941 )	2025-04-20 19:06:38 +08:00
qazal	9a9aba4cd5	setitem tests (some failing) from kernelize (#9940 )	2025-04-20 18:47:55 +08:00
chenyu	6c30948df6	hand_coded_optimizations returns list[Opt] [pr] (#9938 ) new api looks like `k.apply_opts(hand_coded_optimizations(k))`	2025-04-19 20:26:59 -04:00
chenyu	720f20865b	remove required_optimizations (#9848 )	2025-04-19 16:51:16 -04:00
qazal	218e01833d	update scheduler section for abstractions2.py [pr] (#9927 )	2025-04-19 12:09:14 +03:00
chenyu	3fdba48fc7	update bert green and README (#9934 ) submission candidate	2025-04-18 21:21:28 -04:00
George Hotz	b359125ebf	rewrite the linearizer (#9885 ) * random speedups [pr] * speeding up linearizer * test_gemm passes * progress * test_gemm passes * working * simpler * blockstart unneeded * simpler * bugfix * work * don't compare * faster * progress * cleanups * work * cleanups * working * reorder * name is dumb * fix tests * lin2 works * clean ctx * mostly bottom up * passes * same speed now * new lin is faster * dedup * lines and tuples * track that * lin * revert that * tests should pass * merge siblings * cleaner expression * only lin2 * finally, some speed * simpler * fix unmergables with blockends	2025-04-18 22:35:40 +01:00
Ignacio Sica	023b1c28a2	`test_tensor_cores_padded` refactor (#9724 ) * set pad t 3 for amd padded tc test * change pad for amd regardless CI * test tc padded uops and correctness separately * add test_tensor_cores_padded_uops test to ci * remove redundant check for amd device * cleanup	2025-04-18 17:05:54 -03:00
Ignacio Sica	afff82ba0f	fix `ptx` linearizer bug [pr] (#9926 ) * fix ptx bug * align 16 * revert align because it breaks pr * smallest diff that fixes ptx bug	2025-04-18 13:48:43 -03:00
chenyu	617b45748f	fuse embedding for bert on red (#9925 ) also updated BEAM param and use AMD driver for actual run. 535ms step	2025-04-18 07:20:25 -04:00
qazal	b58decac0c	fix diamond assigns before mapping tensors UOps to assigns (#9855 ) * keep tensor_map until diamond assign fixup * ctx	2025-04-18 14:17:43 +03:00
qazal	a37d921917	get name from SINK in process replay (#9924 ) * get name from SINK in process replay * space	2025-04-18 13:51:11 +03:00
George Hotz	aa98aff4cd	don't use ops name, just keep sink (#9922 ) * don't use ops name, just keep sink * fix test * endif sink	2025-04-18 08:59:18 +01:00
George Hotz	8919370c76	hotfix: fix test_save_all_dtypes on METAL	2025-04-18 08:42:31 +01:00
qazal	16dfe0a902	upstream remu (#9921 )	2025-04-18 01:57:36 +03:00
qazal	d287afe3b1	remove shapeless const check in full_shape [pr] (#9911 ) * remove shapeless const check in full_shape [pr] * those can go too	2025-04-18 00:00:26 +03:00
chenyu	fe6a482f1d	pin hypothesis version to 6.131.0 (#9920 ) 6.131.1 seems to cause timeout in CI	2025-04-17 16:34:10 -04:00
chenyu	f5256e0020	Kernel.apply_opts [pr] (#9917 ) * Kernel.apply_opts [pr] updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimization * not you yet	2025-04-17 08:00:56 -04:00
chenyu	e2ed673c94	FUSE_ARANGE_UINT to not fuse uint (#9915 ) hack to bypass rand, can FUSE_ARANGE on green for 6ms per step	2025-04-16 18:49:38 -04:00
qazal	497daa658a	hotfix: edge-labels go above the overlay (#9910 )	2025-04-16 23:38:12 +08:00
qazal	e8e43c6dad	ensure edge labels are always on top (#9908 )	2025-04-16 21:08:06 +08:00
qazal	5265f25088	add counter for incoming edges in viz (#9907 )	2025-04-16 20:14:14 +08:00
Eitan Turok	2c7c205bc5	Fix dtype comparisons in vectorized transcendental + tests (#9794 ) * init test * cleanup * init * update * fix * fix python runtime for vectorized code * awesome helper * update * update * cleanup * more cleaning * cleanup more * fix tests * more cleaning * cleanup more * fix * even cleaner * failing tests is sad * cleanup * better name * make tests pass * remove vec from python runtime * remove vec from eval_uop * remove expected failures * better name	2025-04-16 08:06:12 -04:00
qazal	929e5a9905	do not construct GrouperContext [pr] (#9906 )	2025-04-16 18:26:31 +08:00
Xingyu	047c8fd70d	Add amax support to Tensor operations in Torch Backend (#9905 ) * Add amax support to Tensor operations - Implemented amax function in backend.py for tensor max operations. - Added unit tests for amax in test.py to ensure correct functionality. * Fix formatting in amax output function - Adjusted spacing in the amax output lambda function in backend.py - Improved code readability for better maintenance	2025-04-16 10:35:50 +01:00
uuuvn	d7f623dac2	Use Buffer in cloud server instead of opaques (#9875 ) Not-quite-required but makes cloud graph a lot cleaner because unlike raw compiled programs `GraphRunner` takes `Buffer`s like other runners. Otherwise either of: adding a new option to not free on `__del__`, (ab)using `external_ptr` to prevent free, or making something like a `FakeBuffer` is required.	2025-04-16 10:17:32 +01:00
qazal	05334e0f3f	construct children from UOp.toposort [pr] (#9882 ) * construct children from UOp.toposort [pr] * only for bases	2025-04-16 16:55:59 +08:00
geohotstan	4e8f25109a	Revert "ONNX add output shape validation (#9720 )" (#9904 ) This reverts commit `ac713e04db`.	2025-04-16 03:15:56 -04:00
chenyu	e8024c8281	faster bert global_norm (#9901 ) tinyamd 2% faster. also updated beam params that's 2-3% faster. update mlperf doc and steps too	2025-04-15 18:24:44 -04:00


10417 Commits