This fixes a minor bug: with DEBUG>=1, the log said "remote has device" followed by an array of renderer properties instead of a real device name, which doesn't make sense:
```
127.0.0.1 - - [24/Apr/2025 16:50:44] "GET /properties HTTP/1.1" 200 -
remote has device ['tinygrad.renderer.cstyle', 'MetalRenderer', []]
opened device CLOUD from pid:20210
```
Now it actually prints the name of the device behind CLOUD:
```
127.0.0.1 - - [24/Apr/2025 16:56:29] "GET /properties HTTP/1.1" 200 -
remote has device METAL
opened device CLOUD from pid:20315
```
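A minimal sketch of how a device name could be derived from the serialized renderer spec shown in the old log. This assumes the spec is a `[module_path, class_name, args]` triple; the helper name is hypothetical, not the actual tinygrad code:

```python
def device_name_from_renderer(spec):
    """Derive a display name from a serialized renderer spec.

    spec is assumed to be a [module_path, class_name, args] triple,
    e.g. ['tinygrad.renderer.cstyle', 'MetalRenderer', []].
    """
    _module_path, class_name, _args = spec
    # strip the "Renderer" suffix and upper-case: MetalRenderer -> METAL
    return class_name.removesuffix("Renderer").upper()

print(device_name_from_renderer(['tinygrad.renderer.cstyle', 'MetalRenderer', []]))  # METAL
```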
`BS=96 BASEDIR="/raid/datasets/openimages" MODEL=retinanet python examples/mlperf/model_eval.py`
```
...
loaded dataset @ 8.64s
loaded initial data @ 12.57s
****** 619.97 ms to enqueue, 46042.13 ms to realize ( 116.22 ms fetching, 45399.58 ms postprocess_detections). 0.09 examples/sec. 0.83 TFLOPS @ 59.23s
****** 147.49 ms to enqueue, 37362.16 ms to realize ( 146.96 ms fetching, 36618.84 ms postprocess_detections). 0.11 examples/sec. 1.03 TFLOPS @ 96.74s
****** 152.85 ms to enqueue, 37244.08 ms to realize ( 120.67 ms fetching, 36235.19 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 134.14s
****** 146.39 ms to enqueue, 37279.85 ms to realize ( 65.07 ms fetching, 36233.56 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 171.56s
****** 152.41 ms to enqueue, 37264.04 ms to realize ( 127.08 ms fetching, 36196.10 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 208.98s
****** 151.29 ms to enqueue, 36868.08 ms to realize ( 142.73 ms fetching, 36153.07 ms postprocess_detections). 0.11 examples/sec. 1.05 TFLOPS @ 246.00s
****** 136.41 ms to enqueue, 37325.04 ms to realize ( 90.29 ms fetching, 36573.38 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 283.46s
```
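The steady-state lines above show that `postprocess_detections` dominates realize time. Using the numbers from one logged step:

```python
# share of realize time spent in postprocess_detections,
# taken from one steady-state log line above
realize_ms = 37264.04
postprocess_ms = 36196.10
share = postprocess_ms / realize_ms
print(f"postprocess_detections is {share:.1%} of realize time")  # ~97%
```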
`out.cast(it.dtype.fmt).tolist()` fails with `ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'`
* truncate fp8
* fix
* maybe like that?
* fix linters
* ruff
* move from extra and add ml_types to tests
* minor changes
* str to dtypes and nan support
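As an illustration of what "truncate fp8" can mean (a sketch of the general e5m2 technique, not the PR's actual code): fp8 e5m2 shares float16's 5-bit exponent, so a truncating conversion can simply keep the high byte of the float16 bit pattern:

```python
import struct

def truncate_to_e5m2(x: float) -> int:
    # fp8 e5m2 has the same 5-bit exponent as float16, so truncation
    # (round-toward-zero; real kernels usually round-to-nearest) just
    # keeps the high byte of the float16 bit pattern
    h = struct.unpack('<H', struct.pack('<e', x))[0]
    return h >> 8

print(hex(truncate_to_e5m2(1.0)))   # fp16 1.0 is 0x3C00 -> 0x3c
```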
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
Eval goes from 35 sec to 20 sec: it was spending 13 seconds assembling the output tensor on the CPU backend. GPUS[0] seems to have enough memory; otherwise we can lower EVAL_BS.