tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 14:58:46 -05:00

Author	SHA1	Message	Date
qazal	d13c100981	don't sort dims in verify_sink_dims [pr] (#10059 ) * don't sort dims in verify_sink_dims [pr] * 1 can exist with n * put process_replay warn last * assert shape is the same * bring that back	2025-04-26 23:24:30 +08:00
George Hotz	c80fe6d5fc	handle some fancier reduces (#10057 ) * reduce_unparented * handle fancier reduces * fold more * bugfix	2025-04-26 11:20:15 -04:00
nimlgen	e08270c1ba	nv: fix program init for no-args kernels (#10058 )	2025-04-26 18:08:53 +03:00
George Hotz	11113c9d07	reduce_unparented (#10056 )	2025-04-26 09:48:16 -04:00
George Hotz	ea5dddc537	reduce collapse generic (#10045 ) * reduce collapse generic * new arange folder * new range folding * correct with sym * all tests pass * indexing ops passes * failing tests * fix tests, remove unused * revert that * torch indexing is fast * skip on webgpu * touchups * comments	2025-04-26 09:13:24 -04:00
quortus	5cdc96409e	Update outdated renderer.render calls (#10044 )	2025-04-26 07:35:19 -04:00
nimlgen	e055b9422f	am: fix mmap failures (#10054 )	2025-04-26 14:21:28 +03:00
qazal	e1d2b64e92	remu new instructions (#10050 ) * remu new instructions * test_ds_store_half * test_v_mul_f16	2025-04-26 02:04:12 +03:00
qazal	bba5d0a3e4	remu refactors (#10028 ) * remu refactors * scc is sgpr 253 * remove that * rename to vcc_lo * run cargo test in CI * llvm-mc * meh * work * work_group work 1 * seeded_lanes is dumb * better than seeded_lanes * does not need to be address * 128 sgpr per wave * scc is sgpr, we don't know which one * null_src once more * derive clone, wave init is cleaner * init comes first	2025-04-26 04:31:10 +08:00
nimlgen	0fc85a2b0a	hcqfuzz: init (#10049 ) * hcqfuzz: init * fix fuzz * linter * graph * taht test * update readme	2025-04-25 23:19:21 +03:00
qazal	b30050e287	fix amdgpu_disassemble on mac [pr] (#10042 )	2025-04-25 15:23:11 +08:00
George Hotz	a197aa4ef3	upat reduce syntax [pr] (#10040 ) * upat reduce syntax [pr] * switch z3 to graph_rewrite	2025-04-24 22:05:28 -04:00
Ignacio Sica	76a86735c0	hotfix `amd` bf16 is supported case (#10039 ) * hotfix amd and amd_llvm * bf16 not supported in ci * hotfix amd_llvm is not a device * remove default * dont gate on ci and amd_llvm * minor cleanup * skip bf16 tc test for amd_llvm	2025-04-24 21:29:27 -03:00
Ignacio Sica	b4f823acbe	fix helper_tc_allclose (#9606 ) * fix helper_tc_allclose * cleanup * hotfix * cleanup * cleanup * check real buffer and add cast for bf16 * cleanup * fix padded for ops_python * avoid assert on amd emulated tc * swap dimensions * revert, should have nothing to do with padded * revert fix, should not go in this pr * remove skip	2025-04-24 18:36:40 -03:00
Rory Clear	3a189fa561	More yolo processing in tinygrad (#9928 ) * more tg less np * update webgpu html for new compile * resize boxes * remove text * add back note * fix indentation * fix indentation * remove magic num * remove now unused funcs * back to numpy nms * no loop * fix iou suppression * update test * dont suppress other classes * add working scale * fix expected value, rounded up 0.24 was being counted * add postprocess bool for onnx test * fix indents * clean * clean * fix indent * remove print * fix indent * remove unused import * remove hardcoded 0.25 * space * spacing * clean label_predictions func * remove single item lists * space * use postprocess output in test * space * clean * clean * remove redundant threshold * remove redundant threshold * clean * rename var * move loop into func * unhardcode iou_threshold * remove unused values * clean * add note * clean * keep const * move back funcs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 16:21:46 -04:00
chenyu	74c6cf8be3	lint mlperf model_train (#10038 )	2025-04-24 16:19:44 -04:00
Ignacio Sica	51ca19d061	set `test_tensor_cores_padded_amd` to expectedFailure (#10036 ) * init * add expected failure to correctly track progres * hotfix * skip for amd_llvm as well * add skip * add pr number * move comment to amd test * change reason	2025-04-24 17:11:40 -03:00
b1tg	914d89fa0b	fix tensor cores for gfx1201 (#9838 ) * fix tensor cores for gfx1201 * fix typo * fix python wmma * AMDLLVMRenderer with arch + AMDLLVM tensor_cores * fix ci * clean up * more tensor cores for RDNA4 * fix half/half, bfloat16/float, bfloat16/bfloat16 for amd_llvm --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 14:57:41 -04:00
uuuvn	779aa1e2e9	Enable image tests on cloud if clouddev supports image (#9903 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 14:30:12 -04:00
uuuvn	29a12b19ea	Add macos CLOUD tests (#10033 ) A lot more work is required to enable all of them and move them into the osxtests matrix; for now I created a separate runner for them (copied from WebGPU). Will add test/test_graph.py to those tests in #9876	2025-04-24 14:14:13 -04:00
Nishant Rajadhyaksha	55942a8d8e	[Bounty] moved index_tensor off cpu in torch_backend (#9916 ) * moved index tensor off cpu in torch_backend * added support for None based indexing * fix_to_pass_tests * fix segfault tests	2025-04-24 14:12:37 -04:00
Ignacio Sica	373ca59b7f	use is_dtype_supported to check dtype support in tc tests (#10035 )	2025-04-24 14:59:14 -03:00
Ignacio Sica	93a1e9eeb9	improve `bf16` case for `is_dtype_supported` [pr] (#10034 ) * fix is_dtype_supported for bf16 * hotfix * add llvm and amd_llvm * gate on machine * separate gpu vs cpu cases * add arm case	2025-04-24 14:03:57 -03:00
uuuvn	754d789f51	Fix and enable jit tests on CLOUD (#10031 )	2025-04-24 18:39:31 +03:00
qazal	0b482fb824	add RDNA3 parser to remu (#10025 ) * llvm ref * work * all of them * salu * cleaner * start * vector ops * done * replace SMEM * vopd * sop1 * SOPC * null stays null_src * sopp * SOPK * sop2 * vop1 * vop2 * remove allow(dead_code) * vopc	2025-04-24 21:34:07 +08:00
uuuvn	0d903c9495	Print clouddev instead of clouddev's renderer (#10023 ) This is kind of a bug: currently, with DEBUG>=1, it says that remote has device and then prints an array of renderer props instead of a real device name, which doesn't make sense: ``` 127.0.0.1 - - [24/Apr/2025 16:50:44] "GET /properties HTTP/1.1" 200 - remote has device ['tinygrad.renderer.cstyle', 'MetalRenderer', []] opened device CLOUD from pid:20210 ``` Now it actually prints the name of the device behind cloud: ``` 127.0.0.1 - - [24/Apr/2025 16:56:29] "GET /properties HTTP/1.1" 200 - remote has device METAL opened device CLOUD from pid:20315 ```	2025-04-24 09:32:08 -04:00
George Hotz	aec75f51ef	fixup some slow CI tests [pr] (#10027 ) * fixup some slow CI tests [pr] * shrink test index	2025-04-24 09:20:49 -04:00
qazal	c990aac2b1	skip flaky test_transcribe_file1_OOB (#10026 )	2025-04-24 21:09:43 +08:00
George Hotz	4e2ccfddc6	ci refactor to split AMD/NVIDIA [pr] (#10024 ) * ci refactor to split AMD [pr] * split * split amd tests * explicit 0	2025-04-24 08:59:54 -04:00
uuuvn	0c68e44d6f	Cloud properties (#10021 )	2025-04-24 08:17:01 -04:00
George Hotz	db00d88415	hotfix: handle bad z3 install like z3 import fail	2025-04-24 08:09:40 -04:00
Sieds Lykles	e75be6eafc	[bounty] [pr] index validation with z3 (#9981 ) * index validation with z3 * Change comment * toposort -> toposort() --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 08:06:08 -04:00
quortus	9e49721c47	CPUGraph support for clang (#10014 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 07:52:35 -04:00
Park Jun	c3ad7b2a84	create randperm and support pytorch backend (#10019 )	2025-04-24 07:29:02 -04:00
Matthew Daiter	b545338e59	isin_Tensor_out added (#10018 )	2025-04-24 07:26:51 -04:00
chenyu	a25abf55e3	retinanet only call postprocess_detections with RUNMLPERF (#10017 ) during setup only need to compile `_eval_step().numpy()`	2025-04-23 20:45:38 -04:00
nimlgen	7f53e80db9	hotfix: amd mmio on mi300 (#10016 ) * hotfix: amd mmio on mi300 * fix * ops	2025-04-24 01:08:18 +03:00
nimlgen	1c5e353249	am: use mmio iface (#10012 ) * am: use mmio iface * linters * fixes * fixes + cleanups * mute * mypy * style	2025-04-24 00:27:04 +03:00
chenyu	65faa1d94b	explicit device in mlperf scripts (#10015 )	2025-04-23 17:11:52 -04:00
chenyu	a3f938dbee	remove retinanet INITMLPERF from beam script (#10011 ) it only controls logging, loading real data or not is solely controlled by RUNMLPERF	2025-04-23 14:32:54 -04:00
nimlgen	cc52b9c528	am: add entry() to PT (#10010 )	2025-04-23 20:45:52 +03:00
nimlgen	c952cb965e	amd: use mmio iface (#9997 ) * amd: use mmio iface * mypy * rename	2025-04-23 20:13:00 +03:00
Francis Lata	5542aeb0e4	RetinaNet MLPerf flag updates (#10009 ) * add RUNMLPERF and update INITMLPERF usage * update scripts to use RUNMLPERF	2025-04-23 13:00:34 -04:00
George Hotz	de0504276b	pop 0 is slow [pr] (#10007 )	2025-04-23 17:00:59 +01:00
chenyu	d3a8d5c128	print postprocess_detections time in retinanet eval (#10005 ) `BS=96 BASEDIR="/raid/datasets/openimages" MODEL=retinanet python examples/mlperf/model_eval.py` ``` ... loaded dataset @ 8.64s loaded initial data @ 12.57s **** 619.97 ms to enqueue, 46042.13 ms to realize ( 116.22 ms fetching, 45399.58 ms postprocess_detections). 0.09 examples/sec. 0.83 TFLOPS @ 59.23s ** 147.49 ms to enqueue, 37362.16 ms to realize ( 146.96 ms fetching, 36618.84 ms postprocess_detections). 0.11 examples/sec. 1.03 TFLOPS @ 96.74s ** 152.85 ms to enqueue, 37244.08 ms to realize ( 120.67 ms fetching, 36235.19 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 134.14s ** 146.39 ms to enqueue, 37279.85 ms to realize ( 65.07 ms fetching, 36233.56 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 171.56s ** 152.41 ms to enqueue, 37264.04 ms to realize ( 127.08 ms fetching, 36196.10 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 208.98s ** 151.29 ms to enqueue, 36868.08 ms to realize ( 142.73 ms fetching, 36153.07 ms postprocess_detections). 0.11 examples/sec. 1.05 TFLOPS @ 246.00s **** 136.41 ms to enqueue, 37325.04 ms to realize ( 90.29 ms fetching, 36573.38 ms postprocess_detections). 0.11 examples/sec. 1.04 TFLOPS @ 283.46s ```	2025-04-23 11:39:56 -04:00
George Hotz	2ed3acd767	toposort is a function [pr] (#10004 )	2025-04-23 16:25:03 +01:00
uuuvn	0730ff0e50	Skip test that requires lru if device's allocator isn't lru (#10003 )	2025-04-23 16:12:56 +01:00
George Hotz	954cb06957	deepwalk without recursion [pr] (#10002 ) * deepwalk without recursion [pr] * comment and remove that test	2025-04-23 15:57:50 +01:00
uuuvn	9de73ccc22	Skip test that requires python 3.12 on older versions (#10001 ) `out.cast(it.dtype.fmt).tolist()` fails with `ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'`	2025-04-23 10:09:26 -04:00
George Hotz	71ecc7fa1a	use a pattern matcher for upcast [pr] (#10000 )	2025-04-23 14:24:23 +01:00


10490 Commits