tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 07:28:15 -05:00

Author	SHA1	Message	Date
George Hotz	b341296304	hotfix: save sdxl ram	2025-04-27 12:09:45 -04:00
George Hotz	68c5f7ba80	load fast in sdxl (#10072 ) * load fast in sdxl * back to that with the ret * no context	2025-04-27 11:58:51 -04:00
George Hotz	768eb94c3e	disable debug for load_state_dict [pr] (#10070 )	2025-04-27 11:11:56 -04:00
George Hotz	4b8ef6ce78	hotfix: sdxl corealize	2025-04-27 10:41:46 -04:00
George Hotz	b6d2effaf5	assign is contiguous (#10066 ) * assign is contiguous * disable process replay for SDXL	2025-04-27 08:40:33 -04:00
George Hotz	1253819151	make beautiful indexing use a Variable (#10063 ) * make beautiful indexing use a Variable * stunning test * better color * training is broken * fix tests * fix variable indexing * fix test * no contiguous * revert that * revert that too * indexing two bind * skip for webgpu * make not slow	2025-04-27 08:22:38 -04:00
Rory Clear	a13a43c4fe	yolo 416 to 640 res (#10047 )	2025-04-26 20:45:58 -04:00
chenyu	4c1ce1a299	don't simplify if div folding resulted in negative numerator (#10064 ) * don't simplify if div folding resulted in negative numerator * test	2025-04-26 17:01:18 -04:00
George Hotz	1805403821	fix rand arange folding (#10060 ) * test rand range * --amend * fix rand arange folding * reduce_rangeless fix	2025-04-26 12:24:05 -04:00
qazal	d13c100981	don't sort dims in verify_sink_dims [pr] (#10059 ) * don't sort dims in verify_sink_dims [pr] * 1 can exist with n * put process_replay warn last * assert shape is the same * bring that back	2025-04-26 23:24:30 +08:00
George Hotz	c80fe6d5fc	handle some fancier reduces (#10057 ) * reduce_unparented * handle fancier reduces * fold more * bugfix	2025-04-26 11:20:15 -04:00
nimlgen	e08270c1ba	nv: fix program init for no-args kernels (#10058 )	2025-04-26 18:08:53 +03:00
George Hotz	11113c9d07	reduce_unparented (#10056 )	2025-04-26 09:48:16 -04:00
George Hotz	ea5dddc537	reduce collapse generic (#10045 ) * reduce collapse generic * new arange folder * new range folding * correct with sym * all tests pass * indexing ops passes * failing tests * fix tests, remove unused * revert that * torch indexing is fast * skip on webgpu * touchups * comments	2025-04-26 09:13:24 -04:00
quortus	5cdc96409e	Update outdated renderer.render calls (#10044 )	2025-04-26 07:35:19 -04:00
nimlgen	e055b9422f	am: fix mmap failures (#10054 )	2025-04-26 14:21:28 +03:00
qazal	e1d2b64e92	remu new instructions (#10050 ) * remu new instructions * test_ds_store_half * test_v_mul_f16	2025-04-26 02:04:12 +03:00
qazal	bba5d0a3e4	remu refactors (#10028 ) * remu refactors * scc is sgpr 253 * remove that * rename to vcc_lo * run cargo test in CI * llvm-mc * meh * work * work_group work 1 * seeded_lanes is dumb * better than seeded_lanes * does not need to be address * 128 sgpr per wave * scc is sgpr, we don't know which one * null_src once more * derive clone, wave init is cleaner * init comes first	2025-04-26 04:31:10 +08:00
nimlgen	0fc85a2b0a	hcqfuzz: init (#10049 ) * hcqfuzz: init * fix fuzz * linter * graph * taht test * update readme	2025-04-25 23:19:21 +03:00
qazal	b30050e287	fix amdgpu_disassemble on mac [pr] (#10042 )	2025-04-25 15:23:11 +08:00
George Hotz	a197aa4ef3	upat reduce syntax [pr] (#10040 ) * upat reduce syntax [pr] * switch z3 to graph_rewrite	2025-04-24 22:05:28 -04:00
Ignacio Sica	76a86735c0	hotfix `amd` bf16 is supported case (#10039 ) * hotfix amd and amd_llvm * bf16 not supported in ci * hotfix amd_llvm is not a device * remove default * dont gate on ci and amd_llvm * minor cleanup * skip bf16 tc test for amd_llvm	2025-04-24 21:29:27 -03:00
Ignacio Sica	b4f823acbe	fix helper_tc_allclose (#9606 ) * fix helper_tc_allclose * cleanup * hotfix * cleanup * cleanup * check real buffer and add cast for bf16 * cleanup * fix padded for ops_python * avoid assert on amd emulated tc * swap dimensions * revert, should have nothing to do with padded * revert fix, should not go in this pr * remove skip	2025-04-24 18:36:40 -03:00
Rory Clear	3a189fa561	More yolo processing in tinygrad (#9928 ) * more tg less np * update webgpu html for new compile * resize boxes * remove text * add back note * fix indentation * fix indentation * remove magic num * remove now unused funcs * back to numpy nms * no loop * fix iou suppression * update test * dont suppress other classes * add working scale * fix expected value, rounded up 0.24 was being counted * add postprocess bool for onnx test * fix indents * clean * clean * fix indent * remove print * fix indent * remove unused import * remove hardcoded 0.25 * space * spacing * clean label_predictions func * remove single item lists * space * use postprocess output in test * space * clean * clean * remove redundant threshold * remove redundant threshold * clean * rename var * move loop into func * unhardcode iou_threshold * remove unused values * clean * add note * clean * keep const * move back funcs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 16:21:46 -04:00
chenyu	74c6cf8be3	lint mlperf model_train (#10038 )	2025-04-24 16:19:44 -04:00
Ignacio Sica	51ca19d061	set `test_tensor_cores_padded_amd` to expectedFailure (#10036 ) * init * add expected failure to correctly track progres * hotfix * skip for amd_llvm as well * add skip * add pr number * move comment to amd test * change reason	2025-04-24 17:11:40 -03:00
b1tg	914d89fa0b	fix tensor cores for gfx1201 (#9838 ) * fix tensor cores for gfx1201 * fix typo * fix python wmma * AMDLLVMRenderer with arch + AMDLLVM tensor_cores * fix ci * clean up * more tensor cores for RDNA4 * fix half/half, bfloat16/float, bfloat16/bfloat16 for amd_llvm --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 14:57:41 -04:00
uuuvn	779aa1e2e9	Enable image tests on cloud if clouddev supports image (#9903 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 14:30:12 -04:00
uuuvn	29a12b19ea	Add macos CLOUD tests (#10033 ) A lot more work is required to enable all of them and move into osxtests matrix, for now i created a separate runner for them (copied from WebGPU) Will add test/test_graph.py to those tests in #9876	2025-04-24 14:14:13 -04:00
Nishant Rajadhyaksha	55942a8d8e	[Bounty] moved index_tensor off cpu in torch_backend (#9916 ) * moved index tensor off cpu in torch_backend * added support for None based indexing * fix_to_pass_tests * fix segfault tests	2025-04-24 14:12:37 -04:00
Ignacio Sica	373ca59b7f	use is_dtype_supported to check dtype support in tc tests (#10035 )	2025-04-24 14:59:14 -03:00
Ignacio Sica	93a1e9eeb9	improve `bf16` case for `is_dtype_supported` [pr] (#10034 ) * fix is_dtype_supported for bf16 * hotfix * add llvm and amd_llvm * gate on machine * separate gpu vs cpu cases * add arm case	2025-04-24 14:03:57 -03:00
uuuvn	754d789f51	Fix and enable jit tests on CLOUD (#10031 )	2025-04-24 18:39:31 +03:00
qazal	0b482fb824	add RDNA3 parser to remu (#10025 ) * llvm ref * work * all of them * salu * cleaner * start * vector ops * done * replace SMEM * vopd * sop1 * SOPC * null stays null_src * sopp * SOPK * sop2 * vop1 * vop2 * remove allow(dead_code) * vopc	2025-04-24 21:34:07 +08:00
uuuvn	0d903c9495	Print clouddev instead of cloudev's renderer (#10023 ) This is kind of a bug because currently with DEBUG>=1 it will say that remote has device and then an array of renderer props instead of a real device name which doesn't make sense: ``` 127.0.0.1 - - [24/Apr/2025 16:50:44] "GET /properties HTTP/1.1" 200 - remote has device ['tinygrad.renderer.cstyle', 'MetalRenderer', []] opened device CLOUD from pid:20210 ``` Now it will actually print the name of device behind cloud: ``` 127.0.0.1 - - [24/Apr/2025 16:56:29] "GET /properties HTTP/1.1" 200 - remote has device METAL opened device CLOUD from pid:20315 ```	2025-04-24 09:32:08 -04:00
George Hotz	aec75f51ef	fixup some slow CI tests [pr] (#10027 ) * fixup some slow CI tests [pr] * shrink test index	2025-04-24 09:20:49 -04:00
qazal	c990aac2b1	skip flaky test_transcribe_file1_OOB (#10026 )	2025-04-24 21:09:43 +08:00
George Hotz	4e2ccfddc6	ci refactor to split AMD/NVIDIA [pr] (#10024 ) * ci refactor to split AMD [pr] * split * split amd tests * explicit 0	2025-04-24 08:59:54 -04:00
uuuvn	0c68e44d6f	Cloud properties (#10021 )	2025-04-24 08:17:01 -04:00
George Hotz	db00d88415	hotfix: handle bad z3 install like z3 import fail	2025-04-24 08:09:40 -04:00
Sieds Lykles	e75be6eafc	[bounty] [pr] index validation with z3 (#9981 ) * index validation with z3 * Change comment * toposort -> toposort() --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 08:06:08 -04:00
quortus	9e49721c47	CPUGraph support for clang (#10014 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 07:52:35 -04:00
Park Jun	c3ad7b2a84	create randperm and support pytorch backend (#10019 )	2025-04-24 07:29:02 -04:00
Matthew Daiter	b545338e59	isin_Tensor_out added (#10018 )	2025-04-24 07:26:51 -04:00
chenyu	a25abf55e3	retinanet only call postprocess_detections with RUNMLPERF (#10017 ) during setup only need to compile `_eval_step().numpy()`	2025-04-23 20:45:38 -04:00
nimlgen	7f53e80db9	hotfix: amd mmio on mi300 (#10016 ) * hotfix: amd mmio on mi300 * fix * ops	2025-04-24 01:08:18 +03:00
nimlgen	1c5e353249	am: use mmio iface (#10012 ) * am: use mmio iface * linters * fixes * fixes + cleanups * mute * mypy * style	2025-04-24 00:27:04 +03:00
chenyu	65faa1d94b	explicit device in mlperf scripts (#10015 )	2025-04-23 17:11:52 -04:00
chenyu	a3f938dbee	remove retinanet INITMLPERF from beam script (#10011 ) it only controls logging, loading real data or not is solely controlled by RUNMLPERF	2025-04-23 14:32:54 -04:00
nimlgen	cc52b9c528	am: add entry() to PT (#10010 )	2025-04-23 20:45:52 +03:00


8599 Commits