George Hotz
d32f5e9f3a
improve rendering of shapes in viz + investigate symbolic [pr] (#10091)
2025-04-28 16:44:09 -04:00
Sieds Lykles
dbb7aee02e
Split constant in div with negative x (#10088)
* add rule
* change test
* lower complexity limit
* remove offset in fold_unrolled_divs
* remove import
* add one more condition
2025-04-28 16:24:14 -04:00
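Context for the rule above: splitting a constant out of an integer floor division is exact even when the variable part is negative, because the residual constant c % d always lands in [0, d). A minimal Python sketch of the identity (illustrative, not the repo's rewrite rule):

```python
# Identity: (x + c)//d == c//d + (x + c % d)//d under floor division.
# c % d is in [0, d), so the split stays valid even for negative x --
# the case the rule above handles.
def split_const_div(x: int, c: int, d: int) -> int:
    return c // d + (x + c % d) // d

for x in range(-50, 50):
    for c in range(-20, 20):
        for d in (2, 3, 7):
            assert split_const_div(x, c, d) == (x + c) // d
```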
chenyu
610ee79b22
cherry pick mlperf5.0 branch to master (#10089)
2025-04-28 15:36:56 -04:00
chenyu
459a223202
simpler Literal annotation in code_for_workitem [pr] (#10087)
2025-04-28 14:59:25 -04:00
nimlgen
dcd9a633c3
am: load minimum fw (#10083)
* am: load minimum psp parts
* try those
* remove me & pfp
2025-04-28 21:28:05 +03:00
George Hotz
ecff82a698
fixing single kernel softmax: resolve (#10086)
* fixing single kernel softmax: resolve
* add failing lin test
2025-04-28 13:46:20 -04:00
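For context, SINGLE_KERNEL_SOFTMAX fuses the whole softmax chain (max-reduce, exp, sum-reduce, divide) into one kernel. A plain NumPy sketch of the op chain being fused, for reference only:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # subtract the row max for numerical stability, then exp, sum, divide:
    # the reduce/elementwise chain a single-kernel softmax fuses
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([1.0, 2.0, 3.0])))
```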
George Hotz
4c242b0483
hotfix: tests all pass on metal local
2025-04-28 12:09:00 -04:00
George Hotz
690dac79b5
don't modify the ranges on reduce rewrite (#10062)
* bug in div range folding
* simpler
* oh, this is right for indexing, but the div mod folding needs to be fixed
* reenable
* Passing test_complexity_w_unroll2 (#10068)
* Passing
* remove non_folded_divs
* Add check for negative term in div folding
* Add test
* bump that limit
* fix casted
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
quortus
5130759605
Make sure clang always inlines batched functions (#10037)
2025-04-28 10:48:24 -04:00
George Hotz
c4a50f9d89
fix full shape in kernel.py [pr] (#10085)
* fix full shape in kernel.py
* fix that heuristic
* full shape in shapetracker is fast
* fix process replay [pr]
* simpler
* this
* i'm just going to ignore that one
2025-04-28 09:32:58 -04:00
qazal
ac37510f60
remu: only write v_cmp result if exec is set (#10084)
2025-04-28 20:31:52 +08:00
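The semantics being fixed: a vector compare writes its per-lane result bit only for lanes whose exec mask bit is set; inactive lanes must keep their previous value. A hedged Python sketch of that masking (remu itself is Rust; all names here are illustrative):

```python
def v_cmp_write(old: int, exec_mask: int, lane_cmp: list) -> int:
    # update each lane's result bit only if that lane is active in exec
    result = old
    for lane, hit in enumerate(lane_cmp):
        if exec_mask >> lane & 1:
            result = (result & ~(1 << lane)) | (int(hit) << lane)
    return result

# lanes 0 and 2 compare true, but only lane 0 is active: bit 2 stays unset
assert v_cmp_write(0b000, exec_mask=0b001, lane_cmp=[True, False, True]) == 0b001
```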
qazal
d6b436a815
remu bugfix with -0.0 negation (#10082)
2025-04-28 15:46:42 +08:00
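The bug class here: implementing float negation as 0.0 - x mishandles signed zero, since 0.0 - 0.0 is +0.0 while -(+0.0) must be -0.0; correct negation flips the IEEE-754 sign bit. A Python illustration (the actual fix is in remu's Rust):

```python
import math, struct

def f32_neg(x: float) -> float:
    # negate a float32 by flipping its sign bit
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits ^ 0x80000000))[0]

print(math.copysign(1.0, 0.0 - 0.0))    #  1.0: subtraction loses the sign
print(math.copysign(1.0, f32_neg(0.0))) # -1.0: sign-bit flip yields -0.0
```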
nimlgen
15e4302784
am: optimize zeroing out boot structs (#10081)
2025-04-28 10:15:32 +03:00
nimlgen
68e5ab8552
am: fix typo in fw loading (#10080)
2025-04-28 09:45:00 +03:00
chenyu
e996584685
olmoe in mac benchmark (#10077)
2025-04-27 21:07:02 -04:00
George Hotz
732e172961
don't require contiguous after fuse (#10074)
2025-04-27 13:17:22 -04:00
qazal
1aed04ec12
cpu is ground truth in VALIDATE_WITH_CPU=1 [pr] (#10067)
2025-04-28 01:14:21 +08:00
George Hotz
129bddde74
lin failure from SINGLE_KERNEL_SOFTMAX (#10073)
* lin failure from SINGLE_KERNEL_SOFTMAX
* fix lin issue
* more pure diff
2025-04-27 13:02:10 -04:00
George Hotz
b341296304
hotfix: save sdxl ram
2025-04-27 12:09:45 -04:00
George Hotz
68c5f7ba80
load fast in sdxl (#10072)
* load fast in sdxl
* back to that with the ret
* no context
2025-04-27 11:58:51 -04:00
George Hotz
768eb94c3e
disable debug for load_state_dict [pr] (#10070)
2025-04-27 11:11:56 -04:00
George Hotz
4b8ef6ce78
hotfix: sdxl corealize
2025-04-27 10:41:46 -04:00
George Hotz
b6d2effaf5
assign is contiguous (#10066)
* assign is contiguous
* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
George Hotz
1253819151
make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
2025-04-27 08:22:38 -04:00
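The change above makes the generated indexing kernel take the index as a symbolic Variable instead of baking it in, so one compiled kernel can serve every index value. A hedged sketch of tinygrad's user-facing Variable API (usage illustrative; exact details vary by version):

```python
from tinygrad import Tensor, Variable

t = Tensor.arange(10)
i = Variable("i", 0, 9).bind(3)  # symbolic index with bounds, bound to 3 here
# indexing with a bound Variable can reuse one kernel for any value of i
print(t[i].item())
```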
Rory Clear
a13a43c4fe
yolo 416 to 640 res (#10047)
2025-04-26 20:45:58 -04:00
chenyu
4c1ce1a299
don't simplify if div folding resulted in negative numerator (#10064)
* don't simplify if div folding resulted in negative numerator
* test
2025-04-26 17:01:18 -04:00
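The hazard being guarded against: several div-folding rules are only sound when the numerator is known non-negative, because floor and truncating division diverge below zero. A small Python illustration (not the repo's code):

```python
import math

trunc_div = lambda a, d: math.trunc(a / d)  # C-style truncating division

for a in range(16):
    assert a // 4 == trunc_div(a, 4)  # non-negative numerators agree

# a negative numerator breaks the agreement, so a fold that produces one
# must be rejected:
print(-1 // 4, trunc_div(-1, 4))  # -1 vs 0
```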
George Hotz
1805403821
fix rand arange folding (#10060)
* test rand range
* --amend
* fix rand arange folding
* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
qazal
d13c100981
don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]
* 1 can exist with n
* put process_replay warn last
* assert shape is the same
* bring that back
2025-04-26 23:24:30 +08:00
George Hotz
c80fe6d5fc
handle some fancier reduces (#10057)
* reduce_unparented
* handle fancier reduces
* fold more
* bugfix
2025-04-26 11:20:15 -04:00
nimlgen
e08270c1ba
nv: fix program init for no-args kernels (#10058)
2025-04-26 18:08:53 +03:00
George Hotz
11113c9d07
reduce_unparented (#10056)
2025-04-26 09:48:16 -04:00
George Hotz
ea5dddc537
reduce collapse generic (#10045)
* reduce collapse generic
* new arange folder
* new range folding
* correct with sym
* all tests pass
* indexing ops passes
* failing tests
* fix tests, remove unused
* revert that
* torch indexing is fast
* skip on webgpu
* touchups
* comments
2025-04-26 09:13:24 -04:00
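For context, arange/range folding collapses a reduce over a loop index into a closed form so no kernel loop is emitted. The flavor of identity involved, in plain Python:

```python
n = 100
# a sum over a range collapses to a closed form...
assert sum(range(n)) == n * (n - 1) // 2

# ...and an arange-based gather, sum_i [i == k] * v[i], collapses to v[k]
v, k = [10, 20, 30, 40], 2
assert sum((i == k) * v[i] for i in range(len(v))) == v[k]
```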
quortus
5cdc96409e
Update outdated renderer.render calls (#10044)
2025-04-26 07:35:19 -04:00
nimlgen
e055b9422f
am: fix mmap failures (#10054)
2025-04-26 14:21:28 +03:00
qazal
e1d2b64e92
remu new instructions (#10050)
* remu new instructions
* test_ds_store_half
* test_v_mul_f16
2025-04-26 02:04:12 +03:00
qazal
bba5d0a3e4
remu refactors (#10028)
* remu refactors
* scc is sgpr 253
* remove that
* rename to vcc_lo
* run cargo test in CI
* llvm-mc
* meh
* work
* work_group work 1
* seeded_lanes is dumb
* better than seeded_lanes
* does not need to be address
* 128 sgpr per wave
* scc is sgpr, we don't know which one
* null_src once more
* derive clone, wave init is cleaner
* init comes first
2025-04-26 04:31:10 +08:00
nimlgen
0fc85a2b0a
hcqfuzz: init (#10049)
* hcqfuzz: init
* fix fuzz
* linter
* graph
* that test
* update readme
2025-04-25 23:19:21 +03:00
qazal
b30050e287
fix amdgpu_disassemble on mac [pr] (#10042)
2025-04-25 15:23:11 +08:00
George Hotz
a197aa4ef3
upat reduce syntax [pr] (#10040)
* upat reduce syntax [pr]
* switch z3 to graph_rewrite
2025-04-24 22:05:28 -04:00
Ignacio Sica
76a86735c0
hotfix amd bf16 is supported case (#10039)
* hotfix amd and amd_llvm
* bf16 not supported in ci
* hotfix amd_llvm is not a device
* remove default
* dont gate on ci and amd_llvm
* minor cleanup
* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe
fix helper_tc_allclose (#9606)
* fix helper_tc_allclose
* cleanup
* hotfix
* cleanup
* cleanup
* check real buffer and add cast for bf16
* cleanup
* fix padded for ops_python
* avoid assert on amd emulated tc
* swap dimensions
* revert, should have nothing to do with padded
* revert fix, should not go in this pr
* remove skip
2025-04-24 18:36:40 -03:00
Rory Clear
3a189fa561
More yolo processing in tinygrad (#9928)
* more tg less np
* update webgpu html for new compile
* resize boxes
* remove text
* add back note
* fix indentation
* fix indentation
* remove magic num
* remove now unused funcs
* back to numpy nms
* no loop
* fix iou suppression
* update test
* dont suppress other classes
* add working scale
* fix expected value, rounded up 0.24 was being counted
* add postprocess bool for onnx test
* fix indents
* clean
* clean
* fix indent
* remove print
* fix indent
* remove unused import
* remove hardcoded 0.25
* space
* spacing
* clean label_predictions func
* remove single item lists
* space
* use postprocess output in test
* space
* clean
* clean
* remove redundant threshold
* remove redundant threshold
* clean
* rename var
* move loop into func
* unhardcode iou_threshold
* remove unused values
* clean
* add note
* clean
* keep const
* move back funcs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
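Several bullets above ("back to numpy nms", "fix iou suppression", "dont suppress other classes") concern non-maximum suppression. A generic per-class NumPy NMS sketch for reference, not the repo's implementation:

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    # boxes are (x1, y1, x2, y2); IoU of one box against many
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, classes, iou_threshold=0.5):
    keep = []
    for c in np.unique(classes):  # suppress within each class only
        idx = np.where(classes == c)[0][np.argsort(-scores[classes == c])]
        while len(idx):
            keep.append(idx[0])  # highest-scoring survivor
            idx = idx[1:][iou(boxes[idx[0]], boxes[idx[1:]]) < iou_threshold]
    return np.array(keep)
```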
chenyu
74c6cf8be3
lint mlperf model_train (#10038)
2025-04-24 16:19:44 -04:00
Ignacio Sica
51ca19d061
set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init
* add expected failure to correctly track progress
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason
2025-04-24 17:11:40 -03:00
b1tg
914d89fa0b
fix tensor cores for gfx1201 (#9838)
* fix tensor cores for gfx1201
* fix typo
* fix python wmma
* AMDLLVMRenderer with arch + AMDLLVM tensor_cores
* fix ci
* clean up
* more tensor cores for RDNA4
* fix half/half, bfloat16/float, bfloat16/bfloat16 for amd_llvm
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:57:41 -04:00
uuuvn
779aa1e2e9
Enable image tests on cloud if clouddev supports image (#9903)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
uuuvn
29a12b19ea
Add macos CLOUD tests (#10033)
A lot more work is required to enable all of them and move them into the osxtests
matrix; for now I created a separate runner for them (copied from WebGPU).
Will add test/test_graph.py to those tests in #9876
2025-04-24 14:14:13 -04:00
Nishant Rajadhyaksha
55942a8d8e
[Bounty] moved index_tensor off cpu in torch_backend (#9916)
* moved index tensor off cpu in torch_backend
* added support for None based indexing
* fix_to_pass_tests
* fix segfault tests
2025-04-24 14:12:37 -04:00
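For reference, the None-based indexing the backend now handles: a None in an index inserts a new axis of size 1, the same as unsqueeze. Standard PyTorch behavior:

```python
import torch

t = torch.arange(6).reshape(2, 3)
print(t[None].shape)        # torch.Size([1, 2, 3]): new leading axis
print(t[:, None, :].shape)  # torch.Size([2, 1, 3]): new middle axis
```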
Ignacio Sica
373ca59b7f
use is_dtype_supported to check dtype support in tc tests (#10035)
2025-04-24 14:59:14 -03:00
Ignacio Sica
93a1e9eeb9
improve bf16 case for is_dtype_supported [pr] (#10034)
* fix is_dtype_supported for bf16
* hotfix
* add llvm and amd_llvm
* gate on machine
* separate gpu vs cpu cases
* add arm case
2025-04-24 14:03:57 -03:00