qazal
a80fb4e641
viz: better ordering of device engines in profiler ( #14590 )
2026-02-06 23:08:09 +09:00
qazal
b7e3fbe07e
llama: add VIZ=-1 to dev_run ( #14583 )
* llama: add VIZ=-1 to dev_run
* readme
* cleaner
* add profile.sh script
* better grouping of options
* add other row
* readme edits
* work
2026-02-06 22:59:22 +09:00
nimlgen
fbeb978170
diff devices for sdma ( #14589 )
* start
* x
* fix
* sdma
* c
* clean
* x
* hm
* cleaner
2026-02-06 16:39:12 +03:00
George Hotz
7cb996e153
bottom up earliest rewrites ( #14587 )
* better
* bottom up earliest rewrites
* fix
2026-02-06 18:13:07 +08:00
George Hotz
03af2404e2
small changes and test fixes from kernel is call ( #14586 )
2026-02-06 17:08:33 +08:00
George Hotz
3c26ce29b2
make disk tensor tests process safe ( #14584 )
2026-02-06 15:39:55 +08:00
qazal
cf73d7e2a7
hotfix: disable slower asm gemm shape from llama seqlen 8192 ( #14582 )
2026-02-06 15:05:19 +09:00
qazal
be77873974
llama: contig backward for wk / wv matmul backward ( #14581 )
2026-02-06 14:54:00 +09:00
chenyu
15d3344d9e
use int inputs in test_assign ( #14580 )
int is less flaky
2026-02-06 00:07:31 -05:00
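As a side note on the commit above: a minimal illustration, independent of test_assign's internals, of why integer inputs make exact-equality checks less flaky than floats.

```python
# Floating-point addition is not associative, so a result can depend on
# evaluation order and exact-equality assertions on float outputs get flaky.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
# Small-magnitude integer arithmetic is exact, so the same assertion is stable.
print((1 + 2) + 3 == 1 + (2 + 3))              # True
```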
qazal
50a166a5fa
viz: cleanup amdgpu target mapping ( #14579 )
* viz: cleanup amdgpu target mapping
* linter
* unwraps
2026-02-06 13:51:51 +09:00
chenyu
b09dc646f5
revert some late_buffer_view change ( #14578 )
revert #14478 which breaks tinyfs
2026-02-05 22:51:40 -05:00
chenyu
d41836f135
remove KERNEL special case in realize_assign [pr] ( #14573 )
2026-02-05 21:55:44 -05:00
George Hotz
6cbcf98627
KernelInfo is required on get_program ( #14571 )
* rangeify always adds KernelInfo
* fix tests
* skip flaky test
2026-02-06 10:49:27 +08:00
George Hotz
28c56a783c
add CallInfo and viz call toggle ( #14570 )
2026-02-06 09:30:58 +08:00
wozeparrot
f73468d516
fa: block skipping for fa kv bwd ( #14569 )
2026-02-05 16:13:53 -08:00
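The block skipping named in the commit above, sketched under the assumption of standard causal masking; the block sizes and loop structure here are illustrative, not tinygrad's flash-attention kernel.

```python
# Hypothetical sketch: in the KV backward pass, a query block whose positions
# all precede the KV block is fully masked under causality and can be skipped.
BLOCK_Q, BLOCK_KV, SEQLEN = 64, 64, 1024

def contributing_q_blocks(kv_block: int):
  kv_start = kv_block * BLOCK_KV
  for q_block in range(SEQLEN // BLOCK_Q):
    q_end = (q_block + 1) * BLOCK_Q - 1   # last query position in this block
    if q_end < kv_start: continue         # no query here attends to this KV block
    yield q_block                         # this block contributes to dK/dV

print(list(contributing_q_blocks(kv_block=8)))  # only blocks 8..15 survive
```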
chenyu
b7ef775677
more cleanup in create_schedule [pr] ( #14566 )
fixed wrong comments and simplified queue building
2026-02-05 16:12:17 -05:00
Garret Castro
cee7ef7ab2
disable threads ( #14555 )
2026-02-05 16:11:32 -05:00
chenyu
79b7799dba
clean up linearize schedule [pr] ( #14565 )
* clean up linearize schedule [pr]
don't mix ScheduleItem and UOp in schedule queue
* ok
2026-02-05 15:24:09 -05:00
chenyu
41a179f542
fix test_xlm_roberta_large ( #14564 )
onnxruntime does not allow symlinks that point outside the model dir. update snapshot_download to use local_dir instead of cache_dir, plus an ad hoc migration step to copy the existing model.
2026-02-05 14:56:06 -05:00
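A minimal sketch of the local_dir-based download described above; the repo id and target path are placeholders, not the ones the test uses. cache_dir keeps blobs in the HF cache and exposes them through symlinks, while local_dir materializes real files inside the model directory, which is what the latest onnxruntime requires.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo id and destination; the test's actual model differs.
model_dir = snapshot_download("some-org/xlm-roberta-large-onnx",
                              local_dir="models/xlm_roberta_large")
print(model_dir)  # contains real files, not symlinks into the HF cache
```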
Christopher Milan
aa9dc50577
dtype decomps don't require bitshifts ( #14542 )
* dtype decomps don't require bitshifts
* simplify shr/shl
* ruff
2026-02-05 14:42:30 -05:00
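A small worked example of the equivalence that lets these decompositions avoid shift ops; it is illustrative only, since the actual rewrites operate on UOps rather than Python ints.

```python
# For non-negative x, x >> n == x // 2**n and x << n == x * 2**n, so a 32-bit
# value splits into (and reassembles from) 16-bit halves without bitshifts.
x = 0xDEADBEEF
hi_shift, lo_mask = x >> 16, x & 0xFFFF      # shift/mask form
hi_div, lo_mod = x // 2**16, x % 2**16       # shift-free equivalent
assert (hi_shift, lo_mask) == (hi_div, lo_mod) == (0xDEAD, 0xBEEF)
assert hi_div * 2**16 + lo_mod == x          # reassembly without shl
```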
Christopher Milan
b47397ab17
list ml_dtypes as dependency for DSP ( #14562 )
* pin onnxruntime to 1.23.2 for DSP
* list ml_dtypes instead
This reverts commit 84bb2cc0fc.
2026-02-05 14:27:50 -05:00
chenyu
2b47a9a1b5
skip test_xlm_roberta_large ( #14563 )
symlinked model files are not allowed in the latest onnxruntime
2026-02-05 14:00:24 -05:00
chenyu
42c18da88a
add Ops asserts in toposort sched_sink [pr] ( #14561 )
more explicit
2026-02-05 12:40:02 -05:00
nimlgen
483bba4f05
nv: use prof_exec_counter ( #14559 )
2026-02-05 19:00:14 +03:00
qazal
190042358f
llama: faster bf16 matmul / rope backward ( #14558 )
2026-02-05 23:57:25 +09:00
George Hotz
b398335f62
assembly/amd: fix saturation in python remu ( #14557 )
* PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp
* fix saturation in PYTHON_REMU
* simpler
* more tests, less lines
---------
Co-authored-by: Christopher Milan <chrismilan@ucla.edu>
2026-02-05 18:35:57 +08:00
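A hedged model of the clamped (saturating) unsigned subtract the failing test exercises; the instruction name comes from the commit, but the semantics below are an illustration to check against the ISA docs, not a copy of the Python emulator.

```python
MASK32 = 0xFFFFFFFF

def v_sub_nc_u32(a: int, b: int, clamp: bool) -> int:
  # Without CLAMP the unsigned subtract wraps mod 2**32; with CLAMP it is
  # assumed to saturate to 0 on underflow instead of wrapping.
  if clamp: return max(a - b, 0) & MASK32
  return (a - b) & MASK32

assert v_sub_nc_u32(1, 2, clamp=False) == 0xFFFFFFFF  # wraps
assert v_sub_nc_u32(1, 2, clamp=True) == 0            # saturates
```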
wozeparrot
c1ea6687e5
fa: simpler is faster ( #14548 )
2026-02-05 01:13:17 -08:00
George Hotz
43e7eda4e7
grad_b uses custom gemm ( #14550 )
* grad_b uses custom gemm
* fix multi backward, acc is in float32
* test_gemm_batched
* square gemm
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2026-02-05 15:22:27 +09:00
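The reference math behind the commit above, as a numpy sketch; the PR's custom gemm is a hand-written kernel, so this only shows the backward formula and the float32 accumulation it mentions.

```python
import numpy as np

# Hypothetical shapes; only the formula and the fp32 accumulator matter here.
A = np.random.randn(64, 32).astype(np.float16)
dC = np.random.randn(64, 48).astype(np.float16)

# grad_B for C = A @ B: accumulate A^T @ dC in float32, store back as float16.
grad_B = (A.astype(np.float32).T @ dC.astype(np.float32)).astype(np.float16)
print(grad_B.shape)  # (32, 48)
```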
qazal
f9cfb64cd9
test asm_gemm in CI ( #14551 )
* test asm_gemm in CI
* default float16
* use a smaller shape for multi
* smaller size
* smaller for CI
* smaller for ci
* need half
2026-02-05 13:32:22 +09:00
chenyu
c0ca7f9c51
use more UOp.sum and UOp.prod [pr] ( #14549 )
2026-02-04 22:05:20 -05:00
chenyu
e8dace41b6
clean up UOp.vars [pr] ( #14547 )
2026-02-04 20:52:25 -05:00
Christopher Milan
232848d086
PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 ( #14546 )
* PYTHONREMU: VOP3P integer operations with constants don't cast to fp16
* put that back
* cleaner
* do that once
2026-02-04 20:10:59 -05:00
wozeparrot
2966619834
feat: llama uses enable_gqa during training ( #14545 )
2026-02-04 16:22:31 -08:00
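Background on what enable_gqa means for the attention call above; the head counts are illustrative, and llama's real values come from its config.

```python
import numpy as np

n_heads, n_kv_heads, seq, hd = 8, 2, 16, 64   # grouped-query attention: 4 q heads per kv head
q = np.random.randn(n_heads, seq, hd)
k = np.random.randn(n_kv_heads, seq, hd)

# Without GQA support the caller materializes repeated KV heads up front;
# an enable_gqa flag lets attention share each KV head across its query group.
k_rep = np.repeat(k, n_heads // n_kv_heads, axis=0)
scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(hd)
print(scores.shape)  # (8, 16, 16)
```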
chenyu
664f1bf76d
minor ops/jit cleanups [pr] ( #14543 )
2026-02-04 17:21:34 -05:00
chenyu
03d0fa9c3f
merge as_buf into buf_uop [pr] ( #14541 )
2026-02-04 16:32:23 -05:00
chenyu
43ef24a8af
remove buf_target [pr] ( #14540 )
not really needed
2026-02-04 15:03:47 -05:00
chenyu
8b7343b950
clean up is_realized [pr] ( #14538 )
base cannot be Ops.MULTI since MULTI is a view now
2026-02-04 14:24:10 -05:00
Christopher Milan
5338ce6b74
test S_PACK in extra/assembly/amd/test/hw ( #14537 )
* S_PACK_LL_B32_B16 in test/hw
* add rest of S_PACK instructions
2026-02-04 14:17:16 -05:00
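For reference, a Python model of the packing these hardware tests cover, based on the public AMD ISA description of S_PACK_LL_B32_B16; treat it as an assumption to verify against the docs rather than a copy of the emulator.

```python
def s_pack_ll_b32_b16(s0: int, s1: int) -> int:
  # D = S0[15:0] | (S1[15:0] << 16): the low 16 bits of each scalar source.
  return (s0 & 0xFFFF) | ((s1 & 0xFFFF) << 16)

assert s_pack_ll_b32_b16(0xAAAA1111, 0xBBBB2222) == 0x22221111
# The other S_PACK_*_B32_B16 variants select high halves analogously.
```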
chenyu
9052db678f
remove allow_shape_mismatch in Tensor.replace ( #14536 )
move all the logic into torch_backend instead of hacking the Tensor method
2026-02-04 12:38:18 -05:00
nimlgen
ec2b6bbda8
hcq: update signal logic ( #14531 )
2026-02-04 19:32:56 +03:00
nimlgen
62786d488a
am: mi3xx perf ( #14529 )
2026-02-04 19:32:43 +03:00
chenyu
d57d24c7d4
Buffer.as_buffer -> Buffer.as_memoryview [pr] ( #14535 )
it casts to a memoryview. also inline the as_typed_buffer checks into Tensor._data
2026-02-04 11:31:11 -05:00
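A tiny sketch of the memoryview cast the rename points at, using plain Python buffers rather than tinygrad's Buffer class.

```python
import struct

raw = bytearray(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))  # 16 bytes backing 4 floats
mv = memoryview(raw)          # what an as_memoryview-style accessor hands back
typed = mv.cast("f")          # callers can reinterpret the raw bytes as float32
print(typed.tolist())         # [1.0, 2.0, 3.0, 4.0]
```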
chenyu
024f57ecf5
jit input_buffers cleanup [pr] ( #14532 )
2026-02-04 10:14:38 -05:00
chenyu
67f91e897b
UOp.is_contiguous -> UOp.has_buffer_identity [pr] ( #14530 )
one more confusing buffer-related method, but it's definitely not is_contiguous
2026-02-04 09:21:26 -05:00
George Hotz
fb9df1e031
pretty print binary ( #14520 )
2026-02-04 18:04:35 +08:00
Christopher Milan
8c3c026d86
decomp float16 to float32 ( #14417 )
* decomp float16 to float32
* denormals arent zero
* add test
* denormals are zero
* fix
* oops
* bitcast works
* fix LOADs
* test_dtype passing
* cleanup
* mypy
* debug print
* only emulate if EMULATED
* very ugly, but passes spec
* add test_dtype_alu tests
* Revert "very ugly, but passes spec"
This reverts commit fdc3999b65.
* bottom up decompositions
* that should have symbolic
* simplify a bit
* SPEC really works
* run with DEBUG
* debug=4
* rm debug
2026-02-04 01:37:47 -05:00
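A minimal numpy sketch of the emulation idea in this PR; the real decomposition rewrites UOps for devices without native fp16, but the principle is the same: compute each float16 op in float32 and round the result back.

```python
import numpy as np

def emulated_f16_add(a: np.float16, b: np.float16) -> np.float16:
  # Decompose the fp16 op: upcast to fp32, compute, round back to fp16.
  return np.float16(np.float32(a) + np.float32(b))

a, b = np.float16(0.1), np.float16(0.2)
assert emulated_f16_add(a, b) == a + b  # agrees with numpy's native fp16 add
print(emulated_f16_add(a, b))           # sum rounded back to fp16 precision
```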
Christopher Milan
ecbce5269e
PYTHONREMU properly supports S_PACK_LL_B32_B16 ( #14527 )
* PYTHONREMU properly supports S_PACK_LL_B32_B16
* default
2026-02-03 23:45:33 -05:00
wozeparrot
720c9597a9
feat: llama uses is_causal on sdpa during training ( #14528 )
2026-02-03 20:24:30 -08:00
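Background on the is_causal flag the commit above turns on, with the mask shown in numpy for illustration: causal attention lets position i attend only to positions j <= i, so training no longer needs to pass an explicit mask tensor.

```python
import numpy as np

seq = 5
# Lower-triangular causal mask: True where query i may attend to key j (j <= i).
causal = np.tril(np.ones((seq, seq), dtype=bool))
print(causal.astype(int))
# Equivalent additive bias used before softmax: 0 where allowed, -inf where masked.
bias = np.where(causal, 0.0, -np.inf)
```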
chenyu
9c2fc118ef
relax setitem target check ( #14526 )
old check was too conservative
2026-02-03 22:32:49 -05:00
qazal
d1bfbe9ce3
isolate slow llama gemm ( #14525 )
2026-02-04 12:20:10 +09:00