tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
qazal	b10802eb53	use existing VIZ ContextVar instead of getenv (#14610 )	2026-02-08 15:37:55 +09:00
chenyu	510b65489e	style change rangeify assign [pr] (#14616 ) consistent naming, also a standalone fucntion to replace complicated lambda	2026-02-07 15:47:32 -05:00
chenyu	b7afd4471c	use arg instead of 3rd op for ASSIGN [pr] (#14613 )	2026-02-07 14:17:10 -05:00
nimlgen	88c3022223	amd: kfd iface early exit (#14612 ) * amd: kfd iface early exit * l * revert	2026-02-07 18:57:10 +03:00
nimlgen	ce7bfc6ce8	nv: use nv_flags for all fields (#14607 )	2026-02-07 15:01:38 +03:00
qazal	c2544e2252	viz: remove outdated comment (#14608 )	2026-02-07 20:05:24 +09:00
nimlgen	6838b35cff	mockgpu: hevc (#14606 ) * mockgpu: hevc * eng	2026-02-07 12:27:55 +03:00
chenyu	884592f6c8	pin z3-solver version (#14605 ) found exact input that crashes z3 4.15.4	2026-02-06 22:49:31 -05:00
George Hotz	7a2a3b5c71	Remove Ops.KERNEL, it's all Ops.CALL now (#14603 )	2026-02-07 10:21:54 +08:00
George Hotz	ca6604eae2	kernel is call (#14577 ) * call is kernel * closer * fix bugs * dedup * pm_gate_kernel_sink * better * Revert "better" This reverts commit `b4c799b810`. * Reapply "better" This reverts commit `e53f094ce7`. * cleanups * work * remove junk * subtle fix * index * viz cleanups * disable assert for now	2026-02-07 10:10:14 +08:00
wozeparrot	d87ae1c84c	feat: tinyfs load test in benchmark (#14602 )	2026-02-06 18:00:00 -08:00
ttomsa	462b455562	cleanup linearize (#14523 )	2026-02-07 08:54:02 +08:00
ttomsa	d5652e4da2	new dtype aliases (#14596 )	2026-02-07 08:53:35 +08:00
Christopher Milan	ad9e2f0de7	decompose bf16 (#14601 )	2026-02-06 19:24:09 -05:00
Christopher Milan	7bb45e7df0	decompose fp8 to bigger floats [skip_process_replay] (#14554 ) * decompose fp8 also * it works * cleanup * no shift required * default to float * cleanup * fixes * fp8e5m2 * don't rely on behavior comparing nans * cleanup	2026-02-06 19:05:40 -05:00
chenyu	81f6cdb4ab	delete realize_assign [pr] (#14575 ) use realize and realize_srcs like COPY and STORE. src[0] always has BUFFER for base	2026-02-06 17:12:33 -05:00
chenyu	7d193a6e26	fix wgsl bitcast (#14600 ) was wrong for signed int	2026-02-06 16:57:36 -05:00
chenyu	b9fe8b7591	fix opt in process replay [pr] (#14599 )	2026-02-06 16:49:56 -05:00
chenyu	197ebcbbbc	log seed with flush=True in fuzz_symbolic (#14597 ) * log seed with flush=True in fuzz_symbolic i think z3 can crash. added reading seed from argv to see if we repro later * fuzz_symbolic_symbolic_div	2026-02-06 15:03:57 -05:00
nimlgen	fbb67a3f95	am_smi: fix after regen (#14594 )	2026-02-06 20:57:41 +03:00
qazal	a80fb4e641	viz: better ordering of device engines in profiler (#14590 )	2026-02-06 23:08:09 +09:00
qazal	b7e3fbe07e	llama: add VIZ=-1 to dev_run (#14583 ) * llama: add VIZ=-1 to dev_run * readme * cleaner * add profile.sh script * better grouping of options * add other row * readme edits * work	2026-02-06 22:59:22 +09:00
nimlgen	fbeb978170	diff devices for sdma (#14589 ) * start * x * fix * sdma * c * clean * x * hm * cleaer	2026-02-06 16:39:12 +03:00
George Hotz	7cb996e153	bottom up earliest rewrites (#14587 ) * better * bottom up earliest rewrites * fix	2026-02-06 18:13:07 +08:00
George Hotz	03af2404e2	small changes and test fixes from kernel is call (#14586 )	2026-02-06 17:08:33 +08:00
George Hotz	3c26ce29b2	make disk tensor tests process safe (#14584 )	2026-02-06 15:39:55 +08:00
qazal	cf73d7e2a7	hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582 )	2026-02-06 15:05:19 +09:00
qazal	be77873974	llama: contig backward for wk / wv matmul backward (#14581 )	2026-02-06 14:54:00 +09:00
chenyu	15d3344d9e	use int inputs in test_assign (#14580 ) int is less flaky	2026-02-06 00:07:31 -05:00
qazal	50a166a5fa	viz: cleanup amdgpu target mapping (#14579 ) * viz: cleanup amdgpu target mapping * linter * unwraps	2026-02-06 13:51:51 +09:00
chenyu	b09dc646f5	revert some late_buffer_view change (#14578 ) revert #14478 which breaks tinyfs	2026-02-05 22:51:40 -05:00
chenyu	d41836f135	remove KERNEL special case in realize_assign [pr] (#14573 )	2026-02-05 21:55:44 -05:00
George Hotz	6cbcf98627	KernelInfo is required on get_program (#14571 ) * rangeify always adds KernelInfo * fix tests * skip flaky test	2026-02-06 10:49:27 +08:00
George Hotz	28c56a783c	add CallInfo and viz call toggle (#14570 )	2026-02-06 09:30:58 +08:00
wozeparrot	f73468d516	fa: block skipping for fa kv bwd (#14569 )	2026-02-05 16:13:53 -08:00
chenyu	b7ef775677	more cleanup in create_schedule [pr] (#14566 ) fixed wrong comments and simplified queue building	2026-02-05 16:12:17 -05:00
Garret Castro	cee7ef7ab2	disable threads (#14555 )	2026-02-05 16:11:32 -05:00
chenyu	79b7799dba	clean up linearize schedule [pr] (#14565 ) * clean up linearize schedule [pr] don't mix ScheduleItem and UOp in schedule queue * ok	2026-02-05 15:24:09 -05:00
chenyu	41a179f542	fix test_xlm_roberta_large (#14564 ) onnxruntime does not allow symlink that's outside model dir. update snapshot_download to use local_dir instead of cache_dir. some ad hoc migration step to copy the existing model too	2026-02-05 14:56:06 -05:00
Christopher Milan	aa9dc50577	dtype decomps don't require bitshifts (#14542 ) * dtype decomps don't require bitshifts * simplify shr/shl * ruff	2026-02-05 14:42:30 -05:00
Christopher Milan	b47397ab17	list ml_dtypes as dependency for DSP (#14562 ) * pin onnxruntime to 1.23.2 for DSP * list ml_dtypes instead This reverts commit `84bb2cc0fc`.	2026-02-05 14:27:50 -05:00
chenyu	2b47a9a1b5	skip test_xlm_roberta_large (#14563 ) symlink model not allowed in latest onnxruntime	2026-02-05 14:00:24 -05:00
chenyu	42c18da88a	add Ops asserts in toposort sched_sink [pr] (#14561 ) more explicit	2026-02-05 12:40:02 -05:00
nimlgen	483bba4f05	nv: use prof_exec_counter (#14559 )	2026-02-05 19:00:14 +03:00
qazal	190042358f	llama: faster bf16 matmul / rope backward (#14558 )	2026-02-05 23:57:25 +09:00
George Hotz	b398335f62	assembly/amd: fix saturation in python remu (#14557 ) * PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp * fix saturation in PYTHON_REMU * simpler * more tests, less lines --------- Co-authored-by: Christopher Milan <chrismilan@ucla.edu>	2026-02-05 18:35:57 +08:00
wozeparrot	c1ea6687e5	fa: simpler is faster (#14548 )	2026-02-05 01:13:17 -08:00
George Hotz	43e7eda4e7	grad_b uses custom gemm (#14550 ) * grad_b uses custom gemm * fix multi backward, acc is in float32 * test_gemm_batched * square gemm --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-05 15:22:27 +09:00
qazal	f9cfb64cd9	test asm_gemm in CI (#14551 ) * test asm_gemm in CI * default float16 * use a smaller shape for multi * smaller size * smaller for CI * smaller for ci * need half	2026-02-05 13:32:22 +09:00
chenyu	c0ca7f9c51	use more UOp.sum and UOp.prod [pr] (#14549 )	2026-02-04 22:05:20 -05:00

12093 Commits