tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
chenyu	557134e1c7	model/test fix that failed with WEBGPU=1 DEBUG=2 (#14706 )	2026-02-12 09:08:16 -05:00
George Hotz	4680247e35	renderer/amd: move in tree (#14702 ) * renderer/amd: move in tree * fix paths in tests * 24000 lines * no delete for amd files	2026-02-12 18:09:16 +08:00
George Hotz	d5fc3ea1ba	assembly/amd: mypy+ruff passes (#14701 ) * assembly/amd: mypy+ruff passes * touchups	2026-02-12 16:59:42 +08:00
George Hotz	025049c521	clean up sqtt / update src formatting in viz (#14696 ) * update src formatting in viz * rename to RDNA3/RDNA4 in sqtt * wrap * move sqttmap * update readme * why did that change? * cdna * that's just for test	2026-02-12 14:27:14 +08:00
George Hotz	befc1e800c	assembly/amd: disasm is test only (#14694 ) * assembly/amd: disasm is test only * viz uses str	2026-02-12 12:33:46 +08:00
George Hotz	c331798201	move tests to test/backend (#14691 ) * move tests to test/backend * fix imports * fix CI * revert that one * Fix formatting in README for test command	2026-02-12 11:09:44 +08:00
George Hotz	3fab43c57c	add cache to asm gemm (#14675 )	2026-02-11 08:26:30 +08:00
wozeparrot	69574542ab	fix: use correct fa implementation in eval (#14651 )	2026-02-09 18:20:44 -08:00
qazal	80b0119cef	llama: add new asm gemm shape (#14611 ) * llama: add new asm gemm shape * work * cleanup * half dtype * more comment	2026-02-10 00:34:29 +09:00
nimlgen	e087c58ae0	print tables in llama/profile.sh (#14639 )	2026-02-09 12:32:54 +03:00
nimlgen	01a4ee4d66	do not hive_reset when amdgpu (#14624 )	2026-02-08 19:14:13 +03:00
George Hotz	183d38b128	remove CUSTOM_KERNEL / directly construct it (#14604 ) * remove CUSTOM_KERNEL / directly construct it * clean that up * simpler multi * custom kernel spec * remove Kernel * fix multi * use sharded shape * explicit regression test	2026-02-08 18:43:33 +08:00
nimlgen	e29a88ca09	hive_reset respects lock (#14618 )	2026-02-08 10:47:25 +03:00
wozeparrot	d87ae1c84c	feat: tinyfs load test in benchmark (#14602 )	2026-02-06 18:00:00 -08:00
nimlgen	fbb67a3f95	am_smi: fix after regen (#14594 )	2026-02-06 20:57:41 +03:00
qazal	b7e3fbe07e	llama: add VIZ=-1 to dev_run (#14583 ) * llama: add VIZ=-1 to dev_run * readme * cleaner * add profile.sh script * better grouping of options * add other row * readme edits * work	2026-02-06 22:59:22 +09:00
nimlgen	fbeb978170	diff devices for sdma (#14589 ) * start * x * fix * sdma * c * clean * x * hm * cleaer	2026-02-06 16:39:12 +03:00
qazal	cf73d7e2a7	hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582 )	2026-02-06 15:05:19 +09:00
qazal	be77873974	llama: contig backward for wk / wv matmul backward (#14581 )	2026-02-06 14:54:00 +09:00
wozeparrot	f73468d516	fa: block skipping for fa kv bwd (#14569 )	2026-02-05 16:13:53 -08:00
chenyu	41a179f542	fix test_xlm_roberta_large (#14564 ) onnxruntime does not allow symlink that's outside model dir. update snapshot_download to use local_dir instead of cache_dir. some ad hoc migration step to copy the existing model too	2026-02-05 14:56:06 -05:00
qazal	190042358f	llama: faster bf16 matmul / rope backward (#14558 )	2026-02-05 23:57:25 +09:00
George Hotz	b398335f62	assembly/amd: fix saturation in python remu (#14557 ) * PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp * fix saturation in PYTHON_REMU * simpler * more tests, less lines --------- Co-authored-by: Christopher Milan <chrismilan@ucla.edu>	2026-02-05 18:35:57 +08:00
wozeparrot	c1ea6687e5	fa: simpler is faster (#14548 )	2026-02-05 01:13:17 -08:00
George Hotz	43e7eda4e7	grad_b uses custom gemm (#14550 ) * grad_b uses custom gemm * fix multi backward, acc is in float32 * test_gemm_batched * square gemm --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-05 15:22:27 +09:00
qazal	f9cfb64cd9	test asm_gemm in CI (#14551 ) * test asm_gemm in CI * default float16 * use a smaller shape for multi * smaller size * smaller for CI * smaller for ci * need half	2026-02-05 13:32:22 +09:00
Christopher Milan	232848d086	PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 (#14546 ) * PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 * put that back * cleaner * do that once	2026-02-04 20:10:59 -05:00
wozeparrot	2966619834	feat: llama uses enable_gqa during training (#14545 )	2026-02-04 16:22:31 -08:00
Christopher Milan	5338ce6b74	test S_PACK in extra/assembly/amd/test/hw (#14537 ) * S_PACK_LL_B32_B16 in test/hw * add rest of S_PACK instructions	2026-02-04 14:17:16 -05:00
chenyu	9052db678f	remove allow_shape_mismatch in Tensor.replace (#14536 ) move all logic to torch_backend and not hacking Tensor method	2026-02-04 12:38:18 -05:00
nimlgen	62786d488a	am: mi3xx perf (#14529 )	2026-02-04 19:32:43 +03:00
chenyu	d57d24c7d4	Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535 ) it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data	2026-02-04 11:31:11 -05:00
chenyu	67f91e897b	UOp.is_contiguous -> UOp.has_buffer_identity [pr] (#14530 ) one more confusing buffer related method, but it's definitely not is_contiguous	2026-02-04 09:21:26 -05:00
Christopher Milan	ecbce5269e	PYTHONREMU properly supports S_PACK_LL_B32_B16 (#14527 ) * PYTHONREMU properly supports S_PACK_LL_B32_B16 * default	2026-02-03 23:45:33 -05:00
wozeparrot	720c9597a9	feat: llama uses is_causal on sdpa during training (#14528 )	2026-02-03 20:24:30 -08:00
qazal	d1bfbe9ce3	isolate slow llama gemm (#14525 )	2026-02-04 12:20:10 +09:00
George Hotz	d59e6e7a37	move more tests to test/null, split some existing ones (#14512 ) * move more tests to test/null, split some existing ones * null work * null work * move more * fixes * move PIL * PIL in CLIP * don't move that	2026-02-03 20:20:20 +08:00
qazal	a98c53769a	ASM_GEMM=1 runs the UOp gemm on non cdna (#14516 ) * ASM_GEMM=1 runs the UOp gemm on non cdna tests run on mac in 3 seconds * min diff	2026-02-03 20:42:02 +09:00
qazal	5c1d21349e	viz: profiler command line tool (#14515 )	2026-02-03 19:51:25 +09:00
George Hotz	dd2de4f838	rename all DEFINE_GLOBAL to PARAM (#14511 )	2026-02-03 15:09:38 +08:00
wozeparrot	bbcd3d67a3	fa: faster (#14453 )	2026-02-02 21:34:17 -08:00
chenyu	66d2b02f11	delete files that depends on extra.optimization.helpers (#14499 )	2026-02-02 13:33:33 -05:00
George Hotz	6e958dbfd4	assembly/amd: add RDNA4 support to emulator (#14341 ) * start new rdna4 * work * plus works * more pass * rdna4 * assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp * stale * rev * rr * rdna4 emu tests * cleanup * cleanup * simp * works * better factorizaion * hacks * fix mockgpu * guard both * cleaner * gate * bug fix and a few tests * all test_tiny	2026-02-02 21:35:59 +08:00
qazal	965940dd00	sqtt: update examples after event field change (#14493 ) * regen sqtt examples * cdna * rdna4 * packet counts for rdna3 * sqttmap work	2026-02-02 21:39:48 +09:00
George Hotz	965149a46d	assembly/amd: add ds perm instructions (#14486 ) * assembly/amd: add ds perm instructions * NO SKIP * fix preexisting RDNA3 issues * pcode * assert * asserts * unify * simp * good fix	2026-02-02 16:02:00 +08:00
Robbe Derks	d75a1b0d5a	usbgpu: use BOT interface for `patch.py` (#13644 ) * BOT usage * cleanup * fix lint * fix ruff * fix -7?	2026-02-02 11:54:46 +08:00
qazal	616e9c1483	CDNA assembly gemm in tensor.py with flag (#14310 ) * work * work * the assembly * remove the old one * remove ws bufs, assert splitk * notes cleanup * work * gemm args * gemm in mixins would be nice * add gemm gradient * print counters * the realize is for DEBUG=2 aesthetics * dedup * rewrite to python dsl, no list copies * leave that * add B, M, N, K to gemm name * it's M0 not NULL * fp16 support * test cleanup + more gemms * work from viz * more work * gemm batch_size * xccg path work * tiny comments on the label naming * s_waitcnt	2026-01-31 22:34:14 +09:00
qazal	d69bc5aa1a	make DEV=NULL EMULATE=AMD amd_asm_matmul run (#14460 )	2026-01-31 20:45:24 +09:00
George Hotz	b705c9143c	assembly/amd: test more instructions (#14365 ) * assembly/amd: test more instructions * more * passing * revert * no const fold * remove junk * cleaner	2026-01-31 12:40:22 +08:00
Christopher Milan	e575dd8275	prevent UB in long decomp and more emulated tests (#14447 )	2026-01-30 19:38:41 -05:00

1633 Commits