tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
George Hotz	4680247e35	renderer/amd: move in tree (#14702) * renderer/amd: move in tree * fix paths in tests * 24000 lines * no delete for amd files	2026-02-12 18:09:16 +08:00
George Hotz	befc1e800c	assembly/amd: disasm is test only (#14694) * assembly/amd: disasm is test only * viz uses str	2026-02-12 12:33:46 +08:00
George Hotz	3fab43c57c	add cache to asm gemm (#14675)	2026-02-11 08:26:30 +08:00
qazal	80b0119cef	llama: add new asm gemm shape (#14611) * llama: add new asm gemm shape * work * cleanup * half dtype * more comment	2026-02-10 00:34:29 +09:00
George Hotz	183d38b128	remove CUSTOM_KERNEL / directly construct it (#14604) * remove CUSTOM_KERNEL / directly construct it * clean that up * simpler multi * custom kernel spec * remove Kernel * fix multi * use sharded shape * explicit regression test	2026-02-08 18:43:33 +08:00
qazal	cf73d7e2a7	hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582)	2026-02-06 15:05:19 +09:00
George Hotz	43e7eda4e7	grad_b uses custom gemm (#14550) * grad_b uses custom gemm * fix multi backward, acc is in float32 * test_gemm_batched * square gemm --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-05 15:22:27 +09:00
qazal	f9cfb64cd9	test asm_gemm in CI (#14551) * test asm_gemm in CI * default float16 * use a smaller shape for multi * smaller size * smaller for CI * smaller for ci * need half	2026-02-05 13:32:22 +09:00
qazal	d1bfbe9ce3	isolate slow llama gemm (#14525)	2026-02-04 12:20:10 +09:00
qazal	a98c53769a	ASM_GEMM=1 runs the UOp gemm on non cdna (#14516) * ASM_GEMM=1 runs the UOp gemm on non cdna tests run on mac in 3 seconds * min diff	2026-02-03 20:42:02 +09:00
qazal	616e9c1483	CDNA assembly gemm in tensor.py with flag (#14310) * work * work * the assembly * remove the old one * remove ws bufs, assert splitk * notes cleanup * work * gemm args * gemm in mixins would be nice * add gemm gradient * print counters * the realize is for DEBUG=2 aesthetics * dedup * rewrite to python dsl, no list copies * leave that * add B, M, N, K to gemm name * it's M0 not NULL * fp16 support * test cleanup + more gemms * work from viz * more work * gemm batch_size * xccg path work * tiny comments on the label naming * s_waitcnt	2026-01-31 22:34:14 +09:00
qazal	dfefeddeed	add tflops to cdna gemm custom kernel (#14281)	2026-01-22 12:48:28 +09:00
qazal	b46da603fe	codegen/custom_kernel: do not attach KernelInfo to user program (#14160)	2026-01-15 14:01:48 +09:00
qazal	bd55507ee4	RDNA3 fp16 assembly gemm 85 TFLOPS (#13990)	2026-01-03 18:34:23 +09:00
qazal	2cc64d71b0	simplify mi350x gemm / viz asm tests (#13984) * mi350x gemm cleanup * asm tests work * simpler asm tests	2026-01-03 11:11:07 +09:00
qazal	5f52266225	mi350x gemm: use Tensor.custom_kernel in asm test (#13969) * mi350x gemm: use Tensor.custom_kernel in asm test * A @ B for baseline	2026-01-02 18:30:50 +09:00
qazal	c0f52c9dcb	split assembly gemm to per arch directory (#13953)	2026-01-02 00:10:22 +09:00
qazal	6a5430ab00	correct args order in mi350x gemm (#13949)	2026-01-01 23:01:46 +09:00
qazal	b23f4517ab	prep mi350x gemm for python dsl (#13918) * start by pruning existing asm * better branch names * split to template and real instructions	2025-12-31 20:00:57 +09:00
qazal	b557c46233	assembly gemm clean ups, instructions for cli (#13892)	2025-12-30 16:14:06 +09:00
qazal	f541540129	variable N for asm gemm (#13869) * variable N for asm gemm * cleanup spacing	2025-12-29 19:35:50 +09:00
qazal	fc5278746f	mi350x assembly gemm cleanups (#13867)	2025-12-29 18:47:23 +09:00
qazal	066d96c397	print tflops in asm gemm test (#13859) * print tflops in asm gemm test * change order	2025-12-29 02:26:40 +09:00
qazal	2cfbabdc34	mi350x 1tflop bf16 gemm in extra (#13702)	2025-12-28 21:45:42 +09:00

24 Commits