tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
wozeparrot	9317e96881	fa: explicitly pass shapes (#14857 )	2026-02-19 05:26:16 -08:00
nimlgen	3b95fa0ed4	am_smi: enable mem usage back (#14858 )	2026-02-18 19:27:27 +03:00
wozeparrot	6d301ad2c4	feat: llama wqkv (#14841 )	2026-02-17 23:01:33 -08:00
wozeparrot	95e97ec341	seperate llama optim (#14810 )	2026-02-17 13:02:35 -08:00
qazal	f8e485ee9e	nvcc/nvdisasm macos shim (#14822 ) * move to backend * and arch * setup_nvcc_osx * blackwell * min test * now getting dumb assert is_ptx * support cubin. * work * remove that * simpler	2026-02-17 20:07:05 +09:00
qazal	f590564bf7	gemm multiple is only for cdna4 asm (#14814 ) * gemm multiple is only for cdna4 asm * move to backend * and arch * path	2026-02-17 14:00:02 +09:00
George Hotz	5bd2862d1a	late compile the cdna gemm (#14783 ) * late compile the cdna gemm * remove old things * finalize inplace --------- Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-17 13:04:22 +09:00
George Hotz	f081f154ae	parameterize the CDNA asm gemm (#14813 ) * parameterize the CDNA asm gemm * fix llama test * fix * add more gemmt ests * confirm all match * test these asm gemms	2026-02-17 11:35:18 +08:00
nimlgen	131bbbbfd8	am: smu_v13_0_12 (#14800 )	2026-02-16 22:58:10 +03:00
wozeparrot	45aebe1572	hipkittens fa backward (#14723 )	2026-02-16 00:38:44 -08:00
qazal	c7a4dbf918	viz: get program binary from the UOp (#14787 ) * viz: get program binary from the UOp * remove that * less * rename View Program to View Source * two words * fix	2026-02-16 15:46:58 +09:00
George Hotz	dff9cf35c2	amd asm emulator fixes + run it in CI (#14786 ) * amd asm fix, try 2 * fix tests	2026-02-16 13:24:21 +08:00
qazal	55a4dfa2e0	cdna4 asm_gemm tests in CI on the null backend (#14785 ) * cdna4 asm_gemm tests in CI on the null backend * no .numpy() in null * better * gemm/asm: device comes from renderer	2026-02-16 14:06:23 +09:00
George Hotz	ac079e43d7	ElementwiseMixin (#14777 )	2026-02-16 08:50:47 +08:00
qazal	33b31d9cd6	tinykittens flash attention dtype fix, add CI (#14770 ) * don't hardcdoe amd device * add failing tests, ci too * fix: fix for dtype mixin * bump to rocm 7.1 --------- Co-authored-by: Woze Parrot <wozeparrot@gmail.com>	2026-02-16 01:15:11 +09:00
qazal	9bb6014900	keep existing profile trace in viz cli (#14757 )	2026-02-15 13:16:32 +09:00
nimlgen	4ab51b55bd	stream pma decoder (#14746 )	2026-02-14 17:40:18 +03:00
George Hotz	c0de4f75b1	improve mmapeak, print names with sqtt (#14726 )	2026-02-13 16:07:06 +08:00
wozeparrot	0613c0ac0c	hipkittens fa forward (#14692 )	2026-02-12 20:16:43 -08:00
George Hotz	4088d686b2	remove llvm requirement from amd (#14717 ) * remove llvm requirement from amd * tests pass * test * sink kernarg_size * move stuff * amd_asm_matmul to new style * default type * fix tests, simpler * cu mode is faster and simpler * darken	2026-02-13 10:50:12 +08:00
chenyu	557134e1c7	model/test fix that failed with WEBGPU=1 DEBUG=2 (#14706 )	2026-02-12 09:08:16 -05:00
George Hotz	4680247e35	renderer/amd: move in tree (#14702 ) * renderer/amd: move in tree * fix paths in tests * 24000 lines * no delete for amd files	2026-02-12 18:09:16 +08:00
George Hotz	d5fc3ea1ba	assembly/amd: mypy+ruff passes (#14701 ) * assembly/amd: mypy+ruff passes * touchups	2026-02-12 16:59:42 +08:00
George Hotz	025049c521	clean up sqtt / update src formatting in viz (#14696 ) * update src formatting in viz * rename to RDNA3/RDNA4 in sqtt * wrap * move sqttmap * update readme * why did that change? * cdna * that's just for test	2026-02-12 14:27:14 +08:00
George Hotz	befc1e800c	assembly/amd: disasm is test only (#14694 ) * assembly/amd: disasm is test only * viz uses str	2026-02-12 12:33:46 +08:00
George Hotz	c331798201	move tests to test/backend (#14691 ) * move tests to test/backend * fix imports * fix CI * revert that one * Fix formatting in README for test command	2026-02-12 11:09:44 +08:00
George Hotz	3fab43c57c	add cache to asm gemm (#14675 )	2026-02-11 08:26:30 +08:00
wozeparrot	69574542ab	fix: use correct fa implementation in eval (#14651 )	2026-02-09 18:20:44 -08:00
qazal	80b0119cef	llama: add new asm gemm shape (#14611 ) * llama: add new asm gemm shape * work * cleanup * half dtype * more comment	2026-02-10 00:34:29 +09:00
nimlgen	e087c58ae0	print tables in llama/profile.sh (#14639 )	2026-02-09 12:32:54 +03:00
nimlgen	01a4ee4d66	do not hive_reset when amdgpu (#14624 )	2026-02-08 19:14:13 +03:00
George Hotz	183d38b128	remove CUSTOM_KERNEL / directly construct it (#14604 ) * remove CUSTOM_KERNEL / directly construct it * clean that up * simpler multi * custom kernel spec * remove Kernel * fix multi * use sharded shape * explicit regression test	2026-02-08 18:43:33 +08:00
nimlgen	e29a88ca09	hive_reset respects lock (#14618 )	2026-02-08 10:47:25 +03:00
wozeparrot	d87ae1c84c	feat: tinyfs load test in benchmark (#14602 )	2026-02-06 18:00:00 -08:00
nimlgen	fbb67a3f95	am_smi: fix after regen (#14594 )	2026-02-06 20:57:41 +03:00
qazal	b7e3fbe07e	llama: add VIZ=-1 to dev_run (#14583 ) * llama: add VIZ=-1 to dev_run * readme * cleaner * add profile.sh script * better grouping of options * add other row * readme edits * work	2026-02-06 22:59:22 +09:00
nimlgen	fbeb978170	diff devices for sdma (#14589 ) * start * x * fix * sdma * c * clean * x * hm * cleaer	2026-02-06 16:39:12 +03:00
qazal	cf73d7e2a7	hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582 )	2026-02-06 15:05:19 +09:00
qazal	be77873974	llama: contig backward for wk / wv matmul backward (#14581 )	2026-02-06 14:54:00 +09:00
wozeparrot	f73468d516	fa: block skipping for fa kv bwd (#14569 )	2026-02-05 16:13:53 -08:00
chenyu	41a179f542	fix test_xlm_roberta_large (#14564 ) onnxruntime does not allow symlink that's outside model dir. update snapshot_download to use local_dir instead of cache_dir. some ad hoc migration step to copy the existing model too	2026-02-05 14:56:06 -05:00
qazal	190042358f	llama: faster bf16 matmul / rope backward (#14558 )	2026-02-05 23:57:25 +09:00
George Hotz	b398335f62	assembly/amd: fix saturation in python remu (#14557 ) * PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp * fix saturation in PYTHON_REMU * simpler * more tests, less lines --------- Co-authored-by: Christopher Milan <chrismilan@ucla.edu>	2026-02-05 18:35:57 +08:00
wozeparrot	c1ea6687e5	fa: simpler is faster (#14548 )	2026-02-05 01:13:17 -08:00
George Hotz	43e7eda4e7	grad_b uses custom gemm (#14550 ) * grad_b uses custom gemm * fix multi backward, acc is in float32 * test_gemm_batched * square gemm --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-05 15:22:27 +09:00
qazal	f9cfb64cd9	test asm_gemm in CI (#14551 ) * test asm_gemm in CI * default float16 * use a smaller shape for multi * smaller size * smaller for CI * smaller for ci * need half	2026-02-05 13:32:22 +09:00
Christopher Milan	232848d086	PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 (#14546 ) * PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 * put that back * cleaner * do that once	2026-02-04 20:10:59 -05:00
wozeparrot	2966619834	feat: llama uses enable_gqa during training (#14545 )	2026-02-04 16:22:31 -08:00
Christopher Milan	5338ce6b74	test S_PACK in extra/assembly/amd/test/hw (#14537 ) * S_PACK_LL_B32_B16 in test/hw * add rest of S_PACK instructions	2026-02-04 14:17:16 -05:00
chenyu	9052db678f	remove allow_shape_mismatch in Tensor.replace (#14536 ) move all logic to torch_backend and not hacking Tensor method	2026-02-04 12:38:18 -05:00

1653 Commits