tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
qazal	b8a55d5f68	sqtt: new packet types, add discovery script (#14960 )	2026-02-28 04:27:27 +09:00
qazal	448e997be4	gemm/asm: cleanup custom function args (#15007 )	2026-02-25 22:05:56 +09:00
wozeparrot	8d9545e09e	llama3: correctly shard wqkv (#14978 )	2026-02-23 23:57:10 -08:00
wozeparrot	25565b2410	fa: test for mp (#14907 )	2026-02-22 21:47:36 -08:00
qazal	d6145736c7	sqtt: examples generator changes from inst_discovery (#14961 ) * sqtt examples generator changes from inst_discovery * rdna4 * rdna3 * cdna * sad reality for mi300x	2026-02-23 14:42:48 +09:00
George Hotz	8ef5544e4a	realized PYTHON copies (#14934 ) * realized PYTHON copies * comment that out * fix that test * append afters * contig * disk copies * should be 124 * 332	2026-02-21 20:29:31 +08:00
George Hotz	55d3a5def9	preallocate all realized buffers (#14823 ) * preallocate all realized buffers * contiguous * work * comment that out * move to schedule * better * correct fix * just buffer * disk bufs * fixes disk tensor stuff * fix symbolic stuff * fix multi * 162 failures * bugfixes * don't check that anymore * fix schedule tests * mnist should be contiguious * type and buffer * fix tests * shrink axis correction * mypy fixes * tests skips * same 37 failures * dedup * no shrink in the graph * 29 failures * skips * fix custom kernel * fix training * those optimizations aren't supported currently * simpler * more correct * tests * 14 failures * works * fix that test * broken * 11 failures * only kernel counts left * fixes * all tests pass * remove tensor_map * op test * 200 -> 230 * test fixes * fixes * revert test_tiny thing * guard * revert that * test tiny passes * no contigs there * base realize back * Revert "no contigs there" This reverts commit `c45bb9fcfd`. * revert that * chop many assigns * 12 failures * fix tests * tests * apply after * pre-commit * remove old code * delete that * fix types * remove extra contig * fix dataloader * torch fix * disk fix * update kernel fusion numbres * runs on amd * restore kernel count * add that rule back * that * disable that * wrong * add the correct rule for that folding * more tests * guard c1.arg * no newlines * realize those * split into a different file * remove detach/contig back * skip 2 * update that	2026-02-20 20:05:54 +08:00
qazal	32f569b573	viz/sqtt: decoder fixes pre rdna4/cdna4 work (#14900 ) * viz/sqtt: decoder fixes pre rdna4/cdna4 work * fix * branch_inst + more tests * smaller	2026-02-20 12:10:15 +09:00
wozeparrot	9317e96881	fa: explicitly pass shapes (#14857 )	2026-02-19 05:26:16 -08:00
nimlgen	3b95fa0ed4	am_smi: enable mem usage back (#14858 )	2026-02-18 19:27:27 +03:00
wozeparrot	6d301ad2c4	feat: llama wqkv (#14841 )	2026-02-17 23:01:33 -08:00
wozeparrot	95e97ec341	seperate llama optim (#14810 )	2026-02-17 13:02:35 -08:00
qazal	f8e485ee9e	nvcc/nvdisasm macos shim (#14822 ) * move to backend * and arch * setup_nvcc_osx * blackwell * min test * now getting dumb assert is_ptx * support cubin. * work * remove that * simpler	2026-02-17 20:07:05 +09:00
qazal	f590564bf7	gemm multiple is only for cdna4 asm (#14814 ) * gemm multiple is only for cdna4 asm * move to backend * and arch * path	2026-02-17 14:00:02 +09:00
George Hotz	5bd2862d1a	late compile the cdna gemm (#14783 ) * late compile the cdna gemm * remove old things * finalize inplace --------- Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-17 13:04:22 +09:00
George Hotz	f081f154ae	parameterize the CDNA asm gemm (#14813 ) * parameterize the CDNA asm gemm * fix llama test * fix * add more gemmt ests * confirm all match * test these asm gemms	2026-02-17 11:35:18 +08:00
nimlgen	131bbbbfd8	am: smu_v13_0_12 (#14800 )	2026-02-16 22:58:10 +03:00
wozeparrot	45aebe1572	hipkittens fa backward (#14723 )	2026-02-16 00:38:44 -08:00
qazal	c7a4dbf918	viz: get program binary from the UOp (#14787 ) * viz: get program binary from the UOp * remove that * less * rename View Program to View Source * two words * fix	2026-02-16 15:46:58 +09:00
George Hotz	dff9cf35c2	amd asm emulator fixes + run it in CI (#14786 ) * amd asm fix, try 2 * fix tests	2026-02-16 13:24:21 +08:00
qazal	55a4dfa2e0	cdna4 asm_gemm tests in CI on the null backend (#14785 ) * cdna4 asm_gemm tests in CI on the null backend * no .numpy() in null * better * gemm/asm: device comes from renderer	2026-02-16 14:06:23 +09:00
George Hotz	ac079e43d7	ElementwiseMixin (#14777 )	2026-02-16 08:50:47 +08:00
qazal	33b31d9cd6	tinykittens flash attention dtype fix, add CI (#14770 ) * don't hardcdoe amd device * add failing tests, ci too * fix: fix for dtype mixin * bump to rocm 7.1 --------- Co-authored-by: Woze Parrot <wozeparrot@gmail.com>	2026-02-16 01:15:11 +09:00
qazal	9bb6014900	keep existing profile trace in viz cli (#14757 )	2026-02-15 13:16:32 +09:00
nimlgen	4ab51b55bd	stream pma decoder (#14746 )	2026-02-14 17:40:18 +03:00
George Hotz	c0de4f75b1	improve mmapeak, print names with sqtt (#14726 )	2026-02-13 16:07:06 +08:00
wozeparrot	0613c0ac0c	hipkittens fa forward (#14692 )	2026-02-12 20:16:43 -08:00
George Hotz	4088d686b2	remove llvm requirement from amd (#14717 ) * remove llvm requirement from amd * tests pass * test * sink kernarg_size * move stuff * amd_asm_matmul to new style * default type * fix tests, simpler * cu mode is faster and simpler * darken	2026-02-13 10:50:12 +08:00
chenyu	557134e1c7	model/test fix that failed with WEBGPU=1 DEBUG=2 (#14706 )	2026-02-12 09:08:16 -05:00
George Hotz	4680247e35	renderer/amd: move in tree (#14702 ) * renderer/amd: move in tree * fix paths in tests * 24000 lines * no delete for amd files	2026-02-12 18:09:16 +08:00
George Hotz	d5fc3ea1ba	assembly/amd: mypy+ruff passes (#14701 ) * assembly/amd: mypy+ruff passes * touchups	2026-02-12 16:59:42 +08:00
George Hotz	025049c521	clean up sqtt / update src formatting in viz (#14696 ) * update src formatting in viz * rename to RDNA3/RDNA4 in sqtt * wrap * move sqttmap * update readme * why did that change? * cdna * that's just for test	2026-02-12 14:27:14 +08:00
George Hotz	befc1e800c	assembly/amd: disasm is test only (#14694 ) * assembly/amd: disasm is test only * viz uses str	2026-02-12 12:33:46 +08:00
George Hotz	c331798201	move tests to test/backend (#14691 ) * move tests to test/backend * fix imports * fix CI * revert that one * Fix formatting in README for test command	2026-02-12 11:09:44 +08:00
George Hotz	3fab43c57c	add cache to asm gemm (#14675 )	2026-02-11 08:26:30 +08:00
wozeparrot	69574542ab	fix: use correct fa implementation in eval (#14651 )	2026-02-09 18:20:44 -08:00
qazal	80b0119cef	llama: add new asm gemm shape (#14611 ) * llama: add new asm gemm shape * work * cleanup * half dtype * more comment	2026-02-10 00:34:29 +09:00
nimlgen	e087c58ae0	print tables in llama/profile.sh (#14639 )	2026-02-09 12:32:54 +03:00
nimlgen	01a4ee4d66	do not hive_reset when amdgpu (#14624 )	2026-02-08 19:14:13 +03:00
George Hotz	183d38b128	remove CUSTOM_KERNEL / directly construct it (#14604 ) * remove CUSTOM_KERNEL / directly construct it * clean that up * simpler multi * custom kernel spec * remove Kernel * fix multi * use sharded shape * explicit regression test	2026-02-08 18:43:33 +08:00
nimlgen	e29a88ca09	hive_reset respects lock (#14618 )	2026-02-08 10:47:25 +03:00
wozeparrot	d87ae1c84c	feat: tinyfs load test in benchmark (#14602 )	2026-02-06 18:00:00 -08:00
nimlgen	fbb67a3f95	am_smi: fix after regen (#14594 )	2026-02-06 20:57:41 +03:00
qazal	b7e3fbe07e	llama: add VIZ=-1 to dev_run (#14583 ) * llama: add VIZ=-1 to dev_run * readme * cleaner * add profile.sh script * better grouping of options * add other row * readme edits * work	2026-02-06 22:59:22 +09:00
nimlgen	fbeb978170	diff devices for sdma (#14589 ) * start * x * fix * sdma * c * clean * x * hm * cleaer	2026-02-06 16:39:12 +03:00
qazal	cf73d7e2a7	hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582 )	2026-02-06 15:05:19 +09:00
qazal	be77873974	llama: contig backward for wk / wv matmul backward (#14581 )	2026-02-06 14:54:00 +09:00
wozeparrot	f73468d516	fa: block skipping for fa kv bwd (#14569 )	2026-02-05 16:13:53 -08:00
chenyu	41a179f542	fix test_xlm_roberta_large (#14564 ) onnxruntime does not allow symlink that's outside model dir. update snapshot_download to use local_dir instead of cache_dir. some ad hoc migration step to copy the existing model too	2026-02-05 14:56:06 -05:00
qazal	190042358f	llama: faster bf16 matmul / rope backward (#14558 )	2026-02-05 23:57:25 +09:00

1661 Commits