tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 07:28:15 -05:00

Author	SHA1	Message	Date
qazal	8119d9f082	sqtt: decode each instruction exec (#13093 ) * sqtt: decode each instruction exec * start tests * run_asm * capture sqtt per kernel * chaining vgprs * test things * inst_execs in viz * can also configure l and g * 1l + cleanup * test_sleep * test_wmma * work * test sleep with llvm builtin	2025-11-05 17:30:27 +08:00
nimlgen	eaf7cbc178	amd: flush sqtt after each kernel (#13092 ) * amd: flush sqtt after each kernel * merge for rgp	2025-11-04 22:12:48 +08:00
nimlgen	49191ada77	roc: install sqtt decoder (#13091 ) * roc: install? * msg * 0.1.4	2025-11-04 18:56:01 +08:00
nimlgen	2e97eaa866	roc: no nullptr when no wave instructions (#13087 )	2025-11-04 17:32:14 +08:00
wozeparrot	9c00c0688a	tk fa: use 16x64 tiles (#13086 )	2025-11-03 18:25:38 -08:00
wozeparrot	4ed0f216b5	fix: make max_matmul run again (#13085 )	2025-11-03 18:09:09 -08:00
qazal	6df34a5887	lint sqtt parser with mypy (#13079 ) * llvm address table errs * mypy likes annotated dicts * unwrap nullable	2025-11-04 00:53:59 +08:00
nimlgen	dfde3f54d9	rocprof: use llvm disasm (#13077 ) * rocprof: use llvm disasm * rm	2025-11-03 23:58:58 +08:00
qazal	27d42fd575	sqtt decoder print behind DEBUG>=5 (#13076 ) * sqtt decoder print behind DEBUG>=5 * gfx version stuff also behind 5	2025-11-03 23:20:03 +08:00
George Hotz	416b15cc59	improve uop matmul syntax (#13074 ) * improve uop matmul syntax * store takes const * copy * cleanups * faster and simpler * label them reduce * better syntax * touchup	2025-11-03 21:34:26 +08:00
qazal	1c0d4f1cd2	viz: counters loader (#12987 ) * standalone custom loader * first iteration on the ui * work * add center helper * add edge offsets * enumerate all edge types * try dagre layout algorithm * simpler spec * bring back double edges * more work on edge paths * aesthetics * custom edges also works * dimmer inactive links * cleanup * cleanup * split out the ncu layout * this is just a k/v map now * rm that * more cleanup and comments * do work * also this work * simpler start * rm that * sqtt work * view sqtt * sqtt * --custom is just in profile * wrap c call * from tinygrad install * eg. module not found	2025-11-03 19:42:36 +08:00
George Hotz	1e3d6e49a6	index slicing + allclose (#13071 ) * continue work on slicing+allclose * Revert "Revert "slicing + allclose"" This reverts commit `6c7a12f21c`. * fix tests + better syntax * forgot an after * slot is an integer	2025-11-03 13:01:48 +08:00
George Hotz	8cbef912d2	move reshape to MathTraits (#13054 ) * move reshape to MathTraits * confirm it works in amd_uop_matmul	2025-11-02 12:56:15 +08:00
George Hotz	267be7fc5e	fp16 acc	2025-11-02 12:53:04 +08:00
wozeparrot	8206eab4fc	fix: tk fa 4 workers (#13052 )	2025-11-01 16:41:29 -07:00
George Hotz	e98506735b	add CONTRACT support to UOp programs (#13043 ) * add contract support * use contract * 342 tflops	2025-11-01 19:11:32 +08:00
George Hotz	65a0a31475	AMD mi350x matmul from stream (#13040 ) * works * working mfma * 120 TFLOPS * regs * 192 TFLOPS * try pipelining * something * notes * contract * linter to 3.11 * that was a bug	2025-11-01 17:55:19 +08:00
nimlgen	a23226e61e	amd: pmc for gfx9 (#13036 ) * amd: pmc for gfx9 * xcc * vmid mask * ugh * tiny * minor * sorryg	2025-11-01 04:26:34 +08:00
nimlgen	f6786c1bfd	autogen: py314 (#13038 ) * autogen: py314 * bump py?	2025-11-01 04:02:19 +08:00
George Hotz	bc178d14a9	matmul example on metal showing off tensor core (#13033 ) * matmul example on metal showing off tensor core * flip the args of placeholder * mat_idx * imp	2025-10-31 19:40:36 +08:00
George Hotz	b46229ca51	use shrink in amd_matmul_uop (#13026 ) * use shrink in amd_matmul_uop * colors	2025-10-31 10:43:41 +08:00
wozeparrot	78f7650eec	faster tk matmul (#13006 )	2025-10-30 19:09:27 -07:00
George Hotz	512513c403	cleanup amd uop matmul (#13025 ) * cleanup amd uop matmul * remove mod * move that out * better variable names * var names * more * render fallback * colors	2025-10-31 10:04:45 +08:00
nimlgen	629b177b66	amd: sqtt works in profile mode (#13019 )	2025-10-30 23:48:52 +08:00
nimlgen	4d7a7096c9	am: enable perfmon (#13013 ) * am: enable perfmon * try * msg	2025-10-30 22:28:36 +08:00
George Hotz	4a741e8364	modernize amd uop matmul (#13011 ) * modernize amd uop matmul * progress * comment * more comments * revert that * mac cleanups * fix estimates * format	2025-10-30 17:02:38 +08:00
wozeparrot	92a87e37e4	fix: fetch_file (#13010 )	2025-10-29 22:44:22 -07:00
nimlgen	a6f5b1482e	amd: perf counters (#12975 ) * amd: perf counters * sq * cleaner * fix * if enabled * ruff * mypy * counters * reset * fix * no cpu	2025-10-30 00:10:31 +08:00
wozeparrot	d66c997a39	feat: thunderkittens fa2 (#12955 )	2025-10-28 11:27:45 -07:00
wozeparrot	24884c6768	fix: don't use KITTENS_HOPPER for 4090 (#12954 )	2025-10-27 17:19:53 -07:00
George Hotz	25c2da1579	check SPEC=2 in CI (#12945 ) * check SPEC=2 in CI * split SPEC=2 * fast enough	2025-10-27 21:53:57 +08:00
nimlgen	f4da94af28	system: reset is a method of pcidevice (#12936 )	2025-10-27 16:21:10 +08:00
wozeparrot	6b54378eba	working kitten matmul (#12935 )	2025-10-26 23:40:49 -07:00
George Hotz	db5c918215	source extra/cl_android.sh to fix opencl on android	2025-10-26 15:27:51 +08:00
qazal	2f95c10702	remu new instructions / use volatile in emulator tests (#12862 ) * remu new instructions * start moving to volatile * test_simple works * test_exec_mov works and lid is still here * test_exec_cmp_vopc * clang did s_mov_b32 exec_lo, 1 * don't hardcode v1 * support volatile in tests * hw_test passes * only the volatile version * subrev saturating behavior	2025-10-23 11:13:43 +08:00
chenyu	c5cee74706	remove BLOCK_REORDER (#12854 ) not used	2025-10-21 19:10:14 -04:00
b1tg	60d7e232f2	cuda fp8 (#12782 ) * cuda fp8 * tensor core * tc test * clean * clean pm	2025-10-21 15:05:25 -04:00
chenyu	8baa61bd67	use torch 2.9 and its Muon in test (#12773 ) * use torch 2.9 and its Muon in test * relax and disable	2025-10-21 13:35:17 -04:00
chenyu	f51f9aaa16	muon ns_params -> ns_coefficients (#12850 ) match the official torch one	2025-10-21 12:35:52 -04:00
nimlgen	1ad6598963	amd: trace all instructions (#12831 )	2025-10-21 20:52:24 +08:00
George Hotz	cad3ada909	tinygpu: build with SIP off works	2025-10-20 09:11:09 +08:00
nimlgen	59784a5972	amd: ensure ts is written (#12794 )	2025-10-19 23:55:49 +08:00
George Hotz	89e7f2fa00	mmapeak: gfx1103 support	2025-10-19 16:57:28 +08:00
George Hotz	617614beb7	add mi350x support to mmapeak (#12784 )	2025-10-19 16:11:07 +08:00
nimlgen	037f6e8fa0	qcom: ioctl for 7xx (#12777 )	2025-10-18 20:33:14 +08:00
geohotstan	5d209ee7ec	onnx helper intermediate node output validation (#12740 ) * start * update comments * good * add comments and better printing * done	2025-10-16 11:17:47 -04:00
nimlgen	3aa2277b8f	nv: usb4 (#12696 ) * hackish * prog * match * l * simpler * refactor * not osx * apple things * tiny changes * fix mask * match fix * nn	2025-10-16 20:11:19 +08:00
wozeparrot	cc2dfe22f5	tinyfs: fetch file utility (#12719 )	2025-10-15 23:38:56 -07:00
George Hotz	4a151e7533	make xcode signing happy, waiting for entitlement (#12712 )	2025-10-16 10:20:34 +08:00
Daniel	d65bd669f8	update tiny torch backend hook (#12575 ) * update the backend to fix torch deprecation warning * use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients * fix indentation --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-10-15 14:02:33 -04:00

1311 Commits