tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
George Hotz	04c79505ec	no subnormal bf16 (#13905)	2025-12-30 13:02:53 -05:00
chenyu	39f99b207a	update IGNORE_OOB error message (#13904) IGNORE_OOB=1 to disable	2025-12-30 12:25:55 -05:00
George Hotz	7e14cdcb06	assembly/amd: clean up clt/ctz hack (#13901) * assembly/amd: clean up clt/ctz hack * add breaks	2025-12-30 11:59:28 -05:00
George Hotz	69cdc8066d	assembly/amd: add dtype tests to AMD IDE CI (#13899) * add dtype tests to AMD IDE CI * more tests * add trig preop * regen done * split to amd autogen * simpler	2025-12-30 11:09:51 -05:00
George Hotz	9c89be5235	assembly/amd: fix v_perm_b32 + PC fixes (#13897) * assembly/amd: fix v_perm_b32 * add pc support	2025-12-30 09:25:40 -05:00
George Hotz	2b838dc1d8	assembly/amd: fix AMD_LLVM=1 support in emulator (#13881) * fix AMD_LLVM=1 support in emulator * more llvm with dtype * work * more fixes * fix dtype	2025-12-30 09:09:57 -05:00
nimlgen	a19d21ea9c	am: mi3xx smu clocks (#13894) * am: mi3xx smu clocks * x	2025-12-30 16:44:17 +03:00
qazal	b557c46233	assembly gemm clean ups, instructions for cli (#13892)	2025-12-30 16:14:06 +09:00
qazal	d7e1f26e3d	command line interface for sqtt viz (#13891) * command line interface for sqtt viz * cleanup * api surface area * this confuses the llms * document	2025-12-30 12:33:21 +09:00
chenyu	ab58926b00	update sampling in test_float_cast_to_unsigned (#13889) filter is slow for small dtypes	2025-12-29 21:35:46 -05:00
Christopher Milan	0497387e45	NIR: new-style (fix beam) (#13887) * NIR: fix beam * new reduce * Revert "Revert "NIR: new-style compilers (#13875)" (#13888)" This reverts commit `fc4faed0b2`. * oops	2025-12-29 18:41:29 -05:00
Christopher Milan	fc4faed0b2	Revert "NIR: new-style compilers (#13875)" (#13888) This reverts commit `72236bbd3d`.	2025-12-29 17:42:28 -05:00
George Hotz	94bca91f3e	assembly/amd: have asm go through the dsl (#13886) * assembly/amd: have asm go through the dsl * lil	2025-12-29 17:39:11 -05:00
George Hotz	7322d9ec4a	assembly/amd: add new instruction support to pcode (#13885) * assembly/amd: add new instruction support * more * regen all	2025-12-29 17:30:17 -05:00
George Hotz	0d326f5b9b	fix missing instructions in psuedocode (#13884)	2025-12-29 16:11:22 -05:00
Christopher Milan	9c6850fc01	remove try-catches on llvm import (#13883)	2025-12-29 15:56:17 -05:00
George Hotz	9d8397be11	add CDNA3+RDNA4 support (#13882) * fix CI * remove junk * rename lib to dsl * correct * cleanups	2025-12-29 15:51:29 -05:00
Christopher Milan	72236bbd3d	NIR: new-style compilers (#13875) * NIR: new-style compilers * mypy * simplify NIR compilers * lvp compiler too * mypy * simplify * mypy	2025-12-29 15:31:41 -05:00
George Hotz	81cf9ea0ab	rename to extra.assembly.amd (#13879)	2025-12-29 14:10:55 -05:00
George Hotz	37f0fa11b6	rdna3 test cleanups (#13878) * rdna3 test cleanups * cleanups * ugh DONT SKIP	2025-12-29 13:41:59 -05:00
George Hotz	35db73b231	add cdna4 support to parsers (#13877) * add cdna4 support to parsers * cdna4	2025-12-29 13:23:43 -05:00
Clément Verrier	d178235309	delete tree structure from CLAUDE.md (#13876) Claude Code should be able to figure out the correct structure, and the hardcoded tree structure might become outdated.	2025-12-29 13:23:20 -05:00
George Hotz	ff856a74cb	minor refactoring for rdna3 (#13873) * minor refactoring for rdna3 * fix div scale stuff * more bugfixes	2025-12-29 13:20:00 -05:00
C T	39923203ba	fix exception in cuda bindings code on windows (#13823) * fix cuda on windows * fix linter errors * test github action install cuda-toolkit * Revert "test github action install cuda-toolkit" This reverts commit `c18ad6f937`. * Revert "fix linter errors" This reverts commit `00aa943e91`. * Revert "fix cuda on windows" This reverts commit `7aea5256b1`. * fix windows sysconfig.get_config_var("MULTIARCH") is None	2025-12-29 12:58:22 -05:00
b1tg	63a1bb8507	multi custom kernel: support input mixed with copy and shard (#13748)	2025-12-29 12:54:27 -05:00
chenyu	0a98fd38b3	fix tests that failed locally on mac (#13872) keccak output was silently broken without contiguous	2025-12-29 11:23:38 -05:00
Clément Verrier	0e409ff5ce	fix indentation in UOp pretty_print for repeated references (#13857) * fix correct indentation in UOp pretty_print for repeated references When a UOp was referenced multiple times, the walrus operator notation (e.g., x0:=) was correctly used for the first occurrence, but subsequent references had misaligned indentation due to an extra space character. Fix indentation misalignment in pretty_print() when UOps are referenced multiple times. * add simple unit tests for UOp repr --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-12-29 10:46:16 -05:00
George Hotz	f1471a3b99	speed up rdna3 unit tests + add to CI (#13871) * speed up rdna3 unit tests * add test to CI * faster and simpler * speedups * bugfixes * use helper * fix CI maybe * test fixes * llvm-21 on 24.04 * upd * llvm-21 * fix test * bring that back * merge gen into lib * test generators	2025-12-29 10:26:48 -05:00
h-vetinari	37720fd6c0	also look for linux libraries in RHEL-themed paths (#13863)	2025-12-29 10:05:32 -05:00
George Hotz	25ef866e89	write python emulator from RDNA3 psuedocode in pdf (#13841) * write python emulator from RDNA3 psuedocode in pdf * emu2 * more emu * working * more psueod * progress * cleanups * delete junk * delete stale files * just emu * work * emu compare * bemu * cleanups and more failures * revert bench emu * fix emu cmp * four tests fail * bugfixes * dsl * ext * refactor * dsl * div scale fix * test_emu * fix emu tests * pcode * test pcode * top imports * fix test_emu to use run_asm * emu tests on real hardware * more tests * more emu tests * more * work * work * bug fix * bugfixes * fix fp16 gemm * all ops tests pass in emulator * fix llvm tests * fix a few more tests * fix mockgpu timeout	2025-12-29 07:39:53 -05:00
nimlgen	88eb230326	memory: correct pa allocator size (#13861)	2025-12-29 14:49:44 +03:00
qazal	f541540129	variable N for asm gemm (#13869) * variable N for asm gemm * cleanup spacing	2025-12-29 19:35:50 +09:00
nimlgen	c6769badc2	mockgpu: async support (#13868) * mockgpu: async support * cpu	2025-12-29 13:18:37 +03:00
qazal	fc5278746f	mi350x assembly gemm cleanups (#13867)	2025-12-29 18:47:23 +09:00
George Hotz	f07c39cfa4	hwtest fixes for rdna3 dsl (#13865)	2025-12-28 20:42:29 -05:00
George Hotz	d9603c1bee	improve asm dsl syntax (#13864) * improve asm dsl syntax * improve asm dsl syntax	2025-12-28 20:04:59 -05:00
chenyu	f5090192c8	reorder AMD tensor core benchmark test (#13860) * reorder AMD tensor core benchmark test * disable that	2025-12-28 12:29:51 -05:00
qazal	066d96c397	print tflops in asm gemm test (#13859) * print tflops in asm gemm test * change order	2025-12-29 02:26:40 +09:00
chenyu	a03cd43e78	fix typing in compute_gradient (#13852)	2025-12-28 11:52:14 -05:00
chenyu	cba05acadf	re-enable TYPED=1 import test (#13858)	2025-12-28 11:49:06 -05:00
qazal	2cfbabdc34	mi350x 1tflop bf16 gemm in extra (#13702)	2025-12-28 21:45:42 +09:00
qazal	2180eee5e4	use the asm dsl in remu hwtest.py (#13856) * remu hw test with the asm dsl * simpler * nthreads and exec mask * cmp/cmpx * assembler error in s_mov_b32 * vopd in dsl?	2025-12-28 11:32:41 +09:00
chenyu	784b919f7f	Revert "optim empty shard #13513 (#13598)" (#13855) * Revert "optim empty shard #13513 (#13598)" This reverts commit `76d465dbc3`. * test_arange_shrink * update test	2025-12-27 21:10:23 -05:00
anu	9b4de8abc7	fix beam in python 3.14+ (#13836) * fix beam search on python 3.14 * add PickleableCount class to helpers * change name, add test, add step * tidy count init	2025-12-27 16:24:22 -05:00
chenyu	0f74909ae9	clean up rearrange (#13851)	2025-12-27 11:06:10 -05:00
qazal	f6c660f7fa	simplify sqtt decoder infra (#13849) * more work * simpler	2025-12-28 00:31:16 +09:00
Clément Verrier	ae013beab8	handle empty VECTORIZE in UOp.render() (#13847) `UOp.render()` crashed with `IndexError: tuple index out of range` when the UOp graph contained a `VECTORIZE` with empty `src=()`. This occurs when reshaping to scalar shape `()`, e.g., `Tensor.ones(4).sum()`. The bug was in the renderer's VECTORIZE pattern: `all_same(())` returns `True` (vacuous truth), causing the code to access `x.src[0]` on an empty tuple. - Fix `IndexError` when calling `UOp.render()` on graphs containing empty `VECTORIZE` nodes. - Add test for empty `VECTORIZE` rendering.	2025-12-27 10:09:39 -05:00
qazal	a2da61d096	use new style amd compiler in viz (#13848) * working version, handcode gfx1100 arch * get target from device properties * lib in cfg test program spec	2025-12-27 23:59:30 +09:00
JINO ROHIT	1ee92003ea	minor typo (#13846)	2025-12-27 09:34:57 -05:00
nimlgen	276159cb87	system: add base_class to pci_scan_bus (#13845) * system: add base_class to pci_scan_bus * fix	2025-12-27 13:22:21 +03:00


11522 Commits