George Hotz
04c79505ec
no subnormal bf16 ( #13905 )
2025-12-30 13:02:53 -05:00
chenyu
39f99b207a
update IGNORE_OOB error message ( #13904 )
...
IGNORE_OOB=1 to disable
2025-12-30 12:25:55 -05:00
George Hotz
7e14cdcb06
assembly/amd: clean up clt/ctz hack ( #13901 )
...
* assembly/amd: clean up clt/ctz hack
* add breaks
2025-12-30 11:59:28 -05:00
George Hotz
69cdc8066d
assembly/amd: add dtype tests to AMD IDE CI ( #13899 )
...
* add dtype tests to AMD IDE CI
* more tests
* add trig preop
* regen done
* split to amd autogen
* simpler
2025-12-30 11:09:51 -05:00
George Hotz
9c89be5235
assembly/amd: fix v_perm_b32 + PC fixes ( #13897 )
...
* assembly/amd: fix v_perm_b32
* add pc support
2025-12-30 09:25:40 -05:00
George Hotz
2b838dc1d8
assembly/amd: fix AMD_LLVM=1 support in emulator ( #13881 )
...
* fix AMD_LLVM=1 support in emulator
* more llvm with dtype
* work
* more fixes
* fix dtype
2025-12-30 09:09:57 -05:00
nimlgen
a19d21ea9c
am: mi3xx smu clocks ( #13894 )
...
* am: mi3xx smu clocks
* x
2025-12-30 16:44:17 +03:00
qazal
b557c46233
assembly gemm clean ups, instructions for cli ( #13892 )
2025-12-30 16:14:06 +09:00
qazal
d7e1f26e3d
command line interface for sqtt viz ( #13891 )
...
* command line interface for sqtt viz
* cleanup
* api surface area
* this confuses the llms
* document
2025-12-30 12:33:21 +09:00
chenyu
ab58926b00
update sampling in test_float_cast_to_unsigned ( #13889 )
...
filter is slow for small dtypes
2025-12-29 21:35:46 -05:00
Christopher Milan
0497387e45
NIR: new-style (fix beam) ( #13887 )
...
* NIR: fix beam
* new reduce
* Revert "Revert "NIR: new-style compilers (#13875)" (#13888)"
This reverts commit fc4faed0b2.
* oops
2025-12-29 18:41:29 -05:00
Christopher Milan
fc4faed0b2
Revert "NIR: new-style compilers ( #13875 )" ( #13888 )
...
This reverts commit 72236bbd3d.
2025-12-29 17:42:28 -05:00
George Hotz
94bca91f3e
assembly/amd: have asm go through the dsl ( #13886 )
...
* assembly/amd: have asm go through the dsl
* lil
2025-12-29 17:39:11 -05:00
George Hotz
7322d9ec4a
assembly/amd: add new instruction support to pcode ( #13885 )
...
* assembly/amd: add new instruction support
* more
* regen all
2025-12-29 17:30:17 -05:00
George Hotz
0d326f5b9b
fix missing instructions in pseudocode ( #13884 )
2025-12-29 16:11:22 -05:00
Christopher Milan
9c6850fc01
remove try-catches on llvm import ( #13883 )
2025-12-29 15:56:17 -05:00
George Hotz
9d8397be11
add CDNA3+RDNA4 support ( #13882 )
...
* fix CI
* remove junk
* rename lib to dsl
* correct
* cleanups
2025-12-29 15:51:29 -05:00
Christopher Milan
72236bbd3d
NIR: new-style compilers ( #13875 )
...
* NIR: new-style compilers
* mypy
* simplify NIR compilers
* lvp compiler too
* mypy
* simplify
* mypy
2025-12-29 15:31:41 -05:00
George Hotz
81cf9ea0ab
rename to extra.assembly.amd ( #13879 )
2025-12-29 14:10:55 -05:00
George Hotz
37f0fa11b6
rdna3 test cleanups ( #13878 )
...
* rdna3 test cleanups
* cleanups
* ugh DONT SKIP
2025-12-29 13:41:59 -05:00
George Hotz
35db73b231
add cdna4 support to parsers ( #13877 )
...
* add cdna4 support to parsers
* cdna4
2025-12-29 13:23:43 -05:00
Clément Verrier
d178235309
delete tree structure from CLAUDE.md ( #13876 )
...
Claude Code should be able to figure out the correct structure, and the
hardcoded tree structure might become outdated.
2025-12-29 13:23:20 -05:00
George Hotz
ff856a74cb
minor refactoring for rdna3 ( #13873 )
...
* minor refactoring for rdna3
* fix div scale stuff
* more bugfixes
2025-12-29 13:20:00 -05:00
C T
39923203ba
fix exception in cuda bindings code on windows ( #13823 )
...
* fix cuda on windows
* fix linter errors
* test github action install cuda-toolkit
* Revert "test github action install cuda-toolkit"
This reverts commit c18ad6f937.
* Revert "fix linter errors"
This reverts commit 00aa943e91.
* Revert "fix cuda on windows"
This reverts commit 7aea5256b1.
* fix windows sysconfig.get_config_var("MULTIARCH") is None
2025-12-29 12:58:22 -05:00
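The last bullet of 39923203ba notes that `sysconfig.get_config_var("MULTIARCH")` is `None` on Windows. The log does not show the expression that failed, so the following is only a minimal sketch of the guard, assuming the value was used to build a library search path. The helper name `multiarch_dirs` and the `/usr/lib` base directory are hypothetical and are not taken from the bindings code.

```python
import sysconfig
from typing import List, Optional

def multiarch_dirs(multiarch: Optional[str], base: str = "/usr/lib") -> List[str]:
    """Hypothetical helper: list directories to search for a shared library.

    `multiarch` is whatever sysconfig reports for MULTIARCH. It can be None
    (the commit reports this on Windows), so it is only used when it is set.
    """
    dirs = [base]
    if multiarch:  # skip None and empty strings instead of formatting them into a path
        dirs.insert(0, f"{base}/{multiarch}")
    return dirs

if __name__ == "__main__":
    # Real lookup on the current platform; the result depends on the system.
    print(multiarch_dirs(sysconfig.get_config_var("MULTIARCH")))
```

Passing the looked-up value in as an argument keeps the `None` case easy to exercise on any platform.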
b1tg
63a1bb8507
multi custom kernel: support input mixed with copy and shard ( #13748 )
2025-12-29 12:54:27 -05:00
chenyu
0a98fd38b3
fix tests that failed locally on mac ( #13872 )
...
keccak output was silently broken without contiguous
2025-12-29 11:23:38 -05:00
Clément Verrier
0e409ff5ce
fix indentation in UOp pretty_print for repeated references ( #13857 )
...
* fix indentation in UOp pretty_print for repeated references
When a UOp was referenced multiple times, the walrus operator notation
(e.g., x0:=) was correctly used for the first occurrence, but subsequent
references had misaligned indentation due to an extra space character.
Fix indentation misalignment in pretty_print() when UOps are referenced
multiple times.
* add simple unit tests for UOp repr
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-12-29 10:46:16 -05:00
George Hotz
f1471a3b99
speed up rdna3 unit tests + add to CI ( #13871 )
...
* speed up rdna3 unit tests
* add test to CI
* faster and simpler
* speedups
* bugfixes
* use helper
* fix CI maybe
* test fixes
* llvm-21 on 24.04
* upd
* llvm-21
* fix test
* bring that back
* merge gen into lib
* test generators
2025-12-29 10:26:48 -05:00
h-vetinari
37720fd6c0
also look for linux libraries in RHEL-themed paths ( #13863 )
2025-12-29 10:05:32 -05:00
George Hotz
25ef866e89
write python emulator from RDNA3 pseudocode in pdf ( #13841 )
...
* write python emulator from RDNA3 pseudocode in pdf
* emu2
* more emu
* working
* more pseudo
* progress
* cleanups
* delete junk
* delete stale files
* just emu
* work
* emu compare
* bemu
* cleanups and more failures
* revert bench emu
* fix emu cmp
* four tests fail
* bugfixes
* dsl
* ext
* refactor
* dsl
* div scale fix
* test_emu
* fix emu tests
* pcode
* test pcode
* top imports
* fix test_emu to use run_asm
* emu tests on real hardware
* more tests
* more emu tests
* more
* work
* work
* bug fix
* bugfixes
* fix fp16 gemm
* all ops tests pass in emulator
* fix llvm tests
* fix a few more tests
* fix mockgpu timeout
2025-12-29 07:39:53 -05:00
nimlgen
88eb230326
memory: correct pa allocator size ( #13861 )
2025-12-29 14:49:44 +03:00
qazal
f541540129
variable N for asm gemm ( #13869 )
...
* variable N for asm gemm
* cleanup spacing
2025-12-29 19:35:50 +09:00
nimlgen
c6769badc2
mockgpu: async support ( #13868 )
...
* mockgpu: async support
* cpu
2025-12-29 13:18:37 +03:00
qazal
fc5278746f
mi350x assembly gemm cleanups ( #13867 )
2025-12-29 18:47:23 +09:00
George Hotz
f07c39cfa4
hwtest fixes for rdna3 dsl ( #13865 )
2025-12-28 20:42:29 -05:00
George Hotz
d9603c1bee
improve asm dsl syntax ( #13864 )
...
* improve asm dsl syntax
* improve asm dsl syntax
2025-12-28 20:04:59 -05:00
chenyu
f5090192c8
reorder AMD tensor core benchmark test ( #13860 )
...
* reorder AMD tensor core benchmark test
* disable that
2025-12-28 12:29:51 -05:00
qazal
066d96c397
print tflops in asm gemm test ( #13859 )
...
* print tflops in asm gemm test
* change order
2025-12-29 02:26:40 +09:00
chenyu
a03cd43e78
fix typing in compute_gradient ( #13852 )
2025-12-28 11:52:14 -05:00
chenyu
cba05acadf
re-enable TYPED=1 import test ( #13858 )
2025-12-28 11:49:06 -05:00
qazal
2cfbabdc34
mi350x 1tflop bf16 gemm in extra ( #13702 )
2025-12-28 21:45:42 +09:00
qazal
2180eee5e4
use the asm dsl in remu hwtest.py ( #13856 )
...
* remu hw test with the asm dsl
* simpler
* nthreads and exec mask
* cmp/cmpx
* assembler error in s_mov_b32
* vopd in dsl?
2025-12-28 11:32:41 +09:00
chenyu
784b919f7f
Revert "optim empty shard #13513 ( #13598 )" ( #13855 )
...
* Revert "optim empty shard #13513 (#13598)"
This reverts commit 76d465dbc3.
* test_arange_shrink
* update test
2025-12-27 21:10:23 -05:00
anu
9b4de8abc7
fix beam in python 3.14+ ( #13836 )
...
* fix beam search on python 3.14
* add PickleableCount class to helpers
* change name, add test, add step
* tidy count init
2025-12-27 16:24:22 -05:00
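9b4de8abc7 mentions adding a `PickleableCount` class to helpers (later renamed in the same PR; the final name is not shown here). The log does not state the cause, but a plausible one is that Python 3.14 removed copy and pickle support from `itertools` objects such as `itertools.count`. The following is a minimal sketch of a picklable stand-in under that assumption; the class name `PicklableCounter` and its fields are hypothetical and are not the code from the PR.

```python
import pickle

class PicklableCounter:
    """Hypothetical stand-in for itertools.count that keeps its state in plain
    attributes, so the default pickle and copy protocols work on it."""

    def __init__(self, start: int = 0, step: int = 1):
        self.n = start
        self.step = step

    def __iter__(self):
        return self

    def __next__(self) -> int:
        cur = self.n
        self.n += self.step
        return cur

if __name__ == "__main__":
    c = PicklableCounter()
    next(c); next(c)                      # advance past 0 and 1
    clone = pickle.loads(pickle.dumps(c)) # round trip preserves the position
    print(next(clone), next(c))           # both continue from 2
```

Because the state is just two integers, no custom `__reduce__` or `__getstate__` is needed.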
chenyu
0f74909ae9
clean up rearrange ( #13851 )
2025-12-27 11:06:10 -05:00
qazal
f6c660f7fa
simplify sqtt decoder infra ( #13849 )
...
* more work
* simpler
2025-12-28 00:31:16 +09:00
Clément Verrier
ae013beab8
handle empty VECTORIZE in UOp.render() ( #13847 )
...
`UOp.render()` crashed with `IndexError: tuple index out of range` when
the UOp graph contained a `VECTORIZE` with empty `src=()`. This occurs
when reshaping to scalar shape `()`, e.g., `Tensor.ones(4).sum()`.
The bug was in the renderer's VECTORIZE pattern: `all_same(())` returns
`True` (vacuous truth), causing the code to access `x.src[0]` on an
empty tuple.
- Fix `IndexError` when calling `UOp.render()` on graphs containing
empty `VECTORIZE` nodes.
- Add test for empty `VECTORIZE` rendering.
2025-12-27 10:09:39 -05:00
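The vacuous-truth bug described in ae013beab8 can be reproduced outside the renderer. The following is a minimal sketch, assuming `all_same` compares every element to the first one; the two `render_*` functions and their output strings are hypothetical stand-ins for the VECTORIZE pattern, not the renderer code itself.

```python
def all_same(items):
    # Assumed definition: with an empty sequence the generator yields nothing,
    # items[0] is never evaluated, and all() returns True.
    return all(x == items[0] for x in items)

def render_vectorize_buggy(src):
    # Hypothetical pattern: collapse identical sources into one broadcast.
    if all_same(src):
        return f"broadcast({src[0]}, {len(src)})"  # IndexError when src == ()
    return "vec(" + ", ".join(src) + ")"

def render_vectorize_fixed(src):
    # Guard on length first, so an empty src falls through to the general case.
    if len(src) > 0 and all_same(src):
        return f"broadcast({src[0]}, {len(src)})"
    return "vec(" + ", ".join(src) + ")"

if __name__ == "__main__":
    print(all_same(()))                # True
    print(render_vectorize_fixed(()))  # vec()
    try:
        render_vectorize_buggy(())
    except IndexError as e:
        print("IndexError:", e)
```

The length check is one way to close the gap; the commit body does not say which guard the actual fix uses.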
qazal
a2da61d096
use new style amd compiler in viz ( #13848 )
...
* working version, handcode gfx1100 arch
* get target from device properties
* lib in cfg test program spec
2025-12-27 23:59:30 +09:00
JINO ROHIT
1ee92003ea
minor typo ( #13846 )
2025-12-27 09:34:57 -05:00
nimlgen
276159cb87
system: add base_class to pci_scan_bus ( #13845 )
...
* system: add base_class to pci_scan_bus
* fix
2025-12-27 13:22:21 +03:00