11640 Commits

Author SHA1 Message Date
chenyu
a9a7b33404 IGNORE_OOB=0 in CI (#13903) 2025-12-31 12:56:59 -05:00
George Hotz
29402034a1 assembly/amd: cleanups to asm and emu (#13912)
* a bunch of cleanups

* ops are back

* bug fixes

* cleanups

* a lil simpler

* more refactors

* _disasm_vop1

* sops

* more

* continue

* more

* num_srcs

* simpler

* no _is16

* op cleanups

* isinstance
2025-12-31 12:46:11 -05:00
chenyu
ba9aa5cd6f skip some PTX IGNORE_OOB validation (#13927) 2025-12-31 12:40:21 -05:00
chenyu
4968060ad4 fix IGNORE_OOB=0 for WEBGPU (#13926) 2025-12-31 10:41:28 -05:00
chenyu
35bd39e4ba update mypy and torch version in ci (#13925) 2025-12-31 10:29:28 -05:00
George Hotz
b998a80b5d assembly/amd: split generated stuff into enum/ins (#13924) 2025-12-31 10:10:52 -05:00
chenyu
404755bafd merge ci ruff tests and update ruff version (#13922) 2025-12-31 09:53:49 -05:00
nimlgen
25440f0f72 all2all (#13902)
* all2all

* um

* fix

* x

* um

* simpler

* mypy

* fix

* t

* cmnts
2025-12-31 16:38:32 +03:00
nimlgen
f7ee644950 amd: lazy sdma queue allocation (#13920)
* amd: lazy queue

* nv

* linter

* f
2025-12-31 15:17:13 +03:00
nimlgen
b063518ea7 am: several sdmas (#13919)
* am: several sdmas

* fix
2025-12-31 14:19:22 +03:00
qazal
b23f4517ab prep mi350x gemm for python dsl (#13918)
* start by pruning existing asm

* better branch names

* split to template and real instructions
2025-12-31 20:00:57 +09:00
qazal
3f3786ded9 mmapeak: fix compiler import (#13915) 2025-12-31 16:52:23 +09:00
Christopher Milan
a14896fff2 refactor QCOM arg parsing (#13914)
* refactor QCOM arg parsing

* ruff

* mypy
2025-12-30 19:26:02 -05:00
Christopher Milan
c475c3a6d7 remove useless cast (#13911) 2025-12-30 19:24:29 -05:00
George Hotz
0221b96761 assembly/amd: fix all ops tests (#13910)
* assembly/amd: fix all ops tests

* test_ops with smaller sizes

* ds store/load 2addr
2025-12-30 18:01:34 -05:00
chenyu
dc27eb48ac remove PYTHONPATH="." from test.yml (#13909) 2025-12-30 17:00:16 -05:00
George Hotz
efc99d0c55 assembly/amd: more refactors (#13907)
* assembly/amd: more refactors

* more refactors

* more refactors

* simpler emu

* generate.py

* regen all

* cleanups

* more

* work

* more readme

* lil
2025-12-30 16:13:24 -05:00
George Hotz
49d1bf93d6 assembly/amd: refactor asm.py to be simpler (#13900)
* assembly/amd: refactor asm.py

* assembly/amd: refactor asm.py to be simpler

* multiple fxns

* fast

* more tests pass

* regen

* stop decode
2025-12-30 13:51:40 -05:00
George Hotz
04c79505ec no subnormal bf16 (#13905) 2025-12-30 13:02:53 -05:00
chenyu
39f99b207a update IGNORE_OOB error message (#13904)
IGNORE_OOB=1 to disable
2025-12-30 12:25:55 -05:00
George Hotz
7e14cdcb06 assembly/amd: clean up clz/ctz hack (#13901)
* assembly/amd: clean up clz/ctz hack

* add breaks
2025-12-30 11:59:28 -05:00
George Hotz
69cdc8066d assembly/amd: add dtype tests to AMD IDE CI (#13899)
* add dtype tests to AMD IDE CI

* more tests

* add trig preop

* regen done

* split to amd autogen

* simpler
2025-12-30 11:09:51 -05:00
George Hotz
9c89be5235 assembly/amd: fix v_perm_b32 + PC fixes (#13897)
* assembly/amd: fix v_perm_b32

* add pc support
2025-12-30 09:25:40 -05:00
George Hotz
2b838dc1d8 assembly/amd: fix AMD_LLVM=1 support in emulator (#13881)
* fix AMD_LLVM=1 support in emulator

* more llvm with dtype

* work

* more fixes

* fix dtype
2025-12-30 09:09:57 -05:00
nimlgen
a19d21ea9c am: mi3xx smu clocks (#13894)
* am: mi3xx smu clocks

* x
2025-12-30 16:44:17 +03:00
qazal
b557c46233 assembly gemm clean ups, instructions for cli (#13892) 2025-12-30 16:14:06 +09:00
qazal
d7e1f26e3d command line interface for sqtt viz (#13891)
* command line interface for sqtt viz

* cleanup

* api surface area

* this confuses the llms

* document
2025-12-30 12:33:21 +09:00
chenyu
ab58926b00 update sampling in test_float_cast_to_unsigned (#13889)
filter is slow for small dtypes
2025-12-29 21:35:46 -05:00
Christopher Milan
0497387e45 NIR: new-style (fix beam) (#13887)
* NIR: fix beam

* new reduce

* Revert "Revert "NIR: new-style compilers (#13875)" (#13888)"

This reverts commit fc4faed0b2.

* oops
2025-12-29 18:41:29 -05:00
Christopher Milan
fc4faed0b2 Revert "NIR: new-style compilers (#13875)" (#13888)
This reverts commit 72236bbd3d.
2025-12-29 17:42:28 -05:00
George Hotz
94bca91f3e assembly/amd: have asm go through the dsl (#13886)
* assembly/amd: have asm go through the dsl

* lil
2025-12-29 17:39:11 -05:00
George Hotz
7322d9ec4a assembly/amd: add new instruction support to pcode (#13885)
* assembly/amd: add new instruction support

* more

* regen all
2025-12-29 17:30:17 -05:00
George Hotz
0d326f5b9b fix missing instructions in pseudocode (#13884) 2025-12-29 16:11:22 -05:00
Christopher Milan
9c6850fc01 remove try-catches on llvm import (#13883) 2025-12-29 15:56:17 -05:00
George Hotz
9d8397be11 add CDNA3+RDNA4 support (#13882)
* fix CI

* remove junk

* rename lib to dsl

* correct

* cleanups
2025-12-29 15:51:29 -05:00
Christopher Milan
72236bbd3d NIR: new-style compilers (#13875)
* NIR: new-style compilers

* mypy

* simplify NIR compilers

* lvp compiler too

* mypy

* simplify

* mypy
2025-12-29 15:31:41 -05:00
George Hotz
81cf9ea0ab rename to extra.assembly.amd (#13879) 2025-12-29 14:10:55 -05:00
George Hotz
37f0fa11b6 rdna3 test cleanups (#13878)
* rdna3 test cleanups

* cleanups

* ugh DONT SKIP
2025-12-29 13:41:59 -05:00
George Hotz
35db73b231 add cdna4 support to parsers (#13877)
* add cdna4 support to parsers

* cdna4
2025-12-29 13:23:43 -05:00
Clément Verrier
d178235309 delete tree structure from CLAUDE.md (#13876)
Claude Code should be able to figure out the correct structure, and the
hardcoded tree structure might become outdated.
2025-12-29 13:23:20 -05:00
George Hotz
ff856a74cb minor refactoring for rdna3 (#13873)
* minor refactoring for rdna3

* fix div scale stuff

* more bugfixes
2025-12-29 13:20:00 -05:00
C T
39923203ba fix exception in cuda bindings code on windows (#13823)
* fix cuda on windows

* fix linter errors

* test github action install cuda-toolkit

* Revert "test github action install cuda-toolkit"

This reverts commit c18ad6f937.

* Revert "fix linter errors"

This reverts commit 00aa943e91.

* Revert "fix cuda on windows"

This reverts commit 7aea5256b1.

* fix windows sysconfig.get_config_var("MULTIARCH") is None
2025-12-29 12:58:22 -05:00
b1tg
63a1bb8507 multi custom kernel: support input mixed with copy and shard (#13748) 2025-12-29 12:54:27 -05:00
chenyu
0a98fd38b3 fix tests that failed locally on mac (#13872)
keccak output was silently broken without contiguous
2025-12-29 11:23:38 -05:00
Clément Verrier
0e409ff5ce fix indentation in UOp pretty_print for repeated references (#13857)
* fix indentation in UOp pretty_print for repeated references

When a UOp was referenced multiple times, the walrus operator notation
(e.g., x0:=) was correctly used for the first occurrence, but subsequent
references had misaligned indentation due to an extra space character.

Fix indentation misalignment in pretty_print() when UOps are referenced
multiple times.

* add simple unit tests for UOp repr

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-12-29 10:46:16 -05:00
George Hotz
f1471a3b99 speed up rdna3 unit tests + add to CI (#13871)
* speed up rdna3 unit tests

* add test to CI

* faster and simpler

* speedups

* bugfixes

* use helper

* fix CI maybe

* test fixes

* llvm-21 on 24.04

* upd

* llvm-21

* fix test

* bring that back

* merge gen into lib

* test generators
2025-12-29 10:26:48 -05:00
h-vetinari
37720fd6c0 also look for linux libraries in RHEL-themed paths (#13863) 2025-12-29 10:05:32 -05:00
George Hotz
25ef866e89 write python emulator from RDNA3 pseudocode in pdf (#13841)
* write python emulator from RDNA3 pseudocode in pdf

* emu2

* more emu

* working

* more pseudo

* progress

* cleanups

* delete junk

* delete stale files

* just emu

* work

* emu compare

* bemu

* cleanups and more failures

* revert bench emu

* fix emu cmp

* four tests fail

* bugfixes

* dsl

* ext

* refactor

* dsl

* div scale fix

* test_emu

* fix emu tests

* pcode

* test pcode

* top imports

* fix test_emu to use run_asm

* emu tests on real hardware

* more tests

* more emu tests

* more

* work

* work

* bug fix

* bugfixes

* fix fp16 gemm

* all ops tests pass in emulator

* fix llvm tests

* fix a few more tests

* fix mockgpu timeout
2025-12-29 07:39:53 -05:00
nimlgen
88eb230326 memory: correct pa allocator size (#13861) 2025-12-29 14:49:44 +03:00
qazal
f541540129 variable N for asm gemm (#13869)
* variable N for asm gemm

* cleanup spacing
2025-12-29 19:35:50 +09:00