Commit Graph

11502 Commits

Author SHA1 Message Date
George Hotz
35db73b231 add cdna4 support to parsers (#13877)
* add cdna4 support to parsers

* cdna4
2025-12-29 13:23:43 -05:00
Clément Verrier
d178235309 delete tree structure from CLAUDE.md (#13876)
Claude Code should be able to figure out the correct structure, and the
hardcoded tree structure might become outdated.
2025-12-29 13:23:20 -05:00
George Hotz
ff856a74cb minor refactoring for rdna3 (#13873)
* minor refactoring for rdna3

* fix div scale stuff

* more bugfixes
2025-12-29 13:20:00 -05:00
C T
39923203ba fix exception in cuda bindings code on windows (#13823)
* fix cuda on windows

* fix linter errors

* test github action install cuda-toolkit

* Revert "test github action install cuda-toolkit"

This reverts commit c18ad6f937.

* Revert "fix linter errors"

This reverts commit 00aa943e91.

* Revert "fix cuda on windows"

This reverts commit 7aea5256b1.

* fix windows sysconfig.get_config_var("MULTIARCH") is None
2025-12-29 12:58:22 -05:00
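
Note on the last bullet above: `sysconfig.get_config_var` returns `None` for any variable the platform does not define, and `MULTIARCH` is one of them on Windows. The sketch below shows one way to guard against that when building library search paths. The function `candidate_lib_dirs` and the `/usr/lib` base are hypothetical names for illustration, not the code changed in the commit.

```python
import sysconfig

def candidate_lib_dirs(base="/usr/lib"):
    """Return library directories to search, most specific first.

    Hypothetical helper for illustration only. MULTIARCH is typically a
    string such as "x86_64-linux-gnu" on Debian-style Linux builds and
    is None on platforms that do not define it (for example Windows).
    """
    multiarch = sysconfig.get_config_var("MULTIARCH")
    dirs = []
    if multiarch:  # skip the multiarch directory when the variable is None or empty
        dirs.append(f"{base}/{multiarch}")
    dirs.append(base)
    return dirs

if __name__ == "__main__":
    print(candidate_lib_dirs())
```

Without the guard, formatting `None` into the path would silently produce a directory named `/usr/lib/None`, and any string method called on it would raise.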
b1tg
63a1bb8507 multi custom kernel: support input mixed with copy and shard (#13748) 2025-12-29 12:54:27 -05:00
chenyu
0a98fd38b3 fix tests that failed locally on mac (#13872)
keccak output was silently broken without contiguous
2025-12-29 11:23:38 -05:00
Clément Verrier
0e409ff5ce fix indentation in UOp pretty_print for repeated references (#13857)
* fix correct indentation in UOp pretty_print for repeated references

When a UOp was referenced multiple times, the walrus operator notation
(e.g., x0:=) was correctly used for the first occurrence, but subsequent
references had misaligned indentation due to an extra space character.

Fix indentation misalignment in pretty_print() when UOps are referenced
multiple times.

* add simple unit tests for UOp repr

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-12-29 10:46:16 -05:00
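
Note on the commit above: the walrus-style notation it describes can be illustrated with a toy printer. This is a minimal sketch under stated assumptions, not tinygrad's `pretty_print`: the `Node` class, the `pretty` function, and the two-space indent are all hypothetical. A node referenced more than once is printed in full at its first occurrence with an `xN:=` prefix, and later references print only the name at the same indentation as any other child.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a graph node: an op name plus child nodes.
    op: str
    src: tuple = ()

def pretty(root):
    # First pass: count how many times each node object is referenced.
    counts = {}
    def visit(n):
        counts[id(n)] = counts.get(id(n), 0) + 1
        if counts[id(n)] == 1:  # walk children only on the first visit
            for s in n.src:
                visit(s)
    visit(root)

    # Second pass: emit one line per reference.
    names, lines = {}, []
    def emit(n, depth):
        indent = "  " * depth
        if id(n) in names:  # repeated reference: print the name only
            lines.append(f"{indent}{names[id(n)]}")
            return
        prefix = ""
        if counts[id(n)] > 1:  # first occurrence of a shared node
            names[id(n)] = f"x{len(names)}"
            prefix = f"{names[id(n)]}:="
        lines.append(f"{indent}{prefix}{n.op}")
        for s in n.src:
            emit(s, depth + 1)
    emit(root, 0)
    return "\n".join(lines)

if __name__ == "__main__":
    shared = Node("CONST")
    print(pretty(Node("ADD", (shared, shared))))
```

Both the named definition and the bare reference use the same `indent` string, which is the alignment property the commit message says was broken by an extra space.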
George Hotz
f1471a3b99 speed up rdna3 unit tests + add to CI (#13871)
* speed up rdna3 unit tests

* add test to CI

* faster and simpler

* speedups

* bugfixes

* use helper

* fix CI maybe

* test fixes

* llvm-21 on 24.04

* upd

* llvm-21

* fix test

* bring that back

* merge gen into lib

* test generators
2025-12-29 10:26:48 -05:00
h-vetinari
37720fd6c0 also look for linux libraries in RHEL-themed paths (#13863) 2025-12-29 10:05:32 -05:00
George Hotz
25ef866e89 write python emulator from RDNA3 pseudocode in pdf (#13841)
* write python emulator from RDNA3 pseudocode in pdf

* emu2

* more emu

* working

* more pseudo

* progress

* cleanups

* delete junk

* delete stale files

* just emu

* work

* emu compare

* bemu

* cleanups and more failures

* revert bench emu

* fix emu cmp

* four tests fail

* bugfixes

* dsl

* ext

* refactor

* dsl

* div scale fix

* test_emu

* fix emu tests

* pcode

* test pcode

* top imports

* fix test_emu to use run_asm

* emu tests on real hardware

* more tests

* more emu tests

* more

* work

* work

* bug fix

* bugfixes

* fix fp16 gemm

* all ops tests pass in emulator

* fix llvm tests

* fix a few more tests

* fix mockgpu timeout
2025-12-29 07:39:53 -05:00
nimlgen
88eb230326 memory: correct pa allocator size (#13861) 2025-12-29 14:49:44 +03:00
qazal
f541540129 variable N for asm gemm (#13869)
* variable N for asm gemm

* cleanup spacing
2025-12-29 19:35:50 +09:00
nimlgen
c6769badc2 mockgpu: async support (#13868)
* mockgpu: async support

* cpu
2025-12-29 13:18:37 +03:00
qazal
fc5278746f mi350x assembly gemm cleanups (#13867) 2025-12-29 18:47:23 +09:00
George Hotz
f07c39cfa4 hwtest fixes for rdna3 dsl (#13865) 2025-12-28 20:42:29 -05:00
George Hotz
d9603c1bee improve asm dsl syntax (#13864)
* improve asm dsl syntax

* improve asm dsl syntax
2025-12-28 20:04:59 -05:00
chenyu
f5090192c8 reorder AMD tensor core benchmark test (#13860)
* reorder AMD tensor core benchmark test

* disable that
2025-12-28 12:29:51 -05:00
qazal
066d96c397 print tflops in asm gemm test (#13859)
* print tflops in asm gemm test

* change order
2025-12-29 02:26:40 +09:00
chenyu
a03cd43e78 fix typing in compute_gradient (#13852) 2025-12-28 11:52:14 -05:00
chenyu
cba05acadf re-enable TYPED=1 import test (#13858) 2025-12-28 11:49:06 -05:00
qazal
2cfbabdc34 mi350x 1tflop bf16 gemm in extra (#13702) 2025-12-28 21:45:42 +09:00
qazal
2180eee5e4 use the asm dsl in remu hwtest.py (#13856)
* remu hw test with the asm dsl

* simpler

* nthreads and exec mask

* cmp/cmpx

* assembler error in s_mov_b32

* vopd in dsl?
2025-12-28 11:32:41 +09:00
chenyu
784b919f7f Revert "optim empty shard #13513 (#13598)" (#13855)
* Revert "optim empty shard #13513 (#13598)"

This reverts commit 76d465dbc3.

* test_arange_shrink

* update test
2025-12-27 21:10:23 -05:00
anu
9b4de8abc7 fix beam in python 3.14+ (#13836)
* fix beam search on python 3.14

* add PickleableCount class to helpers

* change name, add test, add step

* tidy count init
2025-12-27 16:24:22 -05:00
chenyu
0f74909ae9 clean up rearrange (#13851) 2025-12-27 11:06:10 -05:00
qazal
f6c660f7fa simplify sqtt decoder infra (#13849)
* more work

* simpler
2025-12-28 00:31:16 +09:00
Clément Verrier
ae013beab8 handle empty VECTORIZE in UOp.render() (#13847)
`UOp.render()` crashed with `IndexError: tuple index out of range` when
the UOp graph contained a `VECTORIZE` with empty `src=()`. This occurs
when reshaping to scalar shape `()`, e.g., `Tensor.ones(4).sum()`.

The bug was in the renderer's VECTORIZE pattern: `all_same(())` returns
`True` (vacuous truth), causing the code to access `x.src[0]` on an
empty tuple.

- Fix `IndexError` when calling `UOp.render()` on graphs containing
  empty `VECTORIZE` nodes.
- Add test for empty `VECTORIZE` rendering.
2025-12-27 10:09:39 -05:00
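
Note on the commit above: the vacuous-truth pitfall it describes is easy to reproduce in isolation. The sketch below uses a stand-in `all_same` written for this note (assumed to behave like the helper named in the message, which the message says returns `True` for `()`), and a hypothetical `first_if_uniform` function showing the kind of emptiness guard that avoids indexing an empty tuple. It is not the renderer's actual pattern.

```python
def all_same(items):
    # Stand-in helper: True when every element equals the first one.
    # For an empty sequence the generator yields nothing, so all() returns
    # True and items[0] is never evaluated.
    return all(x == items[0] for x in items)

def first_if_uniform(src):
    # Hypothetical consumer: return the common element, or None.
    # Checking all_same(src) alone would reach src[0] on an empty tuple and
    # raise IndexError, so test for emptiness first.
    if len(src) > 0 and all_same(src):
        return src[0]
    return None

if __name__ == "__main__":
    print(all_same(()))           # vacuous truth
    print(first_if_uniform(()))   # guarded, no IndexError
```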
qazal
a2da61d096 use new style amd compiler in viz (#13848)
* working version, handcode gfx1100 arch

* get target from device properties

* lib in cfg test program spec
2025-12-27 23:59:30 +09:00
JINO ROHIT
1ee92003ea minor typo (#13846) 2025-12-27 09:34:57 -05:00
nimlgen
276159cb87 system: add base_class to pci_scan_bus (#13845)
* system: add base_class to pci_scan_bus

* fix
2025-12-27 13:22:21 +03:00
Francis Lata
fac137779e remove flux1 seed image (#13843) 2025-12-27 00:45:11 -05:00
qazal
f6de9095a0 switch asm tests to dsl (#13840)
* switch asm tests to dsl

* labeled basic blocks also work

* indenting for basic blocks

* allow define from star import
2025-12-27 02:15:16 +09:00
chenyu
ba922094f2 remove redundant check in disk_supports_fast_copyout (#13838) 2025-12-26 11:30:55 -05:00
George Hotz
e9f2aaba2a simplify rdna3 asm (#13835)
* simplify rdna3 asm

* cleanups

* fix names

* fix tests

* fixes

* more test fixes

* type fixes

* tests pass + mypy passes

* 3.11 syntax
2025-12-26 11:21:03 -05:00
nimlgen
c44b4f9ae0 am: fix sdma warm boot (#13837) 2025-12-26 12:38:06 +03:00
George Hotz
c6937fa744 more work on RDNA3 asm (#13833)
* more llvm asm tests

* roundtrip test

* work

* more handwritten

* more handwritten

* work

* tests pass

* dual mov

* all tests pass

* all tests pass fast
2025-12-25 23:28:14 -05:00
George Hotz
f1111ac7de move amd compilers to new style (#13831)
* move amd compilers to new style

* simplest diff

* AMDHIPrenderer
2025-12-25 13:42:24 -05:00
George Hotz
9d94b8c6b2 python asm dsl in extra + python REMU (#13436)
* having fun with python asm dsl

* rdna3

* meh

* all in rdna3

* work

* more work

* work

* integration

* tests

* simpler

* simpler

* asm

* better

* simpler

* progress

* emu

* simpler

* emu

* tests

* types

* vopd

* cleanups

* work

* memory ranges

* add tracing

* refactors

* run_asm exit

* more readable

* compare to remu

* test gemm

* bug + stale

* more tests

* refactor

* tests fix

* more ins

* more instructions

* refactor

* faster

* match case

* match case

* simpler

* work

* tests

* run_asm

* work

* bug fixes

* more emu

* alu/emu

* refactor

* no pipeline emu yet

* alu direct

* fix

* bugfixes + new test

* fix exceptions in emulators

* update gen.py

* pylint

* no pdf

* improve bench_emu

* speedups

* cleanups

* more tests
2025-12-25 13:04:14 -05:00
nimlgen
b5f3a5ad79 am: cleanup comment (#13828) 2025-12-25 18:00:28 +03:00
chenyu
8985a4a023 one less branch in Buffer.view [pr] (#13829) 2025-12-25 09:34:15 -05:00
chenyu
094753b4e0 renderer arch version cleanup [pr] (#13830) 2025-12-25 09:32:56 -05:00
chenyu
54af29dbdb trange can just be a function (#13827) 2025-12-24 23:57:10 -05:00
qazal
a1c1684b91 set .amdhsa_kernarg_size in asm test (#13826) 2025-12-25 13:08:14 +09:00
chenyu
da1cb6a9ec update llama dataloader (#13825)
separate creating the dataset from iterating over it, so eval data is not recreated for each eval
2025-12-24 17:42:08 -05:00
chenyu
a7fc0c288b clean up BufferCopy init [pr] (#13824) 2025-12-24 10:40:15 -05:00
chenyu
903753c60c llama wandb logging (#13822) 2025-12-24 10:24:59 -05:00
qazal
e3a646dce3 viz: skip plaintext disassemble for cfg (#13821) 2025-12-24 23:16:59 +09:00
chenyu
cb07c5d0e8 fewer import annotations (#13819) 2025-12-23 18:45:50 -05:00
George Hotz
43c6e973d8 add optional compiler in Renderer (#13817)
* add optional compiler in Renderer [pr]

* fix

* late init

* remove precompiled

* cleanup
2025-12-23 17:58:46 -05:00
George Hotz
8eab6175ee get_program refactor (#13816)
* get_program refactor

* fix docs

* cleanup
2025-12-23 16:44:46 -05:00