1238 Commits

Author SHA1 Message Date
chenyu
e8252e6e4f use official gguf in test (#14872)
also deleted bad test_load_sample_mxfp4, added some hard coded simple tests
2026-02-18 19:55:09 -05:00
qazal
f590564bf7 gemm multiple is only for cdna4 asm (#14814)
* gemm multiple is only for cdna4 asm

* move to backend

* and arch

* path
2026-02-17 14:00:02 +09:00
nimlgen
131bbbbfd8 am: smu_v13_0_12 (#14800) 2026-02-16 22:58:10 +03:00
George Hotz
dff9cf35c2 amd asm emulator fixes + run it in CI (#14786)
* amd asm fix, try 2

* fix tests
2026-02-16 13:24:21 +08:00
qazal
55a4dfa2e0 cdna4 asm_gemm tests in CI on the null backend (#14785)
* cdna4 asm_gemm tests in CI on the null backend

* no .numpy() in null

* better

* gemm/asm: device comes from renderer
2026-02-16 14:06:23 +09:00
kevvz
33b2ade8cd Rdna4 emulator test_ops, dtypes pass (#14773)
* test_ops, test_dtypes pass

* merge cdna4

* ruff + more tests

* reorganize

* /backend

* again

* again...

* add rdna4
2026-02-16 10:13:39 +08:00
George Hotz
bd18217f32 add rdna3/rdna4/cdna4 to testamd (#14778)
* add rdna3/rdna4/cdna4 to testamd

* test simplify

* ci cleanups

* mergable

* skip slow
2026-02-16 09:45:16 +08:00
Christopher Milan
9c95a11f90 autogen: handle rocm bump and better error wording (#14776)
* autogen: handle rocm bump and better error wording

* regen
2026-02-15 19:23:47 -05:00
qazal
33b31d9cd6 tinykittens flash attention dtype fix, add CI (#14770)
* don't hardcode amd device

* add failing tests, ci too

* fix: fix for dtype mixin

* bump to rocm 7.1

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-02-16 01:15:11 +09:00
qazal
9da7f5e733 disable process replay for AMD emulator renderer [pr] (#14766)
* disable process replay for AMD emulator renderer [pr]

* line

* skip
2026-02-15 18:52:37 +09:00
George Hotz
5289b4e882 renderer/amd: add cdna emulator (#14721)
* renderer/amd: add cdna emulator

* fixes

* no predecode

* no early

* REMU_PATH

* delete that

* round

* Fix cache invalidation check in _compile_smem
2026-02-13 16:06:58 +08:00
George Hotz
d3adb8428e Revert "hotfix: skip test/amd in macpytest" (#14704)
* Revert "hotfix: skip test/amd in macpytest"

This reverts commit b7dade2adf.

* no llvm subprocess

* simpler

* sys.exec

* cleanup

* process safe

* diag

* arm ftz support

* 5 sec

* this one
2026-02-13 08:00:24 +08:00
Christopher Milan
084d0d0103 cleanup macos webgpu tests (#14715) 2026-02-12 17:56:34 -05:00
Christopher Milan
c30bb0f006 fix WEBGPU isnan check (#14711) 2026-02-12 17:01:18 -05:00
George Hotz
b7dade2adf hotfix: skip test/amd in macpytest 2026-02-12 18:16:04 +08:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
095a064ba8 test.yml explicitly says backend (#14700)
* test.yml explicitly says backend

* 1e-5
2026-02-12 16:03:44 +08:00
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
George Hotz
cc9bf8ccbc move more to null/unit tests (#14658)
* move more to null tests

* move test_gc

* no test fusion op
2026-02-10 13:35:17 +08:00
Christopher Milan
b36b62eb59 don't push docker cache for PRs (#14652) 2026-02-09 19:55:55 -05:00
Christopher Milan
396e1320fb bump cache version for z3 (#14650) 2026-02-09 19:32:07 -05:00
wozeparrot
d87ae1c84c feat: tinyfs load test in benchmark (#14602) 2026-02-06 18:00:00 -08:00
Garret Castro
cee7ef7ab2 disable threads (#14555) 2026-02-05 16:11:32 -05:00
chenyu
41a179f542 fix test_xlm_roberta_large (#14564)
onnxruntime does not allow symlink that's outside model dir. update snapshot_download to use local_dir instead of cache_dir. some ad hoc migration step to copy the existing model too
2026-02-05 14:56:06 -05:00
Christopher Milan
b47397ab17 list ml_dtypes as dependency for DSP (#14562)
* pin onnxruntime to 1.23.2 for DSP

* list ml_dtypes instead

This reverts commit 84bb2cc0fc.
2026-02-05 14:27:50 -05:00
George Hotz
d59e6e7a37 move more tests to test/null, split some existing ones (#14512)
* move more tests to test/null, split some existing ones

* null work

* null work

* move more

* fixes

* move PIL

* PIL in CLIP

* don't move that
2026-02-03 20:20:20 +08:00
George Hotz
dc77b3318b move files that pass with NULL=1 to test/null (#14508)
* move files that pass with NULL=1 to test/null

* fix windows

* cpu 0

* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
85c7b23160 add pytest -nauto to benchmark for mac (#14458)
* add pytest -nauto to benchmark

* 3 minute timeout

* 3 min

* setup env

* comment

* fresh db

* in the pyenv
2026-02-03 12:26:09 +08:00
Christopher Milan
a5d7eb37db IR3 works on versions earlier than 3.14 (#14507) 2026-02-02 23:10:19 -05:00
George Hotz
33c886cafa disable copyout on NULL backend by default (#14506)
* disable copyout on NULL backend

* gate it

* allow copyout on some tests
2026-02-03 11:57:47 +08:00
George Hotz
6e958dbfd4 assembly/amd: add RDNA4 support to emulator (#14341)
* start new rdna4

* work

* plus works

* more pass

* rdna4

* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp

* stale

* rev

* rr

* rdna4 emu tests

* cleanup

* cleanup

* simp

* works

* better factorization

* hacks

* fix mockgpu

* guard both

* cleaner

* gate

* bug fix and a few tests

* all test_tiny
2026-02-02 21:35:59 +08:00
Christopher Milan
e575dd8275 prevent UB in long decomp and more emulated tests (#14447) 2026-01-30 19:38:41 -05:00
Christopher Milan
1803ee939d EMULATED_DTYPES=long works with CPU_LLVM (#14446) 2026-01-30 13:54:43 -05:00
Christopher Milan
88caf57ef4 ci: unify python versions (#14430) 2026-01-29 21:42:03 -05:00
Christopher Milan
e47f12f671 ci: replace testing_minimal with testing_unit (#14427) 2026-01-29 18:02:43 -05:00
Christopher Milan
0c855d6149 ci: remove unused pydeps (#14418) 2026-01-29 01:51:26 -05:00
chenyu
37cde4a01a add one line mypy report (#14415) 2026-01-28 20:39:32 -05:00
nimlgen
544928766d hcq_smi: kill mac pids (#14398) 2026-01-28 15:00:28 +03:00
qazal
5bffa17f82 llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience (#14395)
* beam opens devices

* switch to hip renderer

* amd: true?

* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
Christopher Milan
067e27857e nested composite actions don't work (#14393) 2026-01-28 00:13:30 -05:00
Christopher Milan
9dddf3d478 don't save caches for PRs, try 2 (#14391) 2026-01-27 23:30:17 -05:00
Christopher Milan
68fe5d8b36 Revert "don't save caches for PRs (#14389)" (#14390) 2026-01-27 23:22:26 -05:00
Christopher Milan
4ab228b498 don't save caches for PRs (#14389) 2026-01-27 23:21:31 -05:00
Christopher Milan
5e36482314 decompose long to ints where unsupported, try 2 (#14383) 2026-01-27 23:20:43 -05:00
George Hotz
88bc5ee212 assembly/amd: rename to better names (#14384)
* assembly/amd: rename to better names

* might help fuzzing segfault

* emu2 -> emu
2026-01-28 10:00:54 +08:00
chenyu
cd22ee9ed0 add InvalidType to ConstType [pr] (#14373)
* add InvalidType to ConstType [pr]

TYPED=1 python test/test_tiny.py passes.
added PyConst = float|int|bool for some Tensor level input types

* hcq
2026-01-27 14:09:34 -05:00
chenyu
db010a31be IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
Christopher Milan
c9c533fc78 libclang path is homebrew on macos (#14357)
* libclang path is homebrew macos

* typo

* ugh

* typo

* regen

* no LIBCLANG_PATH
2026-01-26 17:32:09 -05:00
qazal
2d91fe6310 use amdgpu dsl in mmapeak (#14342)
* use amdgpu dsl in mmapeak

* don't rely on llvm for vgpr counting

* llvm roundtrip assert

* rm it, add ci

* vgpr_count

* move emulated test to amd, it needs comgr

* env

* arch

* inst._fields -> inst.operands

* vgpr offset
2026-01-26 22:03:43 +09:00
qazal
b2e2ace85b viz: remove ci check, it's VIZ=-1/-2 (#14343) 2026-01-26 20:36:23 +09:00