1297 Commits

Author SHA1 Message Date
George Hotz
f45199269b hotfix: regress NV cifar_10steps_half to 120 ms 2026-02-23 12:29:25 +08:00
qazal
60f90dd97c sqtt: fix jitted program deduping, failing test for graphed kernels (#14951)
* work

* hcq_profile fix, test with JIT=2 passes

* ci, -n=auto

* rm duplicate test

* less
2026-02-22 15:22:31 +09:00
nimlgen
6de15dc480 mockam usb (#14916)
* mockam usb

* f

* win

* x

* x
2026-02-21 23:05:54 +03:00
Christopher Milan
5ee654b0d9 test IMAGE=1 driving_vision in mac pytest (#14921)
* test IMAGE=1 driving_vision in mac pytest

* don't multiply array
2026-02-20 18:28:10 -05:00
chenyu
697d0b06c2 update env for testmacpytest (#14912)
CI: ""
CAPTURE_PROCESS_REPLAY: "0"
2026-02-20 13:42:50 -05:00
chenyu
07d145debd compile3 0.10.1 driving_vision in mac pytest (#14911)
* compile3 0.10.1 driving_vision in mac pytest

* sync before re-executing onetime kernels
2026-02-20 12:23:52 -05:00
chenyu
d895713116 remove temp onnx migration CI job (#14910) 2026-02-20 11:38:44 -05:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
nimlgen
dbf894215a init mockam (#14889)
* mockam

* more tests

* linter

* x
2026-02-20 14:09:11 +03:00
chenyu
e8252e6e4f use official gguf in test (#14872)
also deleted bad test_load_sample_mxfp4, added some hard coded simple tests
2026-02-18 19:55:09 -05:00
qazal
f590564bf7 gemm multiple is only for cdna4 asm (#14814)
* gemm multiple is only for cdna4 asm

* move to backend

* and arch

* path
2026-02-17 14:00:02 +09:00
nimlgen
131bbbbfd8 am: smu_v13_0_12 (#14800) 2026-02-16 22:58:10 +03:00
George Hotz
dff9cf35c2 amd asm emulator fixes + run it in CI (#14786)
* amd asm fix, try 2

* fix tests
2026-02-16 13:24:21 +08:00
qazal
55a4dfa2e0 cdna4 asm_gemm tests in CI on the null backend (#14785)
* cdna4 asm_gemm tests in CI on the null backend

* no .numpy() in null

* better

* gemm/asm: device comes from renderer
2026-02-16 14:06:23 +09:00
kevvz
33b2ade8cd Rdna4 emulator test_ops, dtypes pass (#14773)
* test_ops, test_dtypes pass

* merge cdna4

* ruff + more tests

* reorganize

* /backend

* again

* again...

* add rdna4
2026-02-16 10:13:39 +08:00
George Hotz
bd18217f32 add rdna3/rdna4/cdna4 to testamd (#14778)
* add rdna3/rdna4/cdna4 to testamd

* test simplify

* ci cleanups

* mergeable

* skip slow
2026-02-16 09:45:16 +08:00
Christopher Milan
9c95a11f90 autogen: handle rocm bump and better error wording (#14776)
* autogen: handle rocm bump and better error wording

* regen
2026-02-15 19:23:47 -05:00
qazal
33b31d9cd6 tinykittens flash attention dtype fix, add CI (#14770)
* don't hardcode amd device

* add failing tests, ci too

* fix: fix for dtype mixin

* bump to rocm 7.1

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-02-16 01:15:11 +09:00
qazal
9da7f5e733 disable process replay for AMD emulator renderer [pr] (#14766)
* disable process replay for AMD emulator renderer [pr]

* line

* skip
2026-02-15 18:52:37 +09:00
George Hotz
5289b4e882 renderer/amd: add cdna emulator (#14721)
* renderer/amd: add cdna emulator

* fixes

* no predecode

* no early

* REMU_PATH

* delete that

* round

* Fix cache invalidation check in _compile_smem
2026-02-13 16:06:58 +08:00
George Hotz
d3adb8428e Revert "hotfix: skip test/amd in macpytest" (#14704)
* Revert "hotfix: skip test/amd in macpytest"

This reverts commit b7dade2adf.

* no llvm subprocess

* simpler

* sys.exec

* cleanup

* process safe

* diag

* arm ftz support

* 5 sec

* this one
2026-02-13 08:00:24 +08:00
Christopher Milan
084d0d0103 cleanup macos webgpu tests (#14715) 2026-02-12 17:56:34 -05:00
Christopher Milan
c30bb0f006 fix WEBGPU isnan check (#14711) 2026-02-12 17:01:18 -05:00
George Hotz
b7dade2adf hotfix: skip test/amd in macpytest 2026-02-12 18:16:04 +08:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
095a064ba8 test.yml explicitly says backend (#14700)
* test.yml explicitly says backend

* 1e-5
2026-02-12 16:03:44 +08:00
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
George Hotz
cc9bf8ccbc move more to null/unit tests (#14658)
* move more to null tests

* move test_gc

* no test fusion op
2026-02-10 13:35:17 +08:00
Christopher Milan
b36b62eb59 don't push docker cache for PRs (#14652) 2026-02-09 19:55:55 -05:00
Christopher Milan
396e1320fb bump cache version for z3 (#14650) 2026-02-09 19:32:07 -05:00
wozeparrot
d87ae1c84c feat: tinyfs load test in benchmark (#14602) 2026-02-06 18:00:00 -08:00
Garret Castro
cee7ef7ab2 disable threads (#14555) 2026-02-05 16:11:32 -05:00
chenyu
41a179f542 fix test_xlm_roberta_large (#14564)
onnxruntime does not allow a symlink that points outside the model dir. update snapshot_download to use local_dir instead of cache_dir. also added an ad hoc migration step to copy the existing model
2026-02-05 14:56:06 -05:00
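The symlink problem described in the commit above can be checked with a small standalone sketch. It assumes the download cache stores model files as symlinks into a separate blob directory, which is why a loader that refuses paths outside the model dir would reject them. The helper name `symlinks_escaping` and the directory names in the demo are hypothetical and are not taken from the repository.

```python
import os
import tempfile
from pathlib import Path


def symlinks_escaping(model_dir):
    """Return relative paths of symlinks under model_dir that resolve outside it."""
    root = Path(model_dir).resolve()
    escaping = []
    for path in sorted(root.rglob("*")):
        if path.is_symlink():
            target = path.resolve()
            # A target is "inside" only if it is root itself or has root as an ancestor.
            if target != root and root not in target.parents:
                escaping.append(path.relative_to(root).as_posix())
    return escaping


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical cache-style layout: the snapshot file is a symlink into blobs/.
        blobs = Path(tmp, "blobs")
        blobs.mkdir()
        (blobs / "abc123").write_bytes(b"weights")
        snapshot = Path(tmp, "snapshot")
        snapshot.mkdir()
        os.symlink(blobs / "abc123", snapshot / "model.onnx")

        # Hypothetical local_dir-style layout: the model is a regular file.
        local = Path(tmp, "local")
        local.mkdir()
        (local / "model.onnx").write_bytes(b"weights")

        print(symlinks_escaping(snapshot))
        print(symlinks_escaping(local))
```

Running a check like this on a model directory before handing it to a loader gives an early, readable failure instead of a runtime error from the loader itself.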
Christopher Milan
b47397ab17 list ml_dtypes as dependency for DSP (#14562)
* pin onnxruntime to 1.23.2 for DSP

* list ml_dtypes instead

This reverts commit 84bb2cc0fc.
2026-02-05 14:27:50 -05:00
George Hotz
d59e6e7a37 move more tests to test/null, split some existing ones (#14512)
* move more tests to test/null, split some existing ones

* null work

* null work

* move more

* fixes

* move PIL

* PIL in CLIP

* don't move that
2026-02-03 20:20:20 +08:00
George Hotz
dc77b3318b move files that pass with NULL=1 to test/null (#14508)
* move files that pass with NULL=1 to test/null

* fix windows

* cpu 0

* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
85c7b23160 add pytest -nauto to benchmark for mac (#14458)
* add pytest -nauto to benchmark

* 3 minute timeout

* 3 min

* setup env

* comment

* fresh db

* in the pyenv
2026-02-03 12:26:09 +08:00
Christopher Milan
a5d7eb37db IR3 works on versions earlier than 3.14 (#14507) 2026-02-02 23:10:19 -05:00
George Hotz
33c886cafa disable copyout on NULL backend by default (#14506)
* disable copyout on NULL backend

* gate it

* allow copyout on some tests
2026-02-03 11:57:47 +08:00
George Hotz
6e958dbfd4 assembly/amd: add RDNA4 support to emulator (#14341)
* start new rdna4

* work

* plus works

* more pass

* rdna4

* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp

* stale

* rev

* rr

* rdna4 emu tests

* cleanup

* cleanup

* simp

* works

* better factorization

* hacks

* fix mockgpu

* guard both

* cleaner

* gate

* bug fix and a few tests

* all test_tiny
2026-02-02 21:35:59 +08:00
Christopher Milan
e575dd8275 prevent UB in long decomp and more emulated tests (#14447) 2026-01-30 19:38:41 -05:00
Christopher Milan
1803ee939d EMULATED_DTYPES=long works with CPU_LLVM (#14446) 2026-01-30 13:54:43 -05:00
Christopher Milan
88caf57ef4 ci: unify python versions (#14430) 2026-01-29 21:42:03 -05:00
Christopher Milan
e47f12f671 ci: replace testing_minimal with testing_unit (#14427) 2026-01-29 18:02:43 -05:00
Christopher Milan
0c855d6149 ci: remove unused pydeps (#14418) 2026-01-29 01:51:26 -05:00
chenyu
37cde4a01a add one line mypy report (#14415) 2026-01-28 20:39:32 -05:00
nimlgen
544928766d hcq_smi: kill mac pids (#14398) 2026-01-28 15:00:28 +03:00
qazal
5bffa17f82 llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience (#14395)
* beam opens devices

* switch to hip renderer

* amd: true?

* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
Christopher Milan
067e27857e nested composite actions don't work (#14393) 2026-01-28 00:13:30 -05:00
Christopher Milan
9dddf3d478 don't save caches for PRs, try 2 (#14391) 2026-01-27 23:30:17 -05:00