1224 Commits

Author SHA1 Message Date
George Hotz
b7dade2adf hotfix: skip test/amd in macpytest 2026-02-12 18:16:04 +08:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
095a064ba8 test.yml explicitly says backend (#14700)
* test.yml explicitly says backend

* 1e-5
2026-02-12 16:03:44 +08:00
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
George Hotz
cc9bf8ccbc move more to null/unit tests (#14658)
* move more to null tests

* move test_gc

* no test fusion op
2026-02-10 13:35:17 +08:00
Christopher Milan
b36b62eb59 don't push docker cache for PRs (#14652) 2026-02-09 19:55:55 -05:00
Christopher Milan
396e1320fb bump cache version for z3 (#14650) 2026-02-09 19:32:07 -05:00
wozeparrot
d87ae1c84c feat: tinyfs load test in benchmark (#14602) 2026-02-06 18:00:00 -08:00
Garret Castro
cee7ef7ab2 disable threads (#14555) 2026-02-05 16:11:32 -05:00
chenyu
41a179f542 fix test_xlm_roberta_large (#14564)
onnxruntime does not allow symlink that's outside model dir. update snapshot_download to use local_dir instead of cache_dir. some ad hoc migration step to copy the existing model too
2026-02-05 14:56:06 -05:00
Christopher Milan
b47397ab17 list ml_dtypes as dependency for DSP (#14562)
* pin onnxruntime to 1.23.2 for DSP

* list ml_dtypes instead

This reverts commit 84bb2cc0fc.
2026-02-05 14:27:50 -05:00
George Hotz
d59e6e7a37 move more tests to test/null, split some existing ones (#14512)
* move more tests to test/null, split some existing ones

* null work

* null work

* move more

* fixes

* move PIL

* PIL in CLIP

* don't move that
2026-02-03 20:20:20 +08:00
George Hotz
dc77b3318b move files that pass with NULL=1 to test/null (#14508)
* move files that pass with NULL=1 to test/null

* fix windows

* cpu 0

* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
85c7b23160 add pytest -nauto to benchmark for mac (#14458)
* add pytest -nauto to benchmark

* 3 minute timeout

* 3 min

* setup env

* comment

* fresh db

* in the pyenv
2026-02-03 12:26:09 +08:00
Christopher Milan
a5d7eb37db IR3 works on versions earlier than 3.14 (#14507) 2026-02-02 23:10:19 -05:00
George Hotz
33c886cafa disable copyout on NULL backend by default (#14506)
* disable copyout on NULL backend

* gate it

* allow copyout on some tests
2026-02-03 11:57:47 +08:00
George Hotz
6e958dbfd4 assembly/amd: add RDNA4 support to emulator (#14341)
* start new rdna4

* work

* plus works

* more pass

* rdna4

* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp

* stale

* rev

* rr

* rdna4 emu tests

* cleanup

* cleanup

* simp

* works

* better factorizaion

* hacks

* fix mockgpu

* guard both

* cleaner

* gate

* bug fix and a few tests

* all test_tiny
2026-02-02 21:35:59 +08:00
Christopher Milan
e575dd8275 prevent UB in long decomp and more emulated tests (#14447) 2026-01-30 19:38:41 -05:00
Christopher Milan
1803ee939d EMULATED_DTYPES=long works with CPU_LLVM (#14446) 2026-01-30 13:54:43 -05:00
Christopher Milan
88caf57ef4 ci: unify python versions (#14430) 2026-01-29 21:42:03 -05:00
Christopher Milan
e47f12f671 ci: replace testing_minimal with testing_unit (#14427) 2026-01-29 18:02:43 -05:00
Christopher Milan
0c855d6149 ci: remove unused pydeps (#14418) 2026-01-29 01:51:26 -05:00
chenyu
37cde4a01a add one line mypy report (#14415) 2026-01-28 20:39:32 -05:00
nimlgen
544928766d hcq_smi: kill mac pids (#14398) 2026-01-28 15:00:28 +03:00
qazal
5bffa17f82 llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience (#14395)
* beam opens devices

* switch to hip renderer

* amd: true?

* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
Christopher Milan
067e27857e nested composite actions don't work (#14393) 2026-01-28 00:13:30 -05:00
Christopher Milan
9dddf3d478 don't save caches for PRs, try 2 (#14391) 2026-01-27 23:30:17 -05:00
Christopher Milan
68fe5d8b36 Revert "don't save caches for PRs (#14389)" (#14390) 2026-01-27 23:22:26 -05:00
Christopher Milan
4ab228b498 don't save caches for PRs (#14389) 2026-01-27 23:21:31 -05:00
Christopher Milan
5e36482314 decompose long to ints where unsupported, try 2 (#14383) 2026-01-27 23:20:43 -05:00
George Hotz
88bc5ee212 assembly/amd: rename to better names (#14384)
* assembly/amd: rename to better names

* might help fuzzing segfault

* emu2 -> emu
2026-01-28 10:00:54 +08:00
chenyu
cd22ee9ed0 add InvalidType to ConstType [pr] (#14373)
* add InvalidType to ConstType [pr]

TYPED=1 python test/test_tiny.py passes.
added PyConst = float|int|bool for some Tensor level input types

* hcq
2026-01-27 14:09:34 -05:00
chenyu
db010a31be IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
Christopher Milan
c9c533fc78 libclang path is homebrew on macos (#14357)
* libclang path is homebrew macos

* typo

* ugh

* typo

* regen

* no LIBCLANG_PATH
2026-01-26 17:32:09 -05:00
qazal
2d91fe6310 use amdgpu dsl in mmapeak (#14342)
* use amdgpu dsl in mmapeak

* don't rely on llvm for vgpr counting

* llvm roundtrip assert

* rm it, add ci

* vgpr_count

* move emulated test to amd, it needs comgr

* env

* arch

* inst._fields -> inst.operands

* vgpr offset
2026-01-26 22:03:43 +09:00
qazal
b2e2ace85b viz: remove ci check, it's VIZ=-1/-2 (#14343) 2026-01-26 20:36:23 +09:00
George Hotz
be23776ba7 assembly/amd: replace pcode with ucode (#14002)
* a bunch of todos for my boy claude

* uops have types

* lil cleanups

* simpler ucode

* isNAN

* calls

* move more

* cleanup pcode_parse

* cvt functions

* fix parser bugs

* no void

* minmax

* more pcode parse

* pretty print

* transform

* comments

* move to transform

* assign/declare

* simpler norm

* single PM

* just Uops

* simpler

* more typed

* all rewrite

* less verbose

* work

* spec

* transform

* work

* simpler spec

* less spec

* bitcast

* simpler

* simp ucode

* work

* more in pcode_transform

* remove junk

* more functions

* bug

* no void assign

* load/store

* wave

* fixes

* move denorm

* move more functions

* tests

* cat is shape None

* uop syntax

* move a few more

* program_spec

* cat stuff

* assign fix clear

* unused

* nans

* fp bits

* works with simplify

* remove junk

* special

* meh

* more

* more

* update test pcode parse

* improve parser

* parse some for loops

* merge master

* dead files

* tests pass

* emu2

* better emu2

* test_plus works

* uselessly write more instructions

* use pcode

* something

* something

* bench_emu

* progress

* ds works

* work

* work

* more passing

* run compare

* bench_emu

* more pcode

* a few more

* bugfixes

* bugfix

* test fixes

* tests pass without USE_HW

* all hw tests pass

* add more hw tests

* new hw tests

* bit

* less handcode

* parse more

* consolidate pcode

* fixes

* rsrc

* lane pcode

* cleanups

* simpler

* emu bugs

* one cmp test fails

* fix decode and upd name

* fix name and test harness

* _ftz_f32

* fix denorm

* fix VOPD and use load

* fix carry bug

* no load where / just invalid

* clean

* simpler

* merge sops

* refactoring

* simplifications

* bugfixes

* new tests

* f16 sin fix

* assertion and hw tests

* cvt functions

* one more failure

* bugfixes

* bugfix + regression

* more tests

* fmac

* no manual unrolling

* ordering

* LLVM backend is a lot faster

* compile inst

* more bugs

* f16

* bugfix

* fix regression

* one clang call

* 1M inst

* scratch works

* do scratch correctly

* cleanup

* regression

* cmp

* fmamk fixes

* merge

* fix vcmpx

* unify memory

* remove unused code

* ignore oob for test

* cleanups

* fix mbs

* unify cmp

* test

* minor cleanups

* bump timeout

* fix tests

* revert the CMPLE stuff

* remove opt

* less diff

* simpler

* revert

* support multiple backends

* memset is a lot faster

* split out in bench emu

* improve timing

* timing

* cache that

* cache that

* simpler and faster

* tokenize

* binop table

* simpler

* move to parser

* tok for lambda

* refactor

* expr_parser

* delete emu2_pcode

* import cleanup

* lil

* if parse

* work

* simpler

* no v

* trig preop is faster

* durations for tests

* fix cmp bug

* sdst

* remove scartch_size hack

* null behavior

* _MXCSRContext

* bugfixes

* DEBUG >= 3

* test smem crashes my gpu

* debug

* test

* test smem

* profiler

* full inst

* bugfix

* rtag(1)

* pc is 64-bit and word

* pc is real code now

* dynamic

* more dynamic

* fix oob access

* fix crash, more dyn

* all dyn

* really all dyn

* correct null mask

* lit + format

* 21s on the tests

* 13s on the tests

* canonical name

* simm16

* more dyn

* 14s

* proper saddr dedup

* dyn

* debug 5

* better 5

* revert dynamic stuff

* that can be dyn

* negative offsets

* dyn wmma

* f16 wmma support / ops / dtype / dtype_alu

* symbolic changes not needed

* ConstFloat

* more uop.const

* __eq__

* uop tests

* fix f16

* bf16 tensor cores

* whitespace

* remove cast roundtrip

* Revert "remove cast roundtrip"

This reverts commit c5bb0381c3.

* just the fix

* remove dead paths

* llvm runs
2026-01-26 18:04:29 +08:00
nimlgen
21ab23ae18 nv: add pma for ada (#14328)
* nv: add pma for ada

* um

* fix

* shorter

* mock
2026-01-25 17:33:37 +03:00
chenyu
7e41da1ae8 fix generate_dataset.sh (#14324)
added `set -e` so wrong pathes would fail the script, then fixed the path
2026-01-24 16:47:10 -05:00
George Hotz
52b989c6c8 don't place consts early + fixes from anthropic challenge (#14286)
* don't place consts early

* add anthropic challenge

* with ref

* do we still have to devectorize bools?

* tests pass

* just WHERE

* fine, revert that

* fine, revert

* only index

* z3 validator doesn't support vectorized

* Revert "z3 validator doesn't support vectorized"

This reverts commit 1b7930ecb3.

* z3 not for vec

* no spec

* VLIWRenderer

* loop unrolling

* better comments

* cleanups

* skip cast

* renderer

* cleanups

* prints

* no hack

* hacks

* bump to 11

* reg warning

* lil clean

* cleaner renderer
2026-01-23 10:48:39 +09:00
nimlgen
8cd22df2dd amd: alive wgps (#14149)
* amd: disabled wgps

* l

* wgp

* uoops

* mockgpu

* drm

* ad this

* fi

* reg
2026-01-23 00:08:45 +03:00
chenyu
e04767e39e run pre-commit in ci (#14253)
* run pre-commit in ci

prevents pre-commit regression

* IGNORE_OOB=1

* pytest

* unit test

* split
2026-01-20 12:24:33 -05:00
nimlgen
dc82856084 tbgpu: shim binary + remote apl pci dev (#14124)
* shim binary + remote pci dev

* v2

* rip out apl

* cmds

* rename

* clean

* remove

* rm gitignore

* ui

* install

* linter

* um

* cleaner

* assets

* normal install in ui

* cleaner app

* install script

* support fd mmap

* cleaner

* kill server when disconn

* rename + pcidevs

* sign

* install and reinstall

* no sip install

* will trigger update

* nv

* ugh

* this

* fix

* nv

* use nosip sign

* auto install

* remove

* mypy

* upd

* ditto

* print

* simpler

* ditto

* um

* simpler

* upd

* upd

* cleaner

* autogen

* cleaner

* move

* annotations

* server cleaner
2026-01-20 16:15:18 +03:00
qazal
b1c5a242b7 Revert "move is_dtype_supported logic to renderer (#14188)" (#14237)
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
Christopher Milan
161fee9a48 move is_dtype_supported logic to renderer (#14188)
* move is_dtype_supported logic to renderer

* fix CPU_COUNT

* mypy happy

* early import libclang too with llvm

* run with debug

* skip autogen tests if MTLCompiler or llvm is loaded

* run autogen tests separately in CI

* lint
2026-01-18 22:37:04 -05:00
Christopher Milan
1eb110cd7d fix memory corruption in NIR, reenable process replay (#14204) 2026-01-18 02:05:12 -05:00
qazal
feaa804158 skip lvp process replay in CI [pr] (#14202) 2026-01-18 13:25:04 +09:00
chenyu
dc4ae7dd08 lower ASSERT_MIN_STEP_TIME for driving_policy to 3ms (#14184)
seems quite stable at 2.7ms now
2026-01-16 15:04:53 -05:00
George Hotz
e9ce12028e assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py (#14154)
* assembly/amd: amdxml cleanups, remove broken SDWA/DPP

* remove buf junk

* simplify

* simplify

* lil cleanup

* dead fixes

* strip non pcode extraction from pdf

* merge pdf.py into amdxml.py

* only amdxml
2026-01-15 09:23:19 +09:00
chenyu
986e865830 fix TINY_BACKEND=1 cumsum (#14138)
* fix TINY_BACKEND=1 cumsum

old hack was wrong, need to apply contiguous on the input

* test time

* test_linalg_svd is slow
2026-01-14 09:54:49 -05:00