George Hotz
b7dade2adf
hotfix: skip test/amd in macpytest
2026-02-12 18:16:04 +08:00
George Hotz
4680247e35
renderer/amd: move in tree ( #14702 )
* renderer/amd: move in tree
* fix paths in tests
* 24000 lines
* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
095a064ba8
test.yml explicitly says backend ( #14700 )
* test.yml explicitly says backend
* 1e-5
2026-02-12 16:03:44 +08:00
George Hotz
c331798201
move tests to test/backend ( #14691 )
* move tests to test/backend
* fix imports
* fix CI
* revert that one
* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
George Hotz
cc9bf8ccbc
move more to null/unit tests ( #14658 )
* move more to null tests
* move test_gc
* no test fusion op
2026-02-10 13:35:17 +08:00
Christopher Milan
b36b62eb59
don't push docker cache for PRs ( #14652 )
2026-02-09 19:55:55 -05:00
Christopher Milan
396e1320fb
bump cache version for z3 ( #14650 )
2026-02-09 19:32:07 -05:00
wozeparrot
d87ae1c84c
feat: tinyfs load test in benchmark ( #14602 )
2026-02-06 18:00:00 -08:00
Garret Castro
cee7ef7ab2
disable threads ( #14555 )
2026-02-05 16:11:32 -05:00
chenyu
41a179f542
fix test_xlm_roberta_large ( #14564 )
onnxruntime does not allow symlinks that point outside the model dir. update snapshot_download to use local_dir instead of cache_dir, plus an ad hoc migration step to copy the existing model too
2026-02-05 14:56:06 -05:00
Christopher Milan
b47397ab17
list ml_dtypes as dependency for DSP ( #14562 )
* pin onnxruntime to 1.23.2 for DSP
* list ml_dtypes instead
This reverts commit 84bb2cc0fc.
2026-02-05 14:27:50 -05:00
George Hotz
d59e6e7a37
move more tests to test/null, split some existing ones ( #14512 )
* move more tests to test/null, split some existing ones
* null work
* null work
* move more
* fixes
* move PIL
* PIL in CLIP
* don't move that
2026-02-03 20:20:20 +08:00
George Hotz
dc77b3318b
move files that pass with NULL=1 to test/null ( #14508 )
* move files that pass with NULL=1 to test/null
* fix windows
* cpu 0
* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
85c7b23160
add pytest -nauto to benchmark for mac ( #14458 )
* add pytest -nauto to benchmark
* 3 minute timeout
* 3 min
* setup env
* comment
* fresh db
* in the pyenv
2026-02-03 12:26:09 +08:00
Christopher Milan
a5d7eb37db
IR3 works on versions earlier than 3.14 ( #14507 )
2026-02-02 23:10:19 -05:00
George Hotz
33c886cafa
disable copyout on NULL backend by default ( #14506 )
* disable copyout on NULL backend
* gate it
* allow copyout on some tests
2026-02-03 11:57:47 +08:00
George Hotz
6e958dbfd4
assembly/amd: add RDNA4 support to emulator ( #14341 )
* start new rdna4
* work
* plus works
* more pass
* rdna4
* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp
* stale
* rev
* rr
* rdna4 emu tests
* cleanup
* cleanup
* simp
* works
* better factorization
* hacks
* fix mockgpu
* guard both
* cleaner
* gate
* bug fix and a few tests
* all test_tiny
2026-02-02 21:35:59 +08:00
Christopher Milan
e575dd8275
prevent UB in long decomp and more emulated tests ( #14447 )
2026-01-30 19:38:41 -05:00
Christopher Milan
1803ee939d
EMULATED_DTYPES=long works with CPU_LLVM ( #14446 )
2026-01-30 13:54:43 -05:00
Christopher Milan
88caf57ef4
ci: unify python versions ( #14430 )
2026-01-29 21:42:03 -05:00
Christopher Milan
e47f12f671
ci: replace testing_minimal with testing_unit ( #14427 )
2026-01-29 18:02:43 -05:00
Christopher Milan
0c855d6149
ci: remove unused pydeps ( #14418 )
2026-01-29 01:51:26 -05:00
chenyu
37cde4a01a
add one line mypy report ( #14415 )
2026-01-28 20:39:32 -05:00
nimlgen
544928766d
hcq_smi: kill mac pids ( #14398 )
2026-01-28 15:00:28 +03:00
qazal
5bffa17f82
llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience ( #14395 )
* beam opens devices
* switch to hip renderer
* amd: true?
* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
Christopher Milan
067e27857e
nested composite actions don't work ( #14393 )
2026-01-28 00:13:30 -05:00
Christopher Milan
9dddf3d478
don't save caches for PRs, try 2 ( #14391 )
2026-01-27 23:30:17 -05:00
Christopher Milan
68fe5d8b36
Revert "don't save caches for PRs ( #14389 )" ( #14390 )
2026-01-27 23:22:26 -05:00
Christopher Milan
4ab228b498
don't save caches for PRs ( #14389 )
2026-01-27 23:21:31 -05:00
Christopher Milan
5e36482314
decompose long to ints where unsupported, try 2 ( #14383 )
2026-01-27 23:20:43 -05:00
George Hotz
88bc5ee212
assembly/amd: rename to better names ( #14384 )
* assembly/amd: rename to better names
* might help fuzzing segfault
* emu2 -> emu
2026-01-28 10:00:54 +08:00
chenyu
cd22ee9ed0
add InvalidType to ConstType [pr] ( #14373 )
* add InvalidType to ConstType [pr]
TYPED=1 python test/test_tiny.py passes.
added PyConst = float|int|bool for some Tensor level input types
* hcq
2026-01-27 14:09:34 -05:00
chenyu
db010a31be
IGNORE_OOB -> CHECK_OOB [pr] ( #14374 )
flip the meaning
2026-01-27 12:20:59 -05:00
Christopher Milan
c9c533fc78
libclang path is homebrew on macos ( #14357 )
* libclang path is homebrew macos
* typo
* ugh
* typo
* regen
* no LIBCLANG_PATH
2026-01-26 17:32:09 -05:00
qazal
2d91fe6310
use amdgpu dsl in mmapeak ( #14342 )
* use amdgpu dsl in mmapeak
* don't rely on llvm for vgpr counting
* llvm roundtrip assert
* rm it, add ci
* vgpr_count
* move emulated test to amd, it needs comgr
* env
* arch
* inst._fields -> inst.operands
* vgpr offset
2026-01-26 22:03:43 +09:00
qazal
b2e2ace85b
viz: remove ci check, it's VIZ=-1/-2 ( #14343 )
2026-01-26 20:36:23 +09:00
George Hotz
be23776ba7
assembly/amd: replace pcode with ucode ( #14002 )
* a bunch of todos for my boy claude
* uops have types
* lil cleanups
* simpler ucode
* isNAN
* calls
* move more
* cleanup pcode_parse
* cvt functions
* fix parser bugs
* no void
* minmax
* more pcode parse
* pretty print
* transform
* comments
* move to transform
* assign/declare
* simpler norm
* single PM
* just Uops
* simpler
* more typed
* all rewrite
* less verbose
* work
* spec
* transform
* work
* simpler spec
* less spec
* bitcast
* simpler
* simp ucode
* work
* more in pcode_transform
* remove junk
* more functions
* bug
* no void assign
* load/store
* wave
* fixes
* move denorm
* move more functions
* tests
* cat is shape None
* uop syntax
* move a few more
* program_spec
* cat stuff
* assign fix clear
* unused
* nans
* fp bits
* works with simplify
* remove junk
* special
* meh
* more
* more
* update test pcode parse
* improve parser
* parse some for loops
* merge master
* dead files
* tests pass
* emu2
* better emu2
* test_plus works
* uselessly write more instructions
* use pcode
* something
* something
* bench_emu
* progress
* ds works
* work
* work
* more passing
* run compare
* bench_emu
* more pcode
* a few more
* bugfixes
* bugfix
* test fixes
* tests pass without USE_HW
* all hw tests pass
* add more hw tests
* new hw tests
* bit
* less handcode
* parse more
* consolidate pcode
* fixes
* rsrc
* lane pcode
* cleanups
* simpler
* emu bugs
* one cmp test fails
* fix decode and upd name
* fix name and test harness
* _ftz_f32
* fix denorm
* fix VOPD and use load
* fix carry bug
* no load where / just invalid
* clean
* simpler
* merge sops
* refactoring
* simplifications
* bugfixes
* new tests
* f16 sin fix
* assertion and hw tests
* cvt functions
* one more failure
* bugfixes
* bugfix + regression
* more tests
* fmac
* no manual unrolling
* ordering
* LLVM backend is a lot faster
* compile inst
* more bugs
* f16
* bugfix
* fix regression
* one clang call
* 1M inst
* scratch works
* do scratch correctly
* cleanup
* regression
* cmp
* fmamk fixes
* merge
* fix vcmpx
* unify memory
* remove unused code
* ignore oob for test
* cleanups
* fix mbs
* unify cmp
* test
* minor cleanups
* bump timeout
* fix tests
* revert the CMPLE stuff
* remove opt
* less diff
* simpler
* revert
* support multiple backends
* memset is a lot faster
* split out in bench emu
* improve timing
* timing
* cache that
* cache that
* simpler and faster
* tokenize
* binop table
* simpler
* move to parser
* tok for lambda
* refactor
* expr_parser
* delete emu2_pcode
* import cleanup
* lil
* if parse
* work
* simpler
* no v
* trig preop is faster
* durations for tests
* fix cmp bug
* sdst
* remove scratch_size hack
* null behavior
* _MXCSRContext
* bugfixes
* DEBUG >= 3
* test smem crashes my gpu
* debug
* test
* test smem
* profiler
* full inst
* bugfix
* rtag(1)
* pc is 64-bit and word
* pc is real code now
* dynamic
* more dynamic
* fix oob access
* fix crash, more dyn
* all dyn
* really all dyn
* correct null mask
* lit + format
* 21s on the tests
* 13s on the tests
* canonical name
* simm16
* more dyn
* 14s
* proper saddr dedup
* dyn
* debug 5
* better 5
* revert dynamic stuff
* that can be dyn
* negative offsets
* dyn wmma
* f16 wmma support / ops / dtype / dtype_alu
* symbolic changes not needed
* ConstFloat
* more uop.const
* __eq__
* uop tests
* fix f16
* bf16 tensor cores
* whitespace
* remove cast roundtrip
* Revert "remove cast roundtrip"
This reverts commit c5bb0381c3.
* just the fix
* remove dead paths
* llvm runs
2026-01-26 18:04:29 +08:00
nimlgen
21ab23ae18
nv: add pma for ada ( #14328 )
* nv: add pma for ada
* um
* fix
* shorter
* mock
2026-01-25 17:33:37 +03:00
chenyu
7e41da1ae8
fix generate_dataset.sh ( #14324 )
added `set -e` so wrong paths would fail the script, then fixed the path
2026-01-24 16:47:10 -05:00
George Hotz
52b989c6c8
don't place consts early + fixes from anthropic challenge ( #14286 )
* don't place consts early
* add anthropic challenge
* with ref
* do we still have to devectorize bools?
* tests pass
* just WHERE
* fine, revert that
* fine, revert
* only index
* z3 validator doesn't support vectorized
* Revert "z3 validator doesn't support vectorized"
This reverts commit 1b7930ecb3.
* z3 not for vec
* no spec
* VLIWRenderer
* loop unrolling
* better comments
* cleanups
* skip cast
* renderer
* cleanups
* prints
* no hack
* hacks
* bump to 11
* reg warning
* lil clean
* cleaner renderer
2026-01-23 10:48:39 +09:00
nimlgen
8cd22df2dd
amd: alive wgps ( #14149 )
* amd: disabled wgps
* l
* wgp
* uoops
* mockgpu
* drm
* add this
* fi
* reg
2026-01-23 00:08:45 +03:00
chenyu
e04767e39e
run pre-commit in ci ( #14253 )
* run pre-commit in ci
prevents pre-commit regression
* IGNORE_OOB=1
* pytest
* unit test
* split
2026-01-20 12:24:33 -05:00
nimlgen
dc82856084
tbgpu: shim binary + remote apl pci dev ( #14124 )
* shim binary + remote pci dev
* v2
* rip out apl
* cmds
* rename
* clean
* remove
* rm gitignore
* ui
* install
* linter
* um
* cleaner
* assets
* normal install in ui
* cleaner app
* install script
* support fd mmap
* cleaner
* kill server when disconn
* rename + pcidevs
* sign
* install and reinstall
* no sip install
* will trigger update
* nv
* ugh
* this
* fix
* nv
* use nosip sign
* auto install
* remove
* mypy
* upd
* ditto
* print
* simpler
* ditto
* um
* simpler
* upd
* upd
* cleaner
* autogen
* cleaner
* move
* annotations
* server cleaner
2026-01-20 16:15:18 +03:00
qazal
b1c5a242b7
Revert "move is_dtype_supported logic to renderer ( #14188 )" ( #14237 )
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
Christopher Milan
161fee9a48
move is_dtype_supported logic to renderer ( #14188 )
* move is_dtype_supported logic to renderer
* fix CPU_COUNT
* mypy happy
* early import libclang too with llvm
* run with debug
* skip autogen tests if MTLCompiler or llvm is loaded
* run autogen tests separately in CI
* lint
2026-01-18 22:37:04 -05:00
Christopher Milan
1eb110cd7d
fix memory corruption in NIR, reenable process replay ( #14204 )
2026-01-18 02:05:12 -05:00
qazal
feaa804158
skip lvp process replay in CI [pr] ( #14202 )
2026-01-18 13:25:04 +09:00
chenyu
dc4ae7dd08
lower ASSERT_MIN_STEP_TIME for driving_policy to 3ms ( #14184 )
seems quite stable at 2.7ms now
2026-01-16 15:04:53 -05:00
George Hotz
e9ce12028e
assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py ( #14154 )
* assembly/amd: amdxml cleanups, remove broken SDWA/DPP
* remove buf junk
* simplify
* simplify
* lil cleanup
* dead fixes
* strip non pcode extraction from pdf
* merge pdf.py into amdxml.py
* only amdxml
2026-01-15 09:23:19 +09:00
chenyu
986e865830
fix TINY_BACKEND=1 cumsum ( #14138 )
* fix TINY_BACKEND=1 cumsum
old hack was wrong, need to apply contiguous on the input
* test time
* test_linalg_svd is slow
2026-01-14 09:54:49 -05:00