Christopher Milan
b2a0b9c551
autogen: dump patch in CI ( #14010 )
* autogen: don't fast-fail, produce patch artifact on differences
All verification steps now use continue-on-error to run completely.
Each job generates a patch artifact containing all differences found.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* add gen from header test
* fix tests
* fail if diff
* add forward decl autogen test
* remove confusing/wrong comments
* macos unittests set LIBCLANG_PATH
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 22:38:12 -05:00
chenyu
aae08b20e0
enable passed onnx tests ( #14017 )
2026-01-04 22:12:50 -05:00
chenyu
785d04d127
simpler einsum ( #14014 )
2026-01-04 20:38:59 -05:00
chenyu
f6a78a29e0
support einsum trace ( #14012 )
* support einsum trace
* test_einsum_scalar_cpu
2026-01-04 19:27:27 -05:00
George Hotz
404eed6172
assembly/amd: improve tests for asm ( #14007 )
* assembly/amd: improve tests for asm
* upd
* skip
* tests
* re bug
* more passing
* cleanups
* cdna fixups
* improve tests, better CDNA parsing
* fix CI
* no defs
* simpler
* all pass
* from pdf
* regen
2026-01-04 15:14:08 -08:00
wozeparrot
f550f9204c
fa: failing test for bwd jit ( #14009 )
* tk: failing test for bwd jit
* feat: mark expectedFailure
* clean: spaces
2026-01-04 16:57:43 -05:00
George Hotz
7abf4591ba
use bitsize on dtype ( #14011 )
* use bitsize on dtype [pr]
* bitsize
* bitsize in js export, but might be wrong
* reverts
* revert that
2026-01-04 12:16:21 -08:00
chenyu
cfb8bf5814
faster image load ( #13977 )
sometimes image load does not need to init with NAN
2026-01-04 13:09:59 -05:00
George Hotz
7ebda28692
assembly/amd: add CDNA support to asm ( #13982 )
* add CDNA support
* more cdna tests
* something
* fix more stuff
* more work
* simpler
* simpler
* cdna
* disasm
* less skip
* fixes
* simpler
2026-01-04 08:53:56 -08:00
chenyu
ad041416ca
delete unused rewrite rule [pr] ( #14006 )
2026-01-04 09:48:52 -05:00
nimlgen
bf356ae996
am: mi300 48bit address space ( #14004 )
* am: mi300 48bit address space
* fix
2026-01-04 15:19:25 +03:00
nimlgen
606786e152
am: do not sleep for each hive node during resets ( #14003 )
2026-01-04 14:02:11 +03:00
George Hotz
34ea053b26
assembly/amd: clean up pcode, jit pcode instead of static ( #14001 )
* assembly/amd: clean up pcode
* regen
* lil
* jit the pcode
* sendmsg
* cleanups
* inst prefetch lol
2026-01-03 23:06:15 -08:00
kamilisjon
280790e438
Reuse toposort in recursive_property ( #13993 )
2026-01-03 22:04:13 -08:00
kamilisjon
9a9564118c
[pr] Delete reverse_toposort ( #13987 )
* Delete reverse_toposort
* Update comment and profiler name
* Update profiler name
2026-01-03 22:03:44 -08:00
George Hotz
8328511808
assembly/amd: make the emu.py code shine ( #13996 )
* assembly/amd: make the code shine
* lil clean
* reg back in pcode
* cleanups
* gen fma_mix
* no writelane hacks
* fn cleanup
* dead vgpr_write
* readable
* smem
* cleanup bench_emu
* speedups
* simpler and faster
* direct inst._fn
* split fxn
* Revert "simpler and faster"
This reverts commit e85f6594b3.
* move lds to wavestate
* dispatcher
* pc in dispatch
* literal isn't wavestate
* cleanups + program
* one readlane
* exec_vop3sd in exec_vop
* cleaner exec_vopd
* fully merge VOP3P
* no special paths
* no SliceProxy
* low=0
* no bigint
* failing tests
* fma on python 3.13
2026-01-03 20:33:09 -08:00
qazal
bdb421f13e
process_replay: passthrough sink arg for Ops.PROGRAM input ( #14000 )
2026-01-04 13:09:39 +09:00
Galax
66caa9fe1d
fix: library linking for fedora systems ( #13999 )
2026-01-03 17:40:56 -08:00
chenyu
8003db2a28
test case of NOOP store load folding ( #13997 )
2026-01-03 14:39:26 -05:00
chenyu
c1b8644a3f
test removing expander rules [pr] ( #13994 )
2026-01-03 12:38:01 -05:00
Christopher Milan
35c2870b1f
gate image_conv2d pitch hacks on IMAGE==1 ( #13995 )
* gate image_conv2d pitch hacks on IMAGE==1
* fix opencl image copies
* cleanup
2026-01-03 12:27:31 -05:00
nimlgen
a49924a0e9
hcq: _sleep report status ( #13992 )
* hcq: _sleep report status
* msg
* print all
2026-01-03 14:28:28 +03:00
nimlgen
3b354bc11f
hcq: better queue management ( #13991 )
2026-01-03 13:11:15 +03:00
nimlgen
efb2ae87c6
hcq sync aql ( #13756 )
* hcq sync aql
* w
2026-01-03 12:59:24 +03:00
qazal
bd55507ee4
RDNA3 fp16 assembly gemm 85 TFLOPS ( #13990 )
2026-01-03 18:34:23 +09:00
wozeparrot
6242a9d151
tk: no global copy and clear ranges ( #13988 )
2026-01-02 23:45:15 -08:00
wozeparrot
9f082e8e25
fa: split kv bwd into 2 kernels ( #13981 )
2026-01-02 18:45:51 -08:00
qazal
2cc64d71b0
simplify mi350x gemm / viz asm tests ( #13984 )
* mi350x gemm cleanup
* asm tests work
* simpler asm tests
2026-01-03 11:11:07 +09:00
chenyu
7cbafb2ef1
update hypothesis min version ( #13983 )
there was a local_constants perf regression that made hypothesis related tests slow
2026-01-02 21:01:57 -05:00
Christopher Milan
9dc524536f
IMAGE=1 creates "dynamic" images ( #13769 )
* remove image from BufferSpec
* cl tiny_gemm (64) works
* mypy
* padding
* openpilot CL
* reshape properly
* remove extra qcom checks
* pad output
* mypy
* update compile test
* move undo
* TestImageCopy valid images
* TestImageRealization valid images
* TestImageDType valid images
* cleanups
* test_renderer_failures
* ruff
* mypy
* simplify ops_qcom
* bump step time
* Revert "bump step time"
This reverts commit 75a037c7d0.
* "dynamic textures" are optional
* a start
* IMAGE=1 works, no FLOAT16
* fast but wrong
* mypy
* some fixes
* better
* works
* refactor
* oops
2026-01-02 16:22:39 -05:00
Christopher Milan
61dc70f1a8
add driving_vision IMAGE=1 benchmark ( #13979 )
2026-01-02 13:58:27 -05:00
George Hotz
0e282025ff
assembly/amd: split test_emu into hw tests ( #13966 )
* assembly/amd: split test_emu into hw tests
* hw tests
* bugfixes
* more tests and fix
2026-01-02 08:04:56 -08:00
chenyu
2e2b5fed12
fix misspellings ( #13976 )
2026-01-02 10:37:38 -05:00
nietras
f49e4714af
Fix spelling errors in README for AMD assembly ( #13975 )
2026-01-02 10:15:20 -05:00
b1tg
a78fcc55a4
amd tc 1616128 ( #13439 )
* amd tc 1616128
* fix test
* remove hardcoded check in test
2026-01-02 09:01:05 -05:00
chenyu
fcbb896e05
remove unused to_struct [pr] ( #13973 )
2026-01-02 08:54:57 -05:00
nimlgen
ff7853a65a
am: fix aid doorbells ( #13971 )
2026-01-02 15:53:44 +03:00
nimlgen
42abb0586c
am: fix aid doorbells ( #13972 )
2026-01-02 15:53:13 +03:00
nimlgen
ebbaad6bfd
am: enable all sdma engines ( #13970 )
2026-01-02 15:25:15 +03:00
qazal
5f52266225
mi350x gemm: use Tensor.custom_kernel in asm test ( #13969 )
* mi350x gemm: use Tensor.custom_kernel in asm test
* A @ B for baseline
2026-01-02 18:30:50 +09:00
George Hotz
5a1a561e0f
assembly/amd: rdna4 autogen ( #13967 )
* assembly/amd: add pcode ds ops
* refactors
* fix ds op
* update autogen
* fix flat bug
* more tests
* fix emu test
* that's a hack
* generic
* fix all tests
* two tests
* fix test failure
* better
* remove __all__
* assembly/amd: fix autogen for RDNA4
2026-01-01 23:12:18 -05:00
wozeparrot
b27527f05a
fix: missed inner tracked range ( #13964 )
2026-01-01 18:09:57 -08:00
wozeparrot
ecbac8a338
tk: fa cleanups + causal test ( #13963 )
2026-01-01 18:05:00 -08:00
chenyu
af0392efea
only set DiskDevice.size if it opens successfully ( #13962 )
2026-01-01 19:33:26 -05:00
chenyu
e036d6df89
properly fix DiskDevice reuse ( #13961 )
2026-01-01 18:08:23 -05:00
George Hotz
dfb813b760
assembly/amd: add pcode ds ops ( #13939 )
* assembly/amd: add pcode ds ops
* refactors
* fix ds op
* update autogen
* fix flat bug
* more tests
* fix emu test
* that's a hack
* generic
* fix all tests
* two tests
* fix test failure
* better
* remove __all__
2026-01-01 16:24:13 -05:00
chenyu
cb7c76a3bd
update test_fuzz_failure to not construct full UOp ( #13960 )
2026-01-01 15:09:58 -05:00
chenyu
51398edf9c
fix indirect import ( #13958 )
also deleted old external tests
2026-01-01 14:22:45 -05:00
chenyu
8e416df438
simpler InvalidType [pr] ( #13957 )
simpler singleton pattern
2026-01-01 13:55:51 -05:00
nimlgen
b8ea0d779c
am: remove pipe, queue from setup_ring ( #13947 )
2026-01-01 21:06:41 +03:00