chenyu
72a3f78d19
jit includes tensor inputs in containers (#14043)
* jit includes tensor inputs in containers
* cleanup
2026-01-06 19:42:06 -05:00
chenyu
c714881832
don't allow jit input to be const (#14045)
* don't allow jit input to be unbuffered like const
* just const to fix multi
* fix rnnt
2026-01-06 18:15:22 -05:00
chenyu
a8896f28e1
test_unrealized_const_input_frozen (#14044)
unrealized const is not replaced in jit
2026-01-06 14:17:43 -05:00
nimlgen
325f4006ff
amd: copies w/o sdma (#14036)
* amd: copies w/o sdma
* as_args
* fixes
* f
2026-01-06 21:15:58 +03:00
chenyu
7fb18f7e47
raise when jit fxn returns non-Tensor output (#14042)
2026-01-06 12:59:20 -05:00
chenyu
4491ec0c9e
JitError (#14041)
* JitError
* test_symbolic_jit
2026-01-06 12:19:50 -05:00
chenyu
6ddddc68af
test jit tolist failure (#14040)
also moved tests to test_jit_footguns
2026-01-06 11:16:57 -05:00
chenyu
b699b9f763
test case for jit a function with item call (#14039)
* test case for jit a function with item call
output is silently wrong now
* no dtype
2026-01-06 10:40:43 -05:00
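A minimal sketch of why a jitted function that calls .item() can be silently wrong, assuming a toy trace-and-replay JIT. Traced and toy_jit are hypothetical names, not tinygrad's TinyJit (whose capture logic is more involved); the point is only that .item() returns a plain Python float, so the replay keeps using the number seen while tracing.

```python
class Traced:
    """Hypothetical traced value: records the ops applied to the jitted input."""
    def __init__(self, value, tape=None):
        self.value = value
        self.tape = tape if tape is not None else []

    def mul(self, k):
        # k is recorded by value at trace time, not as a reference to another tensor
        self.tape.append(("mul", k))
        return Traced(self.value * k, self.tape)

    def item(self):
        # escapes the trace as a plain Python float
        return self.value

def toy_jit(fn):
    cache = {}
    def wrapper(x):
        if "tape" not in cache:
            # first call: run the function while recording ops
            t = Traced(x)
            out = fn(t)
            cache["tape"] = t.tape
            return out.value
        # later calls: replay only the recorded ops on the new input
        y = x
        for op, k in cache["tape"]:
            if op == "mul":
                y = y * k
        return y
    return wrapper

@toy_jit
def f(x):
    s = x.item()     # Python-side read, frozen after tracing
    return x.mul(s)  # intended result: x * x

print(f(2.0))  # 4.0, traced run
print(f(3.0))  # 6.0, not 9.0: the replay reuses s = 2.0
```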
qazal
3170365a5b
visualize SQTT with the same cfg infrastructure (#13870)
* start
* rough sketch
* post render dag
* art
* intro g key
* work
* custom color scale
* colors
* more blue
* better
* smaller
* use for loop in test
2026-01-06 14:53:20 +09:00
chenyu
83063cc3e4
onnx TensorScatter (#14024)
2026-01-05 09:05:22 -05:00
chenyu
9497ec00f2
fix onnx attention permute (#14025)
* fix onnx attention permute
* skip test_attention_4d_fp16_cpu too
2026-01-05 08:58:50 -05:00
chenyu
7a81a3cb98
more passed onnx tests (#14022)
2026-01-05 07:46:27 -05:00
Christopher Milan
b2a0b9c551
autogen: dump patch in CI (#14010)
* autogen: don't fast-fail, produce patch artifact on differences
All verification steps now use continue-on-error to run completely.
Each job generates a patch artifact containing all differences found.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* add gen from header test
* fix tests
* fail if diff
* add forward decl autogen test
* remove confusing/wrong comments
* macos unittests set LIBCLANG_PATH
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 22:38:12 -05:00
chenyu
aae08b20e0
enable passed onnx tests (#14017)
2026-01-04 22:12:50 -05:00
chenyu
f6a78a29e0
support einsum trace (#14012)
* support einsum trace
* test_einsum_scalar_cpu
2026-01-04 19:27:27 -05:00
wozeparrot
f550f9204c
fa: failing test for bwd jit (#14009)
* tk: failing test for bwd jit
* feat: mark expectedFailure
* clean: spaces
2026-01-04 16:57:43 -05:00
George Hotz
7abf4591ba
use bitsize on dtype (#14011)
* use bitsize on dtype [pr]
* bitsize
* bitsize in js export, but might be wrong
* reverts
* revert that
2026-01-04 12:16:21 -08:00
chenyu
cfb8bf5814
faster image load (#13977)
sometimes image load does not need to init with NAN
2026-01-04 13:09:59 -05:00
qazal
bdb421f13e
process_replay: passthrough sink arg for Ops.PROGRAM input (#14000)
2026-01-04 13:09:39 +09:00
chenyu
8003db2a28
test case of NOOP store load folding (#13997)
2026-01-03 14:39:26 -05:00
qazal
2cc64d71b0
simplify mi350x gemm / viz asm tests (#13984)
* mi350x gemm cleanup
* asm tests work
* simpler asm tests
2026-01-03 11:11:07 +09:00
Christopher Milan
9dc524536f
IMAGE=1 creates "dynamic" images (#13769)
* remove image from BufferSpec
* cl tiny_gemm (64) works
* mypy
* padding
* openpilot CL
* reshape properly
* remove extra qcom checks
* pad output
* mypy
* update compile test
* move undo
* TestImageCopy valid images
* TestImageRealization valid images
* TestImageDType valid images
* cleanups
* test_renderer_failures
* ruff
* mypy
* simplify ops_qcom
* bump step time
* Revert "bump step time"
This reverts commit 75a037c7d0.
* "dynamic textures" are optional
* a start
* IMAGE=1 works, no FLOAT16
* fast but wrong
* mypy
* some fixes
* better
* works
* refactor
* oops
2026-01-02 16:22:39 -05:00
chenyu
2e2b5fed12
fix misspellings (#13976)
2026-01-02 10:37:38 -05:00
b1tg
a78fcc55a4
amd tc 1616128 (#13439)
* amd tc 1616128
* fix test
* remove hardcoded check in test
2026-01-02 09:01:05 -05:00
wozeparrot
ecbac8a338
tk: fa cleanups + causal test (#13963)
2026-01-01 18:05:00 -08:00
chenyu
af0392efea
only set DiskDevice.size if it opens successfully (#13962)
2026-01-01 19:33:26 -05:00
chenyu
e036d6df89
properly fix DiskDevice reuse (#13961)
2026-01-01 18:08:23 -05:00
chenyu
cb7c76a3bd
update test_fuzz_failure to not construct full UOp (#13960)
2026-01-01 15:09:58 -05:00
chenyu
51398edf9c
fix indirect import (#13958)
also deleted old external tests
2026-01-01 14:22:45 -05:00
chenyu
8e416df438
simpler InvalidType [pr] (#13957)
simpler singleton pattern
2026-01-01 13:55:51 -05:00
chenyu
4d5c4d256d
update tqdm for edge case (#13956)
1.00kit/s and not 1000it/s for value 999.5
2026-01-01 11:37:26 -05:00
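A minimal sketch of the 999.5 edge case, assuming a hypothetical SI-prefix rate formatter that prints 3 significant digits (si_rate and si_rate_naive are illustrative names, not tinygrad's tqdm code). Choosing the prefix before rounding lets 999.5 through as unprefixed, and it then prints as 1000; rounding to 3 significant digits first pushes it to the k prefix.

```python
import math

def si_rate_naive(x: float, unit: str = "it/s") -> str:
    # Checks the *unrounded* value: 999.5 < 1000, so no prefix, then it prints as "1000".
    for prefix in ["", "k", "M", "G"]:
        if x < 1000:
            decimals = max(0, 2 - math.floor(math.log10(x))) if x > 0 else 2
            return f"{x:.{decimals}f}{prefix}{unit}"
        x /= 1000
    return f"{x:.0f}T{unit}"

def si_rate(x: float, unit: str = "it/s") -> str:
    # Assumes x >= 0 (a rate). Rounds to 3 significant digits *before* picking the prefix.
    prefixes = ["", "k", "M", "G", "T"]
    for i, prefix in enumerate(prefixes):
        r = float(f"{x:.3g}")  # 999.5 -> 1000.0
        if r < 1000 or i == len(prefixes) - 1:
            # keep 3 significant digits: 1.00, 12.3, 999
            decimals = max(0, 2 - math.floor(math.log10(r))) if r > 0 else 2
            return f"{r:.{decimals}f}{prefix}{unit}"
        x = r / 1000  # r already has 3 significant digits, so this stays exact enough
    raise AssertionError("unreachable")

print(si_rate_naive(999.5))  # 1000it/s
print(si_rate(999.5))        # 1.00kit/s
print(si_rate(999.4))        # 999it/s
```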
chenyu
ed222070f7
update xlog2 fp16 decomp to not use fp32 (#13955)
2026-01-01 11:18:29 -05:00
chenyu
c69470be52
fix test_symbolic_arange_sym_step (#13952)
2026-01-01 09:41:07 -05:00
chenyu
b91b46091c
delete test_tensor_uop (#13951)
old test for shape tracker. also update tests that refer to shapetracker names
2026-01-01 09:25:05 -05:00
chenyu
17ef4af72c
new ceildiv that fixed symbolic conv (#13944)
* new ceildiv that fixed symbolic conv
* smaller test case
2026-01-01 09:02:41 -05:00
haofei
526fd4ec71
Fix SVD rank-1 Jacobi rotation when tau == 0 (#13945)
2026-01-01 00:30:18 -05:00
haofei
20777f30b9
Fix QR/SVD NaNs on zero/orthogonal inputs (#13943)
2025-12-31 23:40:09 -05:00
chenyu
52acadc160
consolidate IGNORE_OOB=0 tests (#13937)
add a new unit test file and add more cases
2025-12-31 15:24:20 -05:00
Christopher Milan
13973e4dea
refactor image pitch (#13928)
2025-12-31 13:22:38 -05:00
George Hotz
b998a80b5d
assembly/amd: split generated stuff into enum/ins (#13924)
2025-12-31 10:10:52 -05:00
nimlgen
25440f0f72
all2all (#13902)
* all2all
* um
* fix
* x
* um
* simpler
* mypy
* fix
* t
* cmnts
2025-12-31 16:38:32 +03:00
George Hotz
0221b96761
assembly/amd: fix all ops tests (#13910)
* assembly/amd: fix all ops tests
* test_ops with smaller sizes
* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
efc99d0c55
assembly/amd: more refactors (#13907)
* assembly/amd: more refactors
* more refactors
* more refactors
* simpler emu
* generate.py
* regen all
* cleanups
* more
* work
* more readme
* lil
2025-12-30 16:13:24 -05:00
George Hotz
04c79505ec
no subnormal bf16 (#13905)
2025-12-30 13:02:53 -05:00
chenyu
ab58926b00
update sampling in test_float_cast_to_unsigned (#13889)
filter is slow for small dtypes
2025-12-29 21:35:46 -05:00
George Hotz
81cf9ea0ab
rename to extra.assembly.amd (#13879)
2025-12-29 14:10:55 -05:00
b1tg
63a1bb8507
multi custom kernel: support input mixed with copy and shard (#13748)
2025-12-29 12:54:27 -05:00
chenyu
0a98fd38b3
fix tests that failed locally on mac (#13872)
keccak output was silently broken without contiguous
2025-12-29 11:23:38 -05:00
Clément Verrier
0e409ff5ce
fix indentation in UOp pretty_print for repeated references (#13857)
* fix indentation in UOp pretty_print for repeated references
When a UOp was referenced multiple times, the walrus operator notation
(e.g., x0:=) was correctly used for the first occurrence, but subsequent
references had misaligned indentation due to an extra space character.
Fix indentation misalignment in pretty_print() when UOps are referenced
multiple times.
* add simple unit tests for UOp repr
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-12-29 10:46:16 -05:00
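A minimal sketch of walrus-style naming for repeated references in a DAG printer, assuming a hypothetical Node class and pretty() function rather than tinygrad's UOp. The first occurrence of a shared node prints as x0:=..., and every later occurrence prints only the name, using the same indentation as any other child at that depth.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Node:
    # Hypothetical stand-in for a UOp: an op name plus source nodes.
    op: str
    src: tuple["Node", ...] = ()

def pretty(root: Node) -> str:
    # Count references so nodes used more than once get a name.
    refs: dict[int, int] = {}
    def count(n: Node) -> None:
        refs[id(n)] = refs.get(id(n), 0) + 1
        if refs[id(n)] == 1:
            for s in n.src:
                count(s)
    count(root)

    names: dict[int, str] = {}
    lines: list[str] = []
    def walk(n: Node, depth: int) -> None:
        pad = "  " * depth  # one indentation rule for every line
        if id(n) in names:
            # repeated reference: just the name, no extra space
            lines.append(f"{pad}{names[id(n)]}")
            return
        prefix = ""
        if refs[id(n)] > 1:
            # first occurrence of a shared node gets the walrus binding
            names[id(n)] = f"x{len(names)}"
            prefix = f"{names[id(n)]}:="
        lines.append(f"{pad}{prefix}{n.op}")
        for s in n.src:
            walk(s, depth + 1)
    walk(root, 0)
    return "\n".join(lines)

a = Node("LOAD")
print(pretty(Node("ADD", (a, a))))
```

With the shared LOAD node, both children of ADD line up at the same depth: "x0:=LOAD" on the first reference and "x0" on the second.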
George Hotz
25ef866e89
write python emulator from RDNA3 pseudocode in pdf (#13841)
* write python emulator from RDNA3 pseudocode in pdf
* emu2
* more emu
* working
* more pseudocode
* progress
* cleanups
* delete junk
* delete stale files
* just emu
* work
* emu compare
* bemu
* cleanups and more failures
* revert bench emu
* fix emu cmp
* four tests fail
* bugfixes
* dsl
* ext
* refactor
* dsl
* div scale fix
* test_emu
* fix emu tests
* pcode
* test pcode
* top imports
* fix test_emu to use run_asm
* emu tests on real hardware
* more tests
* more emu tests
* more
* work
* work
* bug fix
* bugfixes
* fix fp16 gemm
* all ops tests pass in emulator
* fix llvm tests
* fix a few more tests
* fix mockgpu timeout
2025-12-29 07:39:53 -05:00