chenyu
5e6a72c33f
new Onnx Gather ( #14187 )
...
instead of assuming const indices, check whether they show up as a const
2026-01-16 22:24:07 -05:00
chenyu
ab244c7f81
onnx Gather should not assume indices to be const ( #14185 )
...
* onnx Gather should not assume indices to be const
added a failed test case
* just list
2026-01-16 20:55:00 -05:00
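The two Gather commits above fix the assumption that indices are constant. As background, a minimal pure-Python sketch of ONNX Gather semantics along axis 0 (function name and list-based layout are hypothetical, not tinygrad's implementation):

```python
def gather_axis0(data, indices):
    """data: nested list; indices: a (possibly nested) list of ints.
    Each int in indices selects a row of data; negative indices wrap,
    which plain Python list indexing already does."""
    def pick(idx):
        if isinstance(idx, list):
            return [pick(i) for i in idx]
        return data[idx]
    return pick(indices)

print(gather_axis0([[1, 2], [3, 4], [5, 6]], [2, 0]))   # [[5, 6], [1, 2]]
print(gather_axis0([10, 20, 30], [[0, 1], [1, 2]]))     # [[10, 20], [20, 30]]
```

When indices are a runtime tensor rather than a const, the selection cannot be folded at export time, which is the case these commits handle.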
wozeparrot
a879b54234
tk: fa jit fix ( #14170 )
2026-01-16 16:38:45 -08:00
Christopher Milan
a021b84604
autogen: fix enum ( #14171 )
2026-01-16 01:30:11 -05:00
chenyu
14e9a71a41
move test_assign to unit ( #14165 )
...
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
Christopher Milan
0cb024a5bb
remove ctypes.Structure ( #13651 )
2026-01-15 05:06:22 -05:00
qazal
164bc678a6
scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules ( #14161 )
...
* simplest failing test
* min fix
* same function reuses the cache
* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
qazal
b46da603fe
codegen/custom_kernel: do not attach KernelInfo to user program ( #14160 )
2026-01-15 14:01:48 +09:00
chenyu
add7da268f
multiple slice assign test ( #14157 )
...
GANing test cases
2026-01-14 21:08:03 -05:00
chenyu
1381daac06
many more failed assign tests ( #14153 )
...
assign is quite broken
2026-01-14 16:20:28 -05:00
chenyu
899a56446e
failed assign test cases with write before read ( #14148 )
...
slice assign with write before read currently fails; this is why the kv cache needs a realize
2026-01-14 10:30:50 -05:00
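The write-before-read hazard in the commit above can be illustrated in plain Python (illustrative only; tinygrad's lazy scheduler is not shown). If a read of a buffer is deferred until after an overlapping slice assignment, it observes the new values instead of the old ones, which is why an explicit realize (snapshot) is needed:

```python
cache = [0, 0, 0, 0]
new = [7, 8]

# Correct: snapshot ("realize") the read before the overlapping write.
read_before = cache[0:2][:]          # eager copy: [0, 0]
cache[1:3] = new                     # overlapping slice assign
assert read_before == [0, 0]         # snapshot unaffected

# Hazard: a deferred read executed after the write sees the new values.
cache2 = [0, 0, 0, 0]
deferred_read = lambda: cache2[0:2]  # "lazy" read, not yet executed
cache2[1:3] = new
assert deferred_read() == [0, 7]     # read observed the write: wrong order
```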
chenyu
2a2c1eacf6
disable fast_idiv on metal ( #14137 )
...
there's a Metal compiler bug that was the root cause of keccak needing a contiguous hack
2026-01-13 21:40:40 -05:00
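For context on what fast_idiv is: the optimization replaces an integer division by a constant with a multiply and a shift using a precomputed "magic" constant. A sketch for divisor 3 and 32-bit unsigned operands, using the classic constant ceil(2**33 / 3) (this is the general technique, not tinygrad's exact codegen):

```python
# Magic-number division by 3 for 0 <= x < 2**32.
MAGIC, SHIFT = 0xAAAAAAAB, 33        # 0xAAAAAAAB == ceil(2**33 / 3)

def fast_div3(x):
    return (x * MAGIC) >> SHIFT

for x in [0, 1, 2, 3, 7, 99, 2**32 - 1]:
    assert fast_div3(x) == x // 3
```

Disabling it on Metal sidesteps a backend compiler bug rather than any flaw in the transformation itself.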
wozeparrot
a92778aa0c
tk: fa multi fix ( #14134 )
2026-01-13 17:22:15 -08:00
chenyu
fe00682502
clean up svd tests ( #14133 )
...
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
e610821c52
Tensor.cummin and Tensor.nonzero ( #14131 )
2026-01-13 15:09:56 -05:00
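For reference, the 1-D semantics of the two ops added above, sketched in pure Python (illustrative only, not tinygrad's kernel-level implementation):

```python
def cummin(xs):
    # running minimum: out[i] = min(xs[0..i])
    out, cur = [], float("inf")
    for x in xs:
        cur = min(cur, x)
        out.append(cur)
    return out

def nonzero(xs):
    # indices of all non-zero elements, in order
    return [i for i, x in enumerate(xs) if x != 0]

assert cummin([3, 1, 4, 1, 5, 0]) == [3, 1, 1, 1, 1, 0]
assert nonzero([0, 2, 0, 3]) == [1, 3]
```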
chenyu
176a934ddd
Tensor.diagonal support offset and dims ( #14130 )
2026-01-13 14:49:06 -05:00
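A pure-Python sketch of what the offset parameter means for a 2-D diagonal (the dims parameters generalize this to picking which two axes form the matrix; this sketch only covers the 2-D case and is not tinygrad's implementation):

```python
def diagonal(mat, offset=0):
    """Elements mat[i][i + offset]; positive offset is above the main
    diagonal, negative is below."""
    rows, cols = len(mat), len(mat[0])
    if offset >= 0:
        n = min(rows, cols - offset)
        return [mat[i][i + offset] for i in range(n)]
    n = min(rows + offset, cols)
    return [mat[i - offset][i] for i in range(n)]

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert diagonal(m) == [1, 5, 9]
assert diagonal(m, offset=1) == [2, 6]
assert diagonal(m, offset=-1) == [4, 8]
```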
qazal
79d00521f8
viz: fix cfg err when endpgm is in the middle of stream ( #14128 )
...
* kernel from beautiful_mnist
* minimal test
* correct way to do this
* rm that
2026-01-14 02:00:34 +09:00
qazal
fd10fd245a
viz: cfg tokenizer fix and unit tests ( #14121 )
...
* output Ops.BINARY
* failing test for the cfg
* dsl renamed to offset and sz
* add better asserts
* move the note
2026-01-13 15:08:55 +09:00
chenyu
05fcb57696
also return index in Tensor.cummax ( #14117 )
...
* also return index in Tensor.cummax
* fix
2026-01-12 22:42:10 -05:00
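A 1-D sketch of cummax returning both the running maximum and the index where it was attained (illustrative; tie-breaking here keeps the earliest index, and the library's choice may differ):

```python
def cummax_with_indices(xs):
    vals, idxs = [], []
    best, best_i = None, 0
    for i, x in enumerate(xs):
        if best is None or x > best:   # strict > keeps the earliest index on ties
            best, best_i = x, i
        vals.append(best)
        idxs.append(best_i)
    return vals, idxs

assert cummax_with_indices([1, 3, 2, 5, 4]) == ([1, 3, 3, 5, 5], [0, 1, 1, 3, 3])
```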
wozeparrot
7c967399a4
tk: add failing test for fa multidevice ( #14116 )
2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e
assembly/amd: clean up dsl and make type verification strict ( #14102 )
...
* assembly/amd: start newdsl
* work
* newdsl upd
* Reg is p nice
* cleaner
* work
* getting clean
* all fields
* more BitFields
* redo the pdfs with dsl2 syntax
* no lit
* cleanups
* more defaults
* fix get and remove crap
* aliases
* ugly but kind of works
* NULL, not rawimm
* clean up defaults
* only dsl
* asm fixes
* lit fixup
* more lit
* cleanups
* olddsl
* single pcode dict
* emu sort of works
* trash test
* global is global
* types property
* reg mods
* fix a few tests
* remove monkey patch
* fixes
* less hacks in tests
* less hacks in tests
* 4 test failures
* hw tests all pass
* fix compare emulator
* fix some tests
* 3 more
* fix and shorten sqtt
* handwritten
* fix validation
* test corrections
* all types validate
* fix dsl2 tests
* fix bugs in disasm
* skips on cdna
* work
* repr with reg[]
* fix bitfield tests
* merge pcodes in dsl
* remove override
* disasm uses inst.types
* simpler
2026-01-13 08:52:16 +09:00
C T
a8c821f45e
add Tensor.log10 with test/test_ops.py::TestOps::test_log10 ( #14113 )
2026-01-12 13:45:47 -05:00
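log10 is commonly built on an existing base-2 (or natural) log via a change of base, log10(x) = log2(x) / log2(10). A quick sketch of the identity (not necessarily how tinygrad lowers it):

```python
import math

def log10_via_log2(x):
    # change of base: log10(x) = log2(x) / log2(10)
    return math.log2(x) / math.log2(10)

assert abs(log10_via_log2(1000.0) - 3.0) < 1e-12
assert abs(log10_via_log2(2.0) - math.log10(2.0)) < 1e-12
```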
chenyu
6b0a9f5ee6
don't strip sink in to_uops_list [pr] ( #14111 )
2026-01-12 11:19:03 -05:00
chenyu
cad7feec02
more onnx ops ( #14104 )
...
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
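The window ops above follow standard cosine-window formulas. A pure-Python sketch of the Hann window (the periodic/symmetric switch mirrors the usual convention; exact ONNX attribute handling is not reproduced here):

```python
import math

def hann_window(size, periodic=True):
    # w[n] = 0.5 - 0.5 * cos(2*pi*n / denom),
    # where periodic windows divide by size and symmetric by size - 1
    denom = size if periodic else size - 1
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / denom) for n in range(size)]

w = hann_window(4)
assert abs(w[0]) < 1e-12           # starts at 0
assert abs(w[2] - 1.0) < 1e-12     # peaks at n = size/2 for even periodic sizes
```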
chenyu
9973a81356
add channels_last to QLinearGlobalAveragePool ( #14094 )
...
and other minor cleanups
2026-01-10 18:38:19 -05:00
chenyu
35c9701df0
update outdated tests and comments ( #14090 )
2026-01-10 01:00:48 -05:00
chenyu
92246ea731
update tests, WEBGPU=1 pytest . passes ( #14089 )
...
* update tests, `WEBGPU=1 pytest .` passes
* minor update
2026-01-10 00:03:02 -05:00
chenyu
c34c6d9468
fix wgsl packed_store can drop valid ( #14088 )
...
* fix wgsl packed_store can drop valid
* fix
2026-01-09 15:22:06 -05:00
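The bug class above: a packed store writes a sub-word lane into a wider word, gated by a validity flag; if the gate is dropped, an inactive lane clobbers data it must not touch. A pure-Python sketch of the correct gated behavior (names and the two-lane layout are hypothetical, not the WGSL backend's code):

```python
def packed_store(word, lane, value, valid):
    """Store a 16-bit value into lane 0 or 1 of a 32-bit word,
    but only when the valid gate is set."""
    if not valid:
        return word                  # gate off: the word must be untouched
    shift = 16 * lane
    mask = 0xFFFF << shift
    return (word & ~mask) | ((value & 0xFFFF) << shift)

w = 0xAAAA_BBBB
w = packed_store(w, lane=0, value=0x1234, valid=True)
assert w == 0xAAAA_1234
w = packed_store(w, lane=1, value=0xDEAD, valid=False)   # must be a no-op
assert w == 0xAAAA_1234
```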
chenyu
eacccc5ace
more disk assign tests ( #14087 )
...
covers more edge cases
2026-01-09 14:14:52 -05:00
chenyu
ed295e74dc
don't skip gguf test if ggml is not installed ( #14086 )
...
* don't skip gguf test if ggml is not installed
should just let it fail
* fix
2026-01-09 12:05:58 -05:00
chenyu
cff33c8d78
add some disk assign tests ( #14085 )
2026-01-09 11:50:59 -05:00
chenyu
74fa3c7d09
decomp pow for LVP ( #14084 )
...
test failed due to undefined behavior, so use decomp instead
2026-01-09 10:50:28 -05:00
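The decomposition referred to above is the standard rewrite of pow for positive bases into exp2 and log2, which avoids a direct pow primitive. A sketch of the identity (valid for x > 0; not tinygrad's exact lowering):

```python
import math

def pow_decomp(x, y):
    # x ** y == 2 ** (y * log2(x)) for x > 0
    return 2.0 ** (y * math.log2(x))

assert abs(pow_decomp(3.0, 4.0) - 81.0) < 1e-9
assert abs(pow_decomp(2.0, 0.5) - math.sqrt(2.0)) < 1e-12
```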
b1tg
0fbc551622
train bert with fp8 ( #13874 )
...
* fp8 train
* clean
* lint
* test fix from #13439
* skip first/last layer
* rm __init__, restore unroll <=32 check
* tests
* clean test, remove unused
* multi-gpu test, clean quantize_to_fp8
* remove bert contiguous
* run script
* test: better check
* run script search
* add seed in bert data shuffle
* move script to mi350x folder
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
chenyu
efcb32f6a9
unique const when requires_grad is set to True ( #14075 )
...
* unique const when requires_grad is set to True
* fix pyrender
2026-01-08 16:30:45 -05:00
chenyu
b34c637767
support bfloat16 for CL ( #14073 )
2026-01-08 14:14:29 -05:00
Garret Castro
16b652302e
skip bf16 test if not supported by device ( #14070 )
2026-01-08 13:37:24 -05:00
wozeparrot
027b935269
tk: fix grouped load store ( #14035 )
2026-01-07 22:38:02 -08:00
chenyu
3caa1e2c98
fix cast HALF with PYTHON backend ( #14058 )
2026-01-07 16:52:05 -05:00
chenyu
5f1ede7f7e
clean up test_dtype ( #14055 )
...
use less lambda
2026-01-07 15:45:42 -05:00
chenyu
2833c5a54b
few more jit tests with multi tensor inputs ( #14047 )
2026-01-06 22:05:22 -05:00
chenyu
72a3f78d19
jit includes tensor inputs in containers ( #14043 )
...
* jit includes tensor inputs in containers
* cleanup
2026-01-06 19:42:06 -05:00
chenyu
c714881832
don't allow jit input to be const ( #14045 )
...
* don't allow jit input to be unbuffered like const
* just const to fix multi
* fix rnnt
2026-01-06 18:15:22 -05:00
chenyu
a8896f28e1
test_unrealized_const_input_frozen ( #14044 )
...
unrealized const is not replaced in jit
2026-01-06 14:17:43 -05:00
nimlgen
325f4006ff
amd: copies w/o sdma ( #14036 )
...
* amd: copies w/o sdma
* as_args
* fixes
* f
2026-01-06 21:15:58 +03:00
chenyu
7fb18f7e47
raise when jit fxn returns non-Tensor output ( #14042 )
2026-01-06 12:59:20 -05:00
chenyu
4491ec0c9e
JitError ( #14041 )
...
* JitError
* test_symbolic_jit
2026-01-06 12:19:50 -05:00
chenyu
6ddddc68af
test jit tolist failure ( #14040 )
...
also moved tests to test_jit_footguns
2026-01-06 11:16:57 -05:00
chenyu
b699b9f763
test case for jit a function with item call ( #14039 )
...
* test case for jit a function with item call
output is silently wrong now
* no dtype
2026-01-06 10:40:43 -05:00
qazal
3170365a5b
visualize SQTT with the same cfg infrastructure ( #13870 )
...
* start
* rough sketch
* post render dag
* art
* intro g key
* work
* custom color scale
* colors
* more blue
* better
* smaller
* use for loop in test
2026-01-06 14:53:20 +09:00
chenyu
83063cc3e4
onnx TensorScatter ( #14024 )
2026-01-05 09:05:22 -05:00