4875 Commits

Author SHA1 Message Date
chenyu
5e6a72c33f new Onnx Gather (#14187)
instead of assuming const indices, check if it showed as a const
2026-01-16 22:24:07 -05:00
chenyu
ab244c7f81 onnx Gather should not assume indices to be const (#14185)
* onnx Gather should not assume indices to be const

added a failed test case

* just list
2026-01-16 20:55:00 -05:00
wozeparrot
a879b54234 tk: fa jit fix (#14170) 2026-01-16 16:38:45 -08:00
Christopher Milan
a021b84604 autogen: fix enum (#14171) 2026-01-16 01:30:11 -05:00
chenyu
14e9a71a41 move test_assign to unit (#14165)
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
Christopher Milan
0cb024a5bb remove ctypes.Structure (#13651) 2026-01-15 05:06:22 -05:00
qazal
164bc678a6 scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules (#14161)
* simplest failing test

* min fix

* same function reuses the cache

* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
qazal
b46da603fe codegen/custom_kernel: do not attach KernelInfo to user program (#14160) 2026-01-15 14:01:48 +09:00
chenyu
add7da268f multiple slice assign test (#14157)
GANing test cases
2026-01-14 21:08:03 -05:00
chenyu
1381daac06 many more failed assign tests (#14153)
assign is quite broken
2026-01-14 16:20:28 -05:00
chenyu
899a56446e failed assign test cases with write before read (#14148)
slice assign write before read fails now. this is why kv cache needs a realize
2026-01-14 10:30:50 -05:00
chenyu
2a2c1eacf6 disable fast_idiv on metal (#14137)
there's a metal compiler bug which was the root cause that keccak needs a contiguous hack
2026-01-13 21:40:40 -05:00
wozeparrot
a92778aa0c tk: fa multi fix (#14134) 2026-01-13 17:22:15 -08:00
chenyu
fe00682502 clean up svd tests (#14133)
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
e610821c52 Tensor.cummin and Tensor.nonzero (#14131) 2026-01-13 15:09:56 -05:00
chenyu
176a934ddd Tensor.diagonal support offset and dims (#14130) 2026-01-13 14:49:06 -05:00
qazal
79d00521f8 viz: fix cfg err when endpgm is in the middle of stream (#14128)
* kernel from beautiful_mnist

* minimal test

* correct way to do this

* rm that
2026-01-14 02:00:34 +09:00
qazal
fd10fd245a viz: cfg tokenizer fix and unit tests (#14121)
* output Ops.BINARY

* failing test for the cfg

* dsl renamed to offset and sz

* add better asserts

* move the note
2026-01-13 15:08:55 +09:00
chenyu
05fcb57696 also return index in Tensor.cummax (#14117)
* also return index in Tensor.cummax

* fix
2026-01-12 22:42:10 -05:00
wozeparrot
7c967399a4 tk: add failing test for fa multidevice (#14116) 2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e assembly/amd: clean up dsl and make type verification strict (#14102)
* assembly/amd: start newdsl

* work

* newdsl upd

* Reg is p nice

* cleaner

* work

* getting clean

* all fields

* more BitFields

* redo the pdfs with dsl2 syntax

* no lit

* cleanups

* more defaults

* fix get and remove crap

* aliases

* ugly but kind of works

* NULL, not rawimm

* clean up defaults

* only dsl

* asm fixes

* lit fixup

* more lit

* cleanups

* olddsl

* single pcode dict

* emu sort of works

* trash test

* global is global

* types property

* reg mods

* fix a few tests

* remove monkey patch

* fixes

* less hacks in tests

* less hacks in tests

* 4 test failures

* hw tests all pass

* fix compare emulator

* fix some tests

* 3 more

* fix and shorten sqtt

* handwritten

* fix validation

* test corrections

* all types validate

* fix dsl2 tests

* fix bugs in disasm

* skips on cdna

* work

* repr with reg[]

* fix bitfield tests

* merge pcodes in dsl

* remove override

* disasm uses inst.types

* simpler
2026-01-13 08:52:16 +09:00
C T
a8c821f45e add Tensor.log10 with test\test_ops.py::TestOps::test_log10 (#14113) 2026-01-12 13:45:47 -05:00
chenyu
6b0a9f5ee6 don't strip sink in to_uops_list [pr] (#14111) 2026-01-12 11:19:03 -05:00
chenyu
cad7feec02 more onnx ops (#14104)
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
chenyu
9973a81356 add channels_last to QLinearGlobalAveragePool (#14094)
and other minor cleanups
2026-01-10 18:38:19 -05:00
chenyu
35c9701df0 update outdated tests and comments (#14090) 2026-01-10 01:00:48 -05:00
chenyu
92246ea731 update tests, WEBGPU=1 pytest . passes (#14089)
* update tests, `WEBGPU=1 pytest .` passes

* minor update
2026-01-10 00:03:02 -05:00
chenyu
c34c6d9468 fix wgsl packed_store can drop valid (#14088)
* fix wgsl packed_store can drop valid

* fix
2026-01-09 15:22:06 -05:00
chenyu
eacccc5ace more disk assign tests (#14087)
covers more edge cases
2026-01-09 14:14:52 -05:00
chenyu
ed295e74dc don't skip gguf test if ggml is not installed (#14086)
* don't skip gguf test if ggml is not installed

should just let it fail

* fix
2026-01-09 12:05:58 -05:00
chenyu
cff33c8d78 add some disk assign tests (#14085) 2026-01-09 11:50:59 -05:00
chenyu
74fa3c7d09 decomp pow for LVP (#14084)
test failed due to undefined behavior, so use decomp instead
2026-01-09 10:50:28 -05:00
b1tg
0fbc551622 train bert with fp8 (#13874)
* fp8 train

* clean

* lint

* test fix from #13439

* skip first/last layer

* rm __init__, restore unroll <=32 check

* tests

* clean test, remove unused

* multi-gpu test, clean quantize_to_fp8

* remove bert contiguous

* run script

* test: better check

* run script search

* add seed in bert data shuffle

* move script to mi350x folder

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
chenyu
efcb32f6a9 unique const when requires_grad is set to True (#14075)
* unique const when requires_grad is set to True

* fix pyrender
2026-01-08 16:30:45 -05:00
chenyu
b34c637767 support bfloat16 for CL (#14073) 2026-01-08 14:14:29 -05:00
Garret Castro
16b652302e skip bf16 test if not supported by device (#14070) 2026-01-08 13:37:24 -05:00
wozeparrot
027b935269 tk: fix grouped load store (#14035) 2026-01-07 22:38:02 -08:00
chenyu
3caa1e2c98 fix cast HALF with PYTHON backend (#14058) 2026-01-07 16:52:05 -05:00
chenyu
5f1ede7f7e clean up test_dtype (#14055)
use less lambda
2026-01-07 15:45:42 -05:00
chenyu
2833c5a54b few more jit tests with multi tensor inputs (#14047) 2026-01-06 22:05:22 -05:00
chenyu
72a3f78d19 jit includes tensor inputs in containers (#14043)
* jit includes tensor inputs in containers

* cleanup
2026-01-06 19:42:06 -05:00
chenyu
c714881832 don't allow jit input to be const (#14045)
* don't allow jit input to be unbuffered like const

* just const to fix multi

* fix rnnt
2026-01-06 18:15:22 -05:00
chenyu
a8896f28e1 test_unrealized_const_input_frozen (#14044)
unrealized const is not replaced in jit
2026-01-06 14:17:43 -05:00
nimlgen
325f4006ff amd: copies w/o sdma (#14036)
* amd: copies w/o sdma

* as_args

* fixes

* f
2026-01-06 21:15:58 +03:00
chenyu
7fb18f7e47 raise when jit fxn returns non-Tensor output (#14042) 2026-01-06 12:59:20 -05:00
chenyu
4491ec0c9e JitError (#14041)
* JitError

* test_symbolic_jit
2026-01-06 12:19:50 -05:00
chenyu
6ddddc68af test jit tolist failure (#14040)
also moved tests to test_jit_footguns
2026-01-06 11:16:57 -05:00
chenyu
b699b9f763 test case for jit a function with item call (#14039)
* test case for jit a function with item call

output is silently wrong now

* no dtype
2026-01-06 10:40:43 -05:00
qazal
3170365a5b visualize SQTT with the same cfg infrastructure (#13870)
* start

* rough sketch

* post render dag

* art

* intro g key

* work

* custom color scale

* colors

* more blue

* better

* smaller

* use for loop in test
2026-01-06 14:53:20 +09:00
chenyu
83063cc3e4 onnx TensorScatter (#14024) 2026-01-05 09:05:22 -05:00