966 Commits

Author SHA1 Message Date
George Hotz
f081f154ae parameterize the CDNA asm gemm (#14813)
* parameterize the CDNA asm gemm

* fix llama test

* fix

* add more gemm tests

* confirm all match

* test these asm gemms
2026-02-17 11:35:18 +08:00
George Hotz
bc3487d607 VIZ display cleanups (#14811)
* exclude reshape/expand broadcasts from viz

* limit src lines
2026-02-17 10:03:08 +08:00
qazal
9da7f5e733 disable process replay for AMD emulator renderer [pr] (#14766)
* disable process replay for AMD emulator renderer [pr]

* line

* skip
2026-02-15 18:52:37 +09:00
nimlgen
3bee6638e3 external_test_hive_reset (#14729)
* external_test_hive_reset

* add fault
2026-02-13 19:08:36 +03:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
befc1e800c assembly/amd: disasm is test only (#14694)
* assembly/amd: disasm is test only

* viz uses str
2026-02-12 12:33:46 +08:00
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
George Hotz
4565958792 some lil speedups (#14679) 2026-02-11 10:01:58 +08:00
George Hotz
2d4ad9e739 add a waitlist for graph rewrite (#14678)
* add a waitlist for graph rewrite

* cleaner

* one context on spec check
2026-02-11 09:30:13 +08:00
chenyu
884592f6c8 pin z3-solver version (#14605)
found exact input that crashes z3 4.15.4
2026-02-06 22:49:31 -05:00
George Hotz
7a2a3b5c71 Remove Ops.KERNEL, it's all Ops.CALL now (#14603) 2026-02-07 10:21:54 +08:00
chenyu
b9fe8b7591 fix opt in process replay [pr] (#14599) 2026-02-06 16:49:56 -05:00
chenyu
197ebcbbbc log seed with flush=True in fuzz_symbolic (#14597)
* log seed with flush=True in fuzz_symbolic

i think z3 can crash. added reading seed from argv to see if we repro later

* fuzz_symbolic_symbolic_div
2026-02-06 15:03:57 -05:00
chenyu
d57d24c7d4 Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535)
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
nimlgen
2f55005ad9 qcom: sync cpu cache when from_blob (#14518)
* um

* fx

* d

* x

* x

* x

* x

* f

* ren
2026-02-03 21:51:03 +03:00
George Hotz
dd2de4f838 rename all DEFINE_GLOBAL to PARAM (#14511) 2026-02-03 15:09:38 +08:00
chenyu
66d2b02f11 delete files that depends on extra.optimization.helpers (#14499) 2026-02-02 13:33:33 -05:00
George Hotz
ec0398fceb test amd gpu crashes (#14459)
* test amd gpu crashes

* cleanup

* less sketch tests
2026-02-02 18:57:47 +03:00
nimlgen
230d08ec70 test for am recovery and faults handling (#14421)
* test for am recovery and faults handling

* linter
2026-01-29 17:11:24 +03:00
George Hotz
88bc5ee212 assembly/amd: rename to better names (#14384)
* assembly/amd: rename to better names

* might help fuzzing segfault

* emu2 -> emu
2026-01-28 10:00:54 +08:00
George Hotz
984cdc4840 add wrapper class for the -0.0 != 0.0 issue (#14339)
* add wrapper class for the -0.0 != 0.0 issue

* fixes

* spec fix

* missed one
2026-01-26 16:52:37 +08:00
nimlgen
26220a472e no core_id (#14265)
* no core_id

* kwargs

* est

* linters

* ugh

* revert this

* deps

* glb

* should work?

* nn

* line

* fx

* ym

* z

* d

* um?

* revert

* this one?

* first half

* um p2

* all?

* um

* cleaner

* um
2026-01-23 21:30:12 +03:00
chenyu
073c6a81b5 raise if Tensor._buffer is called during jit (#14114)
* raise if Tensor._buffer is called during jit

* cleaner
2026-01-22 17:30:18 -05:00
chenyu
574d171fa6 fix onnx Pad constant_value=None (#14271)
also removed a dead branch in _resolve_pool_pads
2026-01-21 11:51:34 -05:00
chenyu
9ea63d7d52 failed test case for onnx IF with jit (#14235)
silently fails now since onnx treats IF cond as a const
2026-01-19 18:10:05 -05:00
chenyu
5e6a72c33f new Onnx Gather (#14187)
instead of assuming const indices, check if it showed as a const
2026-01-16 22:24:07 -05:00
chenyu
ab244c7f81 onnx Gather should not assume indices to be const (#14185)
* onnx Gather should not assume indices to be const

added a failed test case

* just list
2026-01-16 20:55:00 -05:00
chenyu
2a2c1eacf6 disable fast_idiv on metal (#14137)
there's a metal compiler bug which was the root cause of keccak needing a contiguous hack
2026-01-13 21:40:40 -05:00
chenyu
cad7feec02 more onnx ops (#14104)
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
chenyu
9973a81356 add channels_last to QLinearGlobalAveragePool (#14094)
and other minor cleanups
2026-01-10 18:38:19 -05:00
chenyu
83063cc3e4 onnx TensorScatter (#14024) 2026-01-05 09:05:22 -05:00
chenyu
9497ec00f2 fix onnx attention permute (#14025)
* fix onnx attention permute

* skip test_attention_4d_fp16_cpu too
2026-01-05 08:58:50 -05:00
chenyu
7a81a3cb98 more passed onnx tests (#14022) 2026-01-05 07:46:27 -05:00
chenyu
aae08b20e0 enable passed onnx tests (#14017) 2026-01-04 22:12:50 -05:00
chenyu
f6a78a29e0 support einsum trace (#14012)
* support einsum trace

* test_einsum_scalar_cpu
2026-01-04 19:27:27 -05:00
qazal
bdb421f13e process_replay: passthrough sink arg for Ops.PROGRAM input (#14000) 2026-01-04 13:09:39 +09:00
chenyu
51398edf9c fix indirect import (#13958)
also deleted old external tests
2026-01-01 14:22:45 -05:00
nimlgen
25440f0f72 all2all (#13902)
* all2all

* um

* fix

* x

* um

* simpler

* mypy

* fix

* t

* cmnts
2025-12-31 16:38:32 +03:00
George Hotz
43c6e973d8 add optional compiler in Renderer (#13817)
* add optional compiler in Renderer [pr]

* fix

* late init

* remove precompiled

* cleanup
2025-12-23 17:58:46 -05:00
nimlgen
90b217896f am: xgmi p2p (#13811)
* system: use addr space

* am: xgmi

* fix

* ugh
2025-12-23 20:11:38 +03:00
George Hotz
8dcba2e2cc no full_rewrite [pr] (#13809)
* no full_rewrite [pr]

* fix

* fix docs
2025-12-22 23:20:01 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
4b741e893f remove REMOTE=1 (#13722)
* remove REMOTE=1

* leave ibverbs
2025-12-16 15:58:10 -04:00
nimlgen
e36385e570 am: support xgmi systems (#13659)
* am: support xgmi systems

* fake_am
2025-12-12 18:55:45 +03:00
Douglas Nyberg
947c6eefc3 add Swish op (#13541)
* add Swish ONNX operator

* add Swish regression test

* remove trailing whitespace

* upgrade ONNX to 1.20, add excludes for unimplemented ops

* upgrade ONNX to 1.19, add Swish op

* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op

* exclude attention_3d and attention_4d_gqa tests

* exclude attention fp16 tests

* exclude all attention tests

* retrigger CI

* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
George Hotz
c5bd28e21d start work on schedule cache (#13529)
* start work on schedule cache

* local unique

* schedule cache works

* schedule cache cleanup

* fix tests

* preserve metadata

* oops, fix cache

* put that there

* fix spec

* always miss

* why is that broken?

* src[0].op

* fix process replay

* delete abstractions2

* reenable the actual schedule cache

* metadata is best effort

* fix JIT in examples/gradaccum_mnist.py

* full jit

* fixed and test is real
2025-12-04 17:24:49 -08:00
Douglas Nyberg
a8a62bc08e add max/min reduction support to ScatterND (#13562) 2025-12-04 00:53:47 -08:00