11600 Commits

Author SHA1 Message Date
kamilisjon
9a9564118c [pr] Delete reverse_toposort (#13987)
* Delete reverse_toposort

* Update comment and profiler name

* Update profiler name
2026-01-03 22:03:44 -08:00
George Hotz
8328511808 assembly/amd: make the emu.py code shine (#13996)
* assembly/amd: make the code shine

* lil clean

* reg back in pcode

* cleanups

* gen fma_mix

* no writelane hacks

* fn cleanup

* dead vgpr_write

* readable

* smem

* cleanup bench_emu

* speedups

* simpler and faster

* direct inst._fn

* split fxn

* Revert "simpler and faster"

This reverts commit e85f6594b3.

* move lds to wavestate

* dispatcher

* pc in dispatch

* literal isn't wavestate

* cleanups + program

* one readlane

* exec_vop3sd in exec_vop

* cleaner exec_vopd

* fully merge VOP3P

* no special paths

* no SliceProxy

* low=0

* no bigint

* failing tests

* fma on python 3.13
2026-01-03 20:33:09 -08:00
qazal
bdb421f13e process_replay: passthrough sink arg for Ops.PROGRAM input (#14000) 2026-01-04 13:09:39 +09:00
Galax
66caa9fe1d fix: library linking for fedora systems (#13999) 2026-01-03 17:40:56 -08:00
chenyu
8003db2a28 test case of NOOP store load folding (#13997) 2026-01-03 14:39:26 -05:00
chenyu
c1b8644a3f test removing expander rules [pr] (#13994) 2026-01-03 12:38:01 -05:00
Christopher Milan
35c2870b1f gate image_conv2d pitch hacks on IMAGE==1 (#13995)
* gate image_conv2d pitch hacks on IMAGE==1

* fix opencl image copies

* cleanup
2026-01-03 12:27:31 -05:00
nimlgen
a49924a0e9 hcq: _sleep report status (#13992)
* hcq: _sleep report status

* msg

* print all
2026-01-03 14:28:28 +03:00
nimlgen
3b354bc11f hcq: better queue management (#13991) 2026-01-03 13:11:15 +03:00
nimlgen
efb2ae87c6 hcq sync aql (#13756)
* hcq sync aql

* w
2026-01-03 12:59:24 +03:00
qazal
bd55507ee4 RDNA3 fp16 assembly gemm 85 TFLOPS (#13990) 2026-01-03 18:34:23 +09:00
wozeparrot
6242a9d151 tk: no global copy and clear ranges (#13988) 2026-01-02 23:45:15 -08:00
wozeparrot
9f082e8e25 fa: split kv bwd into 2 kernels (#13981) 2026-01-02 18:45:51 -08:00
qazal
2cc64d71b0 simplify mi350x gemm / viz asm tests (#13984)
* mi350x gemm cleanup

* asm tests work

* simpler asm tests
2026-01-03 11:11:07 +09:00
chenyu
7cbafb2ef1 update hypothesis min version (#13983)
there was a local_constants perf regression that made hypothesis related tests slow
2026-01-02 21:01:57 -05:00
Christopher Milan
9dc524536f IMAGE=1 creates "dynamic" images (#13769)
* remove image from BufferSpec

* cl tiny_gemm (64) works

* mypy

* padding

* openpilot CL

* reshape properly

* remove extra qcom checks

* pad output

* mypy

* update compile test

* move undo

* TestImageCopy valid images

* TestImageRealization valid images

* TestImageDType valid images

* cleanups

* test_renderer_failures

* ruff

* mypy

* simplify ops_qcom

* bump step time

* Revert "bump step time"

This reverts commit 75a037c7d0.

* "dynamic textures" are optional

* a start

* IMAGE=1 works, no FLOAT16

* fast but wrong

* mypy

* some fixes

* better

* works

* refactor

* oops
2026-01-02 16:22:39 -05:00
Christopher Milan
61dc70f1a8 add driving_vision IMAGE=1 benchmark (#13979) 2026-01-02 13:58:27 -05:00
George Hotz
0e282025ff assembly/amd: split test_emu into hw tests (#13966)
* assembly/amd: split test_emu into hw tests

* hw tests

* bugfixes

* more tests and fix
2026-01-02 08:04:56 -08:00
chenyu
2e2b5fed12 fix misspellings (#13976) 2026-01-02 10:37:38 -05:00
nietras
f49e4714af Fix spelling errors in README for AMD assembly (#13975) 2026-01-02 10:15:20 -05:00
b1tg
a78fcc55a4 amd tc 1616128 (#13439)
* amd tc 1616128

* fix test

* remove hardcoded check in test
2026-01-02 09:01:05 -05:00
chenyu
fcbb896e05 remove unused to_struct [pr] (#13973) 2026-01-02 08:54:57 -05:00
nimlgen
ff7853a65a am: fix aid doorbells (#13971) 2026-01-02 15:53:44 +03:00
nimlgen
42abb0586c am: fix aid doorbells (#13972) 2026-01-02 15:53:13 +03:00
nimlgen
ebbaad6bfd am: enable all sdma engines (#13970) 2026-01-02 15:25:15 +03:00
qazal
5f52266225 mi350x gemm: use Tensor.custom_kernel in asm test (#13969)
* mi350x gemm: use Tensor.custom_kernel in asm test

* A @ B for baseline
2026-01-02 18:30:50 +09:00
George Hotz
5a1a561e0f assembly/amd: rdna4 autogen (#13967)
* assembly/amd: add pcode ds ops

* refactors

* fix ds op

* update autogen

* fix flat bug

* more tests

* fix emu test

* that's a hack

* generic

* fix all tests

* two tests

* fix test failure

* better

* remove __all__

* assembly/amd: fix autogen for RDNA4
2026-01-01 23:12:18 -05:00
wozeparrot
b27527f05a fix: missed inner tracked range (#13964) 2026-01-01 18:09:57 -08:00
wozeparrot
ecbac8a338 tk: fa cleanups + causal test (#13963) 2026-01-01 18:05:00 -08:00
chenyu
af0392efea only set DiskDevice.size if it opens successfully (#13962) 2026-01-01 19:33:26 -05:00
chenyu
e036d6df89 properly fix DiskDevice reuse (#13961) 2026-01-01 18:08:23 -05:00
George Hotz
dfb813b760 assembly/amd: add pcode ds ops (#13939)
* assembly/amd: add pcode ds ops

* refactors

* fix ds op

* update autogen

* fix flat bug

* more tests

* fix emu test

* that's a hack

* generic

* fix all tests

* two tests

* fix test failure

* better

* remove __all__
2026-01-01 16:24:13 -05:00
chenyu
cb7c76a3bd update test_fuzz_failure to not construct full UOp (#13960) 2026-01-01 15:09:58 -05:00
chenyu
51398edf9c fix indirect import (#13958)
also deleted old external tests
2026-01-01 14:22:45 -05:00
chenyu
8e416df438 simpler InvalidType [pr] (#13957)
simpler singleton pattern
2026-01-01 13:55:51 -05:00
nimlgen
b8ea0d779c am: remove pipe, queue from setup_ring (#13947) 2026-01-01 21:06:41 +03:00
chenyu
4d5c4d256d update tqdm for edge case (#13956)
1.00kit/s and not 1000it/s for value 999.5
2026-01-01 11:37:26 -05:00
chenyu
ed222070f7 update xlog2 fp16 decomp to not use fp32 (#13955) 2026-01-01 11:18:29 -05:00
chenyu
ce84a23142 remove tee in benchmark (#13954) 2026-01-01 10:55:36 -05:00
b1tg
24723327ac fix tc_up in search (#13438)
* tensor_core is missing from Scheduler

* test upcast max

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-01 10:25:08 -05:00
qazal
9726500de8 enable using assembly in Tensor.custom_kernel (#13895) 2026-01-02 00:12:01 +09:00
qazal
c0f52c9dcb split assembly gemm to per arch directory (#13953) 2026-01-02 00:10:22 +09:00
chenyu
c69470be52 fix test_symbolic_arange_sym_step (#13952) 2026-01-01 09:41:07 -05:00
chenyu
b91b46091c delete test_tensor_uop (#13951)
old test for shape tracker. also update tests that refer shapetracker names
2026-01-01 09:25:05 -05:00
chenyu
17ef4af72c new ceildiv that fixed symbolic conv (#13944)
* new ceildiv that fixed symbolic conv

* smaller test case
2026-01-01 09:02:41 -05:00
qazal
6a5430ab00 correct args order in mi350x gemm (#13949) 2026-01-01 23:01:46 +09:00
chenyu
baff10d32c clean up Tensor.svd slices (#13948) 2026-01-01 08:18:45 -05:00
nimlgen
1c5ed8e8b5 am: remove doorbells from setup_ring (#13946) 2026-01-01 14:39:21 +03:00
haofei
526fd4ec71 Fix SVD rank‑1 Jacobi rotation when tau == 0 (#13945) 2026-01-01 00:30:18 -05:00
haofei
20777f30b9 Fix QR/SVD NaNs on zero/orthogonal inputs (#13943) 2025-12-31 23:40:09 -05:00