qazal
|
bd55507ee4
|
RDNA3 fp16 assembly gemm 85 TFLOPS (#13990)
|
2026-01-03 18:34:23 +09:00 |
|
wozeparrot
|
6242a9d151
|
tk: no global copy and clear ranges (#13988)
|
2026-01-02 23:45:15 -08:00 |
|
wozeparrot
|
9f082e8e25
|
fa: split kv bwd into 2 kernels (#13981)
|
2026-01-02 18:45:51 -08:00 |
|
qazal
|
2cc64d71b0
|
simplify mi350x gemm / viz asm tests (#13984)
* mi350x gemm cleanup
* asm tests work
* simpler asm tests
|
2026-01-03 11:11:07 +09:00 |
|
chenyu
|
7cbafb2ef1
|
update hypothesis min version (#13983)
there was a local_constants perf regression that made hypothesis related tests slow
|
2026-01-02 21:01:57 -05:00 |
|
Christopher Milan
|
9dc524536f
|
IMAGE=1 creates "dynamic" images (#13769)
* remove image from BufferSpec
* cl tiny_gemm (64) works
* mypy
* padding
* openpilot CL
* reshape properly
* remove extra qcom checks
* pad output
* mypy
* update compile test
* move undo
* TestImageCopy valid images
* TestImageRealization valid images
* TestImageDType valid images
* cleanups
* test_renderer_failures
* ruff
* mypy
* simplify ops_qcom
* bump step time
* Revert "bump step time"
This reverts commit 75a037c7d0.
* "dynamic textures" are optional
* a start
* IMAGE=1 works, no FLOAT16
* fast but wrong
* mypy
* some fixes
* better
* works
* refactor
* oops
|
2026-01-02 16:22:39 -05:00 |
|
Christopher Milan
|
61dc70f1a8
|
add driving_vision IMAGE=1 benchmark (#13979)
|
2026-01-02 13:58:27 -05:00 |
|
George Hotz
|
0e282025ff
|
assembly/amd: split test_emu into hw tests (#13966)
* assembly/amd: split test_emu into hw tests
* hw tests
* bugfixes
* more tests and fix
|
2026-01-02 08:04:56 -08:00 |
|
chenyu
|
2e2b5fed12
|
fix misspellings (#13976)
|
2026-01-02 10:37:38 -05:00 |
|
nietras
|
f49e4714af
|
Fix spelling errors in README for AMD assembly (#13975)
|
2026-01-02 10:15:20 -05:00 |
|
b1tg
|
a78fcc55a4
|
amd tc 1616128 (#13439)
* amd tc 1616128
* fix test
* remove hardcoded check in test
|
2026-01-02 09:01:05 -05:00 |
|
chenyu
|
fcbb896e05
|
remove unused to_struct [pr] (#13973)
|
2026-01-02 08:54:57 -05:00 |
|
nimlgen
|
ff7853a65a
|
am: fix aid doorbells (#13971)
|
2026-01-02 15:53:44 +03:00 |
|
nimlgen
|
42abb0586c
|
am: fix aid doorbells (#13972)
|
2026-01-02 15:53:13 +03:00 |
|
nimlgen
|
ebbaad6bfd
|
am: enable all sdma engines (#13970)
|
2026-01-02 15:25:15 +03:00 |
|
qazal
|
5f52266225
|
mi350x gemm: use Tensor.custom_kernel in asm test (#13969)
* mi350x gemm: use Tensor.custom_kernel in asm test
* A @ B for baseline
|
2026-01-02 18:30:50 +09:00 |
|
George Hotz
|
5a1a561e0f
|
assembly/amd: rdna4 autogen (#13967)
* assembly/amd: add pcode ds ops
* refactors
* fix ds op
* update autogen
* fix flat bug
* more tests
* fix emu test
* that's a hack
* generic
* fix all tests
* two tests
* fix test failure
* better
* remove __all__
* assembly/amd: fix autogen for RDNA4
|
2026-01-01 23:12:18 -05:00 |
|
wozeparrot
|
b27527f05a
|
fix: missed inner tracked range (#13964)
|
2026-01-01 18:09:57 -08:00 |
|
wozeparrot
|
ecbac8a338
|
tk: fa cleanups + causal test (#13963)
|
2026-01-01 18:05:00 -08:00 |
|
chenyu
|
af0392efea
|
only set DiskDevice.size if it opens successfully (#13962)
|
2026-01-01 19:33:26 -05:00 |
|
chenyu
|
e036d6df89
|
properly fix DiskDevice reuse (#13961)
|
2026-01-01 18:08:23 -05:00 |
|
George Hotz
|
dfb813b760
|
assembly/amd: add pcode ds ops (#13939)
* assembly/amd: add pcode ds ops
* refactors
* fix ds op
* update autogen
* fix flat bug
* more tests
* fix emu test
* that's a hack
* generic
* fix all tests
* two tests
* fix test failure
* better
* remove __all__
|
2026-01-01 16:24:13 -05:00 |
|
chenyu
|
cb7c76a3bd
|
update test_fuzz_failure to not construct full UOp (#13960)
|
2026-01-01 15:09:58 -05:00 |
|
chenyu
|
51398edf9c
|
fix indirect import (#13958)
also deleted old external tests
|
2026-01-01 14:22:45 -05:00 |
|
chenyu
|
8e416df438
|
simpler InvalidType [pr] (#13957)
simpler singleton pattern
|
2026-01-01 13:55:51 -05:00 |
|
nimlgen
|
b8ea0d779c
|
am: remove pipe, queue from setup_ring (#13947)
|
2026-01-01 21:06:41 +03:00 |
|
chenyu
|
4d5c4d256d
|
update tqdm for edge case (#13956)
1.00kit/s and not 1000it/s for value 999.5
|
2026-01-01 11:37:26 -05:00 |
|
chenyu
|
ed222070f7
|
update xlog2 fp16 decomp to not use fp32 (#13955)
|
2026-01-01 11:18:29 -05:00 |
|
chenyu
|
ce84a23142
|
remove tee in benchmark (#13954)
|
2026-01-01 10:55:36 -05:00 |
|
b1tg
|
24723327ac
|
fix tc_up in search (#13438)
* tensor_core is missing from Scheduler
* test upcast max
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
|
2026-01-01 10:25:08 -05:00 |
|
qazal
|
9726500de8
|
enable using assembly in Tensor.custom_kernel (#13895)
|
2026-01-02 00:12:01 +09:00 |
|
qazal
|
c0f52c9dcb
|
split assembly gemm to per arch directory (#13953)
|
2026-01-02 00:10:22 +09:00 |
|
chenyu
|
c69470be52
|
fix test_symbolic_arange_sym_step (#13952)
|
2026-01-01 09:41:07 -05:00 |
|
chenyu
|
b91b46091c
|
delete test_tensor_uop (#13951)
old test for shape tracker. also update tests that refer to shapetracker names
|
2026-01-01 09:25:05 -05:00 |
|
chenyu
|
17ef4af72c
|
new ceildiv that fixed symbolic conv (#13944)
* new ceildiv that fixed symbolic conv
* smaller test case
|
2026-01-01 09:02:41 -05:00 |
|
qazal
|
6a5430ab00
|
correct args order in mi350x gemm (#13949)
|
2026-01-01 23:01:46 +09:00 |
|
chenyu
|
baff10d32c
|
clean up Tensor.svd slices (#13948)
|
2026-01-01 08:18:45 -05:00 |
|
nimlgen
|
1c5ed8e8b5
|
am: remove doorbells from setup_ring (#13946)
|
2026-01-01 14:39:21 +03:00 |
|
haofei
|
526fd4ec71
|
Fix SVD rank‑1 Jacobi rotation when tau == 0 (#13945)
|
2026-01-01 00:30:18 -05:00 |
|
haofei
|
20777f30b9
|
Fix QR/SVD NaNs on zero/orthogonal inputs (#13943)
|
2025-12-31 23:40:09 -05:00 |
|
chenyu
|
0ed58c1fcd
|
clean up some functions in helpers [pr] (#13942)
|
2025-12-31 18:29:16 -05:00 |
|
chenyu
|
e2987001ee
|
unify pre-commit mypy and ci mypy (#13940)
|
2025-12-31 17:51:51 -05:00 |
|
chenyu
|
8bf7c9c1d2
|
no-op cleanups for ptx [pr] (#13938)
|
2025-12-31 17:28:39 -05:00 |
|
George Hotz
|
2bb07d4824
|
assembly/amd: move Reg out of the pseudocode (#13934)
* assembly/amd: move Reg out of the pseudocode
* remove extra
* fix pcode tests
* simpler pcode
* simpler
* simpler
* cleaner
* fix mypy
|
2025-12-31 15:34:51 -05:00 |
|
chenyu
|
52acadc160
|
consolidate IGNORE_OOB=0 tests (#13937)
add a new unit test file and add more cases
|
2025-12-31 15:24:20 -05:00 |
|
chenyu
|
c0c1c1c8c8
|
remove unused validate rule (#13936)
|
2025-12-31 15:02:49 -05:00 |
|
chenyu
|
b6d08f247d
|
assert z3_xor input type (#13933)
|
2025-12-31 13:37:57 -05:00 |
|
George Hotz
|
f14428090f
|
assembly/amd: speed up emulator (#13932)
|
2025-12-31 13:32:25 -05:00 |
|
Christopher Milan
|
13973e4dea
|
refactor image pitch (#13928)
|
2025-12-31 13:22:38 -05:00 |
|
chenyu
|
051fe6c8bc
|
less toposort iteration in oob validate (#13929)
|
2025-12-31 13:16:34 -05:00 |
|