1297 Commits

Author SHA1 Message Date
Christopher Milan
68fe5d8b36 Revert "don't save caches for PRs (#14389)" (#14390) 2026-01-27 23:22:26 -05:00
Christopher Milan
4ab228b498 don't save caches for PRs (#14389) 2026-01-27 23:21:31 -05:00
Christopher Milan
5e36482314 decompose long to ints where unsupported, try 2 (#14383) 2026-01-27 23:20:43 -05:00
George Hotz
88bc5ee212 assembly/amd: rename to better names (#14384)
* assembly/amd: rename to better names

* might help fuzzing segfault

* emu2 -> emu
2026-01-28 10:00:54 +08:00
chenyu
cd22ee9ed0 add InvalidType to ConstType [pr] (#14373)
* add InvalidType to ConstType [pr]

TYPED=1 python test/test_tiny.py passes.
added PyConst = float|int|bool for some Tensor level input types

* hcq
2026-01-27 14:09:34 -05:00
chenyu
db010a31be IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
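Note on db010a31be: "flip the meaning" suggests that a setting previously written as `IGNORE_OOB=x` is now written as `CHECK_OOB` with the opposite value. The sketch below assumes both are plain 0/1 flags. The log does not state the defaults, and `to_check_oob` is a hypothetical helper, not part of the repository.

```python
def to_check_oob(ignore_oob: int) -> int:
    """Hypothetical helper: translate an old IGNORE_OOB value to the equivalent CHECK_OOB value.

    Assumes both flags only take the values 0 and 1.
    """
    if ignore_oob not in (0, 1):
        raise ValueError("expected 0 or 1")
    # Ignoring out-of-bounds checks is the opposite of performing them.
    return 1 - ignore_oob

# IGNORE_OOB=0 (checks on) corresponds to CHECK_OOB=1, and vice versa.
print(to_check_oob(0), to_check_oob(1))
```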
Christopher Milan
c9c533fc78 libclang path is homebrew on macos (#14357)
* libclang path is homebrew macos

* typo

* ugh

* typo

* regen

* no LIBCLANG_PATH
2026-01-26 17:32:09 -05:00
qazal
2d91fe6310 use amdgpu dsl in mmapeak (#14342)
* use amdgpu dsl in mmapeak

* don't rely on llvm for vgpr counting

* llvm roundtrip assert

* rm it, add ci

* vgpr_count

* move emulated test to amd, it needs comgr

* env

* arch

* inst._fields -> inst.operands

* vgpr offset
2026-01-26 22:03:43 +09:00
qazal
b2e2ace85b viz: remove ci check, it's VIZ=-1/-2 (#14343) 2026-01-26 20:36:23 +09:00
George Hotz
be23776ba7 assembly/amd: replace pcode with ucode (#14002)
* a bunch of todos for my boy claude

* uops have types

* lil cleanups

* simpler ucode

* isNAN

* calls

* move more

* cleanup pcode_parse

* cvt functions

* fix parser bugs

* no void

* minmax

* more pcode parse

* pretty print

* transform

* comments

* move to transform

* assign/declare

* simpler norm

* single PM

* just Uops

* simpler

* more typed

* all rewrite

* less verbose

* work

* spec

* transform

* work

* simpler spec

* less spec

* bitcast

* simpler

* simp ucode

* work

* more in pcode_transform

* remove junk

* more functions

* bug

* no void assign

* load/store

* wave

* fixes

* move denorm

* move more functions

* tests

* cat is shape None

* uop syntax

* move a few more

* program_spec

* cat stuff

* assign fix clear

* unused

* nans

* fp bits

* works with simplify

* remove junk

* special

* meh

* more

* more

* update test pcode parse

* improve parser

* parse some for loops

* merge master

* dead files

* tests pass

* emu2

* better emu2

* test_plus works

* uselessly write more instructions

* use pcode

* something

* something

* bench_emu

* progress

* ds works

* work

* work

* more passing

* run compare

* bench_emu

* more pcode

* a few more

* bugfixes

* bugfix

* test fixes

* tests pass without USE_HW

* all hw tests pass

* add more hw tests

* new hw tests

* bit

* less handcode

* parse more

* consolidate pcode

* fixes

* rsrc

* lane pcode

* cleanups

* simpler

* emu bugs

* one cmp test fails

* fix decode and upd name

* fix name and test harness

* _ftz_f32

* fix denorm

* fix VOPD and use load

* fix carry bug

* no load where / just invalid

* clean

* simpler

* merge sops

* refactoring

* simplifications

* bugfixes

* new tests

* f16 sin fix

* assertion and hw tests

* cvt functions

* one more failure

* bugfixes

* bugfix + regression

* more tests

* fmac

* no manual unrolling

* ordering

* LLVM backend is a lot faster

* compile inst

* more bugs

* f16

* bugfix

* fix regression

* one clang call

* 1M inst

* scratch works

* do scratch correctly

* cleanup

* regression

* cmp

* fmamk fixes

* merge

* fix vcmpx

* unify memory

* remove unused code

* ignore oob for test

* cleanups

* fix mbs

* unify cmp

* test

* minor cleanups

* bump timeout

* fix tests

* revert the CMPLE stuff

* remove opt

* less diff

* simpler

* revert

* support multiple backends

* memset is a lot faster

* split out in bench emu

* improve timing

* timing

* cache that

* cache that

* simpler and faster

* tokenize

* binop table

* simpler

* move to parser

* tok for lambda

* refactor

* expr_parser

* delete emu2_pcode

* import cleanup

* lil

* if parse

* work

* simpler

* no v

* trig preop is faster

* durations for tests

* fix cmp bug

* sdst

* remove scartch_size hack

* null behavior

* _MXCSRContext

* bugfixes

* DEBUG >= 3

* test smem crashes my gpu

* debug

* test

* test smem

* profiler

* full inst

* bugfix

* rtag(1)

* pc is 64-bit and word

* pc is real code now

* dynamic

* more dynamic

* fix oob access

* fix crash, more dyn

* all dyn

* really all dyn

* correct null mask

* lit + format

* 21s on the tests

* 13s on the tests

* canonical name

* simm16

* more dyn

* 14s

* proper saddr dedup

* dyn

* debug 5

* better 5

* revert dynamic stuff

* that can be dyn

* negative offsets

* dyn wmma

* f16 wmma support / ops / dtype / dtype_alu

* symbolic changes not needed

* ConstFloat

* more uop.const

* __eq__

* uop tests

* fix f16

* bf16 tensor cores

* whitespace

* remove cast roundtrip

* Revert "remove cast roundtrip"

This reverts commit c5bb0381c3.

* just the fix

* remove dead paths

* llvm runs
2026-01-26 18:04:29 +08:00
nimlgen
21ab23ae18 nv: add pma for ada (#14328)
* nv: add pma for ada

* um

* fix

* shorter

* mock
2026-01-25 17:33:37 +03:00
chenyu
7e41da1ae8 fix generate_dataset.sh (#14324)
added `set -e` so wrong paths would fail the script, then fixed the path
2026-01-24 16:47:10 -05:00
George Hotz
52b989c6c8 don't place consts early + fixes from anthropic challenge (#14286)
* don't place consts early

* add anthropic challenge

* with ref

* do we still have to devectorize bools?

* tests pass

* just WHERE

* fine, revert that

* fine, revert

* only index

* z3 validator doesn't support vectorized

* Revert "z3 validator doesn't support vectorized"

This reverts commit 1b7930ecb3.

* z3 not for vec

* no spec

* VLIWRenderer

* loop unrolling

* better comments

* cleanups

* skip cast

* renderer

* cleanups

* prints

* no hack

* hacks

* bump to 11

* reg warning

* lil clean

* cleaner renderer
2026-01-23 10:48:39 +09:00
nimlgen
8cd22df2dd amd: alive wgps (#14149)
* amd: disabled wgps

* l

* wgp

* uoops

* mockgpu

* drm

* ad this

* fi

* reg
2026-01-23 00:08:45 +03:00
chenyu
e04767e39e run pre-commit in ci (#14253)
* run pre-commit in ci

prevents pre-commit regression

* IGNORE_OOB=1

* pytest

* unit test

* split
2026-01-20 12:24:33 -05:00
nimlgen
dc82856084 tbgpu: shim binary + remote apl pci dev (#14124)
* shim binary + remote pci dev

* v2

* rip out apl

* cmds

* rename

* clean

* remove

* rm gitignore

* ui

* install

* linter

* um

* cleaner

* assets

* normal install in ui

* cleaner app

* install script

* support fd mmap

* cleaner

* kill server when disconn

* rename + pcidevs

* sign

* install and reinstall

* no sip install

* will trigger update

* nv

* ugh

* this

* fix

* nv

* use nosip sign

* auto install

* remove

* mypy

* upd

* ditto

* print

* simpler

* ditto

* um

* simpler

* upd

* upd

* cleaner

* autogen

* cleaner

* move

* annotations

* server cleaner
2026-01-20 16:15:18 +03:00
qazal
b1c5a242b7 Revert "move is_dtype_supported logic to renderer (#14188)" (#14237)
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
Christopher Milan
161fee9a48 move is_dtype_supported logic to renderer (#14188)
* move is_dtype_supported logic to renderer

* fix CPU_COUNT

* mypy happy

* early import libclang too with llvm

* run with debug

* skip autogen tests if MTLCompiler or llvm is loaded

* run autogen tests separately in CI

* lint
2026-01-18 22:37:04 -05:00
Christopher Milan
1eb110cd7d fix memory corruption in NIR, reenable process replay (#14204) 2026-01-18 02:05:12 -05:00
qazal
feaa804158 skip lvp process replay in CI [pr] (#14202) 2026-01-18 13:25:04 +09:00
chenyu
dc4ae7dd08 lower ASSERT_MIN_STEP_TIME for driving_policy to 3ms (#14184)
seems quite stable at 2.7ms now
2026-01-16 15:04:53 -05:00
George Hotz
e9ce12028e assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py (#14154)
* assembly/amd: amdxml cleanups, remove broken SDWA/DPP

* remove buf junk

* simplify

* simplify

* lil cleanup

* dead fixes

* strip non pcode extraction from pdf

* merge pdf.py into amdxml.py

* only amdxml
2026-01-15 09:23:19 +09:00
chenyu
986e865830 fix TINY_BACKEND=1 cumsum (#14138)
* fix TINY_BACKEND=1 cumsum

old hack was wrong, need to apply contiguous on the input

* test time

* test_linalg_svd is slow
2026-01-14 09:54:49 -05:00
nimlgen
f9147422a3 ci: add setcap (#14143) 2026-01-14 13:15:01 +03:00
Christopher Milan
e0eea0d833 autogen: verify all files in CI (#14140)
* autogen: verify all files in CI

* dont delete libclang
2026-01-14 02:35:54 -05:00
George Hotz
2ab18ea7e3 assembly/amd: use xml instead of pdf (#14118)
* assembly/amd: use xml instead of pdf

* use amdxml to generate info about op sizes

* fix many tests with invalid instructions

* fix info generation

* chad xml fixes many bugs

* rename to operands

* simplify

* amdxml

* bug fix
2026-01-14 10:03:37 +09:00
George Hotz
8b1b15aec0 assembly/amd: SQTT support (#14099)
* assembly/amd: SQTT support

* simpler

* cmp wave

* instruction compare

* rocprof decode

* simpler

* no llvm

* no strcmp
2026-01-12 05:07:17 +09:00
chenyu
92246ea731 update tests, WEBGPU=1 pytest . passes (#14089)
* update tests, `WEBGPU=1 pytest .` passes

* minor update
2026-01-10 00:03:02 -05:00
nimlgen
e372c841ba hevc: beam in decode (#14067)
* hevc: beam in decode

* fine

* g
2026-01-08 15:47:16 +03:00
Christopher Milan
0120d69caa autogen: avcodec (and simplify workflow) (#14031)
* simplify autogen workflow and add avcodec verification

- Consolidate all regeneration into single steps (delete + import)
- Remove continue-on-error and individual diff checks
- Use git diff at end to catch all differences
- Show artifact URL in failure message
- Add avcodec.py verification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* patch avcodec

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 23:30:25 -05:00
George Hotz
20653d2996 assembly/amd: make pdf.py code shine (#14029)
* assembly/amd: make pdf.py code shine

* no merge

* pdf2 is the future

* something

* regen enums

* test

* work

* remove junk

* write

* pcode extraction

* pdf2 passes all tests

* simplify

* simpler pdf

* late filter

* remove hacks

* simplify pdf2.py

* field type

* remove defaults

* don't export srcenum

* simple pdf.py

* simpler

* cleaner

* less hack in PDF
2026-01-05 18:49:40 -08:00
Christopher Milan
b2a0b9c551 autogen: dump patch in CI (#14010)
* autogen: don't fast-fail, produce patch artifact on differences

All verification steps now use continue-on-error to run completely.
Each job generates a patch artifact containing all differences found.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* add gen from header test

* fix tests

* fail if diff

* add forward decl autogen test

* remove confusing/wrong comments

* macos unittests set LIBCLANG_PATH

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 22:38:12 -05:00
George Hotz
8328511808 assembly/amd: make the emu.py code shine (#13996)
* assembly/amd: make the code shine

* lil clean

* reg back in pcode

* cleanups

* gen fma_mix

* no writelane hacks

* fn cleanup

* dead vgpr_write

* readable

* smem

* cleanup bench_emu

* speedups

* simpler and faster

* direct inst._fn

* split fxn

* Revert "simpler and faster"

This reverts commit e85f6594b3.

* move lds to wavestate

* dispatcher

* pc in dispatch

* literal isn't wavestate

* cleanups + program

* one readlane

* exec_vop3sd in exec_vop

* cleaner exec_vopd

* fully merge VOP3P

* no special paths

* no SliceProxy

* low=0

* no bigint

* failing tests

* fma on python 3.13
2026-01-03 20:33:09 -08:00
Christopher Milan
35c2870b1f gate image_conv2d pitch hacks on IMAGE==1 (#13995)
* gate image_conv2d pitch hacks on IMAGE==1

* fix opencl image copies

* cleanup
2026-01-03 12:27:31 -05:00
Christopher Milan
9dc524536f IMAGE=1 creates "dynamic" images (#13769)
* remove image from BufferSpec

* cl tiny_gemm (64) works

* mypy

* padding

* openpilot CL

* reshape properly

* remove extra qcom checks

* pad output

* mypy

* update compile test

* move undo

* TestImageCopy valid images

* TestImageRealization valid images

* TestImageDType valid images

* cleanups

* test_renderer_failures

* ruff

* mypy

* simplify ops_qcom

* bump step time

* Revert "bump step time"

This reverts commit 75a037c7d0.

* "dynamic textures" are optional

* a start

* IMAGE=1 works, no FLOAT16

* fast but wrong

* mypy

* some fixes

* better

* works

* refactor

* oops
2026-01-02 16:22:39 -05:00
Christopher Milan
61dc70f1a8 add driving_vision IMAGE=1 benchmark (#13979) 2026-01-02 13:58:27 -05:00
George Hotz
dfb813b760 assembly/amd: add pcode ds ops (#13939)
* assembly/amd: add pcode ds ops

* refactors

* fix ds op

* update autogen

* fix flat bug

* more tests

* fix emu test

* that's a hack

* generic

* fix all tests

* two tests

* fix test failure

* better

* remove __all__
2026-01-01 16:24:13 -05:00
chenyu
ce84a23142 remove tee in benchmark (#13954) 2026-01-01 10:55:36 -05:00
chenyu
e2987001ee unify pre-commit mypy and ci mypy (#13940) 2025-12-31 17:51:51 -05:00
chenyu
a9a7b33404 IGNORE_OOB=0 in CI (#13903) 2025-12-31 12:56:59 -05:00
chenyu
ba9aa5cd6f skip some PTX IGNORE_OOB validation (#13927) 2025-12-31 12:40:21 -05:00
chenyu
4968060ad4 fix IGNORE_OOB=0 for WEBGPU (#13926) 2025-12-31 10:41:28 -05:00
chenyu
404755bafd merge ci ruff tests and update ruff version (#13922) 2025-12-31 09:53:49 -05:00
chenyu
dc27eb48ac remove PYTHONPATH="." from test.yml (#13909) 2025-12-30 17:00:16 -05:00
George Hotz
efc99d0c55 assembly/amd: more refactors (#13907)
* assembly/amd: more refactors

* more refactors

* more refactors

* simpler emu

* generate.py

* regen all

* cleanups

* more

* work

* more readme

* lil
2025-12-30 16:13:24 -05:00
George Hotz
69cdc8066d assembly/amd: add dtype tests to AMD IDE CI (#13899)
* add dtype tests to AMD IDE CI

* more tests

* add trig preop

* regen done

* split to amd autogen

* simpler
2025-12-30 11:09:51 -05:00
George Hotz
2b838dc1d8 assembly/amd: fix AMD_LLVM=1 support in emulator (#13881)
* fix AMD_LLVM=1 support in emulator

* more llvm with dtype

* work

* more fixes

* fix dtype
2025-12-30 09:09:57 -05:00
George Hotz
9d8397be11 add CDNA3+RDNA4 support (#13882)
* fix CI

* remove junk

* rename lib to dsl

* correct

* cleanups
2025-12-29 15:51:29 -05:00
George Hotz
81cf9ea0ab rename to extra.assembly.amd (#13879) 2025-12-29 14:10:55 -05:00
George Hotz
37f0fa11b6 rdna3 test cleanups (#13878)
* rdna3 test cleanups

* cleanups

* ugh DONT SKIP
2025-12-29 13:41:59 -05:00