Commit Graph

11856 Commits

Author SHA1 Message Date
qazal
bf2d9d138f viz: simplify amdgpu cfg (#14326)
* viz: replace llvm disasm with our disasm

* it starts with more code

* then it becomes less

* simpler, cdna disassembles with decimal simm16

* s_branch is upper case, add test

* simm16s and others
2026-01-25 15:21:45 +09:00
qazal
647e527a7e viz: replace llvm disasm with our disasm (#14325) 2026-01-25 13:56:56 +09:00
nimlgen
4280a8eef2 am: update fw (#14323) 2026-01-25 01:08:47 +03:00
chenyu
7e41da1ae8 fix generate_dataset.sh (#14324)
added `set -e` so wrong paths would fail the script, then fixed the path
2026-01-24 16:47:10 -05:00
chenyu
311bfd91d6 clean up where_on_load [pr] (#14322)
no repeated split_uop and general cleanup
2026-01-24 14:43:43 -05:00
nimlgen
8b282ba6d2 memory: reserved vram (#14318) 2026-01-24 19:39:24 +03:00
chenyu
00e9ba0b82 update type for split_uop and where_on_load [pr] (#14319)
also variable names in where_on_load, before logic update
2026-01-24 11:17:41 -05:00
chenyu
cb69b7b2b2 comment out fold_where_closure (#14316) 2026-01-24 10:15:42 -05:00
wozeparrot
d74587f16d fa multi fix 2 (#14314) 2026-01-23 23:35:02 -08:00
chenyu
d9f0ad1d87 update return type for Tensor.tolist (#14313)
Sequence is incorrect because the result can be a list of lists; use Any to avoid a recursive type
2026-01-23 23:21:49 -05:00
qazal
807bc40931 assembly/amd: dsl and disasm cleanup (#14311)
* rdna4 inst helper

* remove dsl aliases
2026-01-24 11:36:12 +09:00
Christopher Milan
e782d44918 WEBGPU/NIR truncates ints (#14307)
* WEBGPU truncates ints

* nir has this bug too
2026-01-23 19:28:06 -05:00
nimlgen
26220a472e no core_id (#14265)
* no core_id

* kwargs

* est

* linters

* ugh

* revert this

* deps

* glb

* should work?

* nn

* line

* fx

* ym

* z

* d

* um?

* revert

* this one?

* first half

* um p2

* all?

* um

* cleaner

* um
2026-01-23 21:30:12 +03:00
chenyu
e65bc7a7c5 where closure folding (#14304) 2026-01-23 10:55:13 -05:00
chenyu
d5a3b02a9c clean up xpow (#14295)
mostly for `ret * (base < 0).where(adj, ret.const_like(1))` -> `(base < 0).where(neg_base, ret)`, since it's good for NAN neg_base but not generic
2026-01-23 10:19:47 -05:00
qazal
b913c910c5 assembly/amd: rdna4 passing test_roundtrip (#14300)
* test_roundtrip on different archs

* failing tests

* take RDNA4 xml changes from the emu branch

* work

* min diff to disasm flat

* test_add passes, rdna4 first

* correct vgpr field for the multi dword store stuff

* amdllvm

* recompile in roundtrip, get sources from emulator

* amdllvm, 2

* clean clean

* note, don't rely on that os.environ

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2026-01-23 21:33:53 +09:00
qazal
f3b0e42863 remove extra sqtt pickles in gfx1200 (#14302) 2026-01-23 20:13:48 +09:00
George Hotz
d116312b1a get cdna sqtt working (#14301)
* get cdna sqtt working

* cdna parser

* wavestart/waveend

* names

* cdna

* test that
2026-01-23 18:46:15 +08:00
George Hotz
a5c4fa39d1 RDNA4 support in SQTT (#14299)
* table test

* cleanups

* dead file

* delta short

* tests

* delta test

* work

* l4 tests pass

* l0

* cdna

* print

* revert

* wave failure

* wave failure

* test

* encs

* no l0 crap

* L4

* rdna4 sqtt

* notes

* linter
2026-01-23 16:16:45 +08:00
wozeparrot
963c59ebdb fix: pull fixes from gradacc branch (#14296) 2026-01-22 23:07:54 -08:00
Christopher Milan
68668b8f28 fix WEBGPU NEG (#14298)
* fix WEBGPU NEG

* add test

* parenthesize
2026-01-23 01:44:52 -05:00
qazal
3b8a7bb8c9 use existing roc.py infra for sqtt tests (#14297)
* add pc, per kernel tracing

* work

* remove those imports

* min diff
2026-01-23 14:07:11 +09:00
chenyu
5f32f7a06b fix winograd padding order (#14294) 2026-01-22 23:00:14 -05:00
George Hotz
52b989c6c8 don't place consts early + fixes from anthropic challenge (#14286)
* don't place consts early

* add anthropic challenge

* with ref

* do we still have to devectorize bools?

* tests pass

* just WHERE

* fine, revert that

* fine, revert

* only index

* z3 validator doesn't support vectorized

* Revert "z3 validator doesn't support vectorized"

This reverts commit 1b7930ecb3.

* z3 not for vec

* no spec

* VLIWRenderer

* loop unrolling

* better comments

* cleanups

* skip cast

* renderer

* cleanups

* prints

* no hack

* hacks

* bump to 11

* reg warning

* lil clean

* cleaner renderer
2026-01-23 10:48:39 +09:00
chenyu
0903782bc0 remove few dead or unneeded codes [pr] (#14275) 2026-01-22 20:05:43 -05:00
chenyu
3eb5cd7d32 stronger test_rand_is_lazy (#14293) 2026-01-22 18:58:53 -05:00
chenyu
c15b6e6709 update test_randn_finite skipped device (#14292) 2026-01-22 18:26:02 -05:00
chenyu
073c6a81b5 raise if Tensor._buffer is called during jit (#14114)
* raise if Tensor._buffer is called during jit

* cleaner
2026-01-22 17:30:18 -05:00
nimlgen
8cd22df2dd amd: alive wgps (#14149)
* amd: disabled wgps

* l

* wgp

* uoops

* mockgpu

* drm

* add this

* fi

* reg
2026-01-23 00:08:45 +03:00
chenyu
a738c4bb22 test symbolic view broken with jit (#14290) 2026-01-22 13:44:47 -05:00
chenyu
f22fa6a5be test rand is lazy (#14289) 2026-01-22 13:07:55 -05:00
chenyu
1726b884f2 update test_jit_v_nojit_random_regen (#14288)
current behavior is that jit and non-jit consume random seed differently, still the random values are different
2026-01-22 12:21:47 -05:00
chenyu
fbed36fa15 jit graph handle input==output aliasing (#14287)
a position that wasn't an input during capture should never become an input during execution, but graph cannot tell this by jit_cache and input_buffers only
2026-01-22 11:37:41 -05:00
chenyu
8bb61c2490 stronger test_graph_input_output_aliasing (#14282)
* stronger test_graph_input_output_aliasing

* confirmed failure
2026-01-22 09:59:34 -05:00
qazal
d7afa02085 clean up the extra/sqtt directory (#14284)
* remove legacy test_timing stuff

* remove legacy test_pmc, update active_sqtt_parse
2026-01-22 19:10:59 +09:00
qazal
dff5f361b0 support rendering assembly kernels on the NULL backend (#14283)
* assembly custom kernels in DEV=NULL, use renderer arch

* update mmapeak

* llvm
2026-01-22 15:49:07 +09:00
qazal
dfefeddeed add tflops to cdna gemm custom kernel (#14281) 2026-01-22 12:48:28 +09:00
qazal
18f408a35a custom assembly kernel with variable tests (#14280)
* custom assembly kernel with variable tests

* different threads

* sink

* zeros like / flatten
2026-01-22 11:34:17 +09:00
chenyu
4de107b764 jit graph bug when input is output (#14278)
* jit graph bug when input is output

wrong result in llm

* not just metal
2026-01-21 18:49:52 -05:00
wozeparrot
76a9242a66 fa: merge kv bwd into one kernel (#14277) 2026-01-21 15:24:41 -08:00
chenyu
6279ae4a94 remove llm generate always reset start_pos (#14276)
* remove llm generate always reset start_pos

by itself seems like a bug, also added a test to repro forward_jit.reset() issue

* issue is jit graph, so revert that test
2026-01-21 16:54:30 -05:00
nimlgen
da1fedc3c8 working ioctls (#14272) 2026-01-21 20:29:04 +03:00
chenyu
574d171fa6 fix onnx Pad constant_value=None (#14271)
also removed a dead branch in _resolve_pool_pads
2026-01-21 11:51:34 -05:00
chenyu
a18d34be1e simpler split_store outer range check [pr] (#14273)
also fixed comment
2026-01-21 11:51:14 -05:00
chenyu
e64111ad08 update all_same [pr] (#14270)
add type annotation and unit test
2026-01-21 11:26:15 -05:00
chenyu
9ad3c865ac fix bug in logsumexp keepdim=True (#14268) 2026-01-21 09:49:55 -05:00
George Hotz
41d00a046d add device to local, fix PCONTIG=2 (#14266)
* add device to local, fix PCONTIG=2

* regression test

* remove the device when we render

* viz slowness

* no long
2026-01-21 22:12:18 +09:00
wozeparrot
c1d14ea832 llama8b train fixes (#14264) 2026-01-20 20:34:47 -08:00
qazal
549dbabfcb move ALLOW_DEVICE_USAGE=0 to get_program [pr] (#14263) 2026-01-21 12:56:05 +09:00
qazal
78a28227c6 assembly/amd: cdna4 mfma support (#14206) 2026-01-21 09:12:05 +09:00