11648 Commits

Author SHA1 Message Date
qazal
019e71f8ca lds bank count tests from pmc counters (#13667)
* lds bank count tests from pmc counters

* these tests run on the RDNA3 card too

* rename duration to cycles, other rename comment

* add SQ_LDS_IDX_ACTIVE to gfx9 defaults
2025-12-13 17:39:32 +08:00
qazal
a6dfd8a672 viz server cleanups (#13668)
* viz server cleanups

* comment
2025-12-13 17:27:53 +08:00
Christopher Milan
f6cc3b13b9 autogen: use wrapped CDLL with custom findlib (#13666)
* wrap CDLL with custom findlib

* lint

* regen

* fix

* mypy

* hardcode libc on macos

* fix frameworks

* fix webgpu win

* remove supports

* regen metal

* regen libclang

* regen

* simpler

* regen

* regen

* find nvrtc

* fix

* regen

* fix

* typo

* regen

* split

* rsplit one

* typo
2025-12-13 01:31:30 -05:00
George Hotz
55845f7de7 schedule: cache unbinds for consistent cache keys (#13664)
* schedule: cache unbinds for consistent cache keys

strip BIND values before computing cache key so different bound values
(e.g. KV cache positions) hit the same schedule cache entry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* spec: allow single-src BIND for schedule cache key normalization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add lessons learned to CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* more claude.md

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 17:27:42 -05:00
George Hotz
27845353a0 add CLAUDE.md 2025-12-12 16:50:11 -05:00
George Hotz
8c87a0bf8d Revert "schedule: cache unbinds for consistent cache keys (#13662)"
This reverts commit af86cae10c.
2025-12-12 16:49:50 -05:00
George Hotz
443b7fea80 Revert "add notes about jit to claude.md"
This reverts commit 429f82e6a9.
2025-12-12 16:49:48 -05:00
George Hotz
429f82e6a9 add notes about jit to claude.md 2025-12-12 16:48:23 -05:00
George Hotz
af86cae10c schedule: cache unbinds for consistent cache keys (#13662)
* schedule: cache unbinds for consistent cache keys

different bound variable values (e.g. kv cache positions) now produce
the same schedule cache key by unbinding BIND(DEFINE_VAR, CONST) before
computing the cache key and rebinding after lookup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* schedule: cache unbinds for consistent cache keys

When scheduling, BIND(DEFINE_VAR, CONST) nodes are now unbound to
tagged DEFINE_VARs before computing the cache key. This ensures that
the same computation with different bound values (e.g., different
KV cache positions in LLM) gets the same cache key and reuses the
cached schedule.

The fix:
- pm_pre_sched_cache: replaces BIND with tagged DEFINE_VAR
- pm_post_sched_cache: restores tagged DEFINE_VAR back to original BIND
- pm_remove_rangeify_tags: excludes DEFINE_VAR to preserve tags through rangeify
- var_vals extracted from BINDs before cache key computation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* schedule: fix BIND handling and add CLAUDE.md

- Handle BIND to RANGE in create_schedule (not matched by CONST pattern)
- Assert all BINDs on same variable have same value
- Add CLAUDE.md codebase guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 16:40:10 -05:00
chenyu
fcaed1e1dd don't use empty in bert fake data (#13661)
somehow jit does not count empty as input
2025-12-12 15:59:50 -05:00
George Hotz
316da9f7ff llm: add created/model fields, non-streaming support, and tests (#13660)
* llm: add created/model fields, non-streaming support, and tests

- Add `created` timestamp and `model` fields to response (required by OpenAI spec)
- Add non-streaming mode support for /v1/chat/completions
- Add `send_data` helper to HTTPRequestHandler for responses with Content-Length
- Refactor viz/serve.py to use send_data
- Add integration tests using real OpenAI client

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* add openai to testing

* toml

* Remove 'openai' from dependencies

Removed 'openai' from the dependencies list.

* bump cache

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 14:50:36 -05:00
George Hotz
9604773e45 add model choosing support to llm (#13656) 2025-12-12 11:22:11 -05:00
nimlgen
e36385e570 am: support xgmi systems (#13659)
* am: support xgmi systems

* fake_am
2025-12-12 18:55:45 +03:00
nimlgen
b4796e2d32 amd: set queue prio to normal (#13658) 2025-12-12 18:25:41 +03:00
nimlgen
a1de7787bf am: xcc/inst support (#13657) 2025-12-12 17:40:42 +03:00
George Hotz
f0fa9bcd98 openai api for llm (#13648)
* openai api for llm

* responds to simple request

* schedule cache needs to unbind

* stream works

* share stream code

* 20k

* one print

* cid
2025-12-12 08:25:33 -05:00
qazal
93ad1f7732 viz: readable pmc print, share unpacker with tests (#13655)
* viz: readable pmc print, share unpacker with tests

* sections

* static analyzer

* rm that
2025-12-12 19:29:59 +08:00
Christopher Milan
760e508c3a autogen: no deep walk (#13654)
* no deep walk

* reset init

* delete walk

* remove print

* regen

* linkage spec

* cleanup
2025-12-12 01:04:35 -05:00
wozeparrot
8f60b8dd1e fix: cast on transpose (#13653) 2025-12-11 21:03:49 -08:00
Christopher Milan
950d8de00e automatically inline anonymous (#13652) 2025-12-12 00:02:44 -05:00
chenyu
01e9ad0d52 clean up bert next_data (#13650)
train iter was designed to never stop for both real and fake data
2025-12-11 22:56:28 -05:00
Jakob Sachs
ab2220b834 Handle missing bfloat16 natives on CPU architectures (#13553)
* CPU: fix compiler-rt libcall by adding intermediate casts for bfloat16

* fix lint

* remove old manual bypass of bf16 for CPU tests, and add diversion converstion from bf16 to/from fp16

---------

Co-authored-by: Jakob Sachs <jakobs99@purelymail.com>
2025-12-11 15:38:43 -05:00
nimlgen
cbae33003d ci: add usb4 (#13643)
* ci: add usb4

* debug=3

* undef

* revert
2025-12-11 19:41:41 +03:00
chenyu
03600aef1e failed test case when init jit with empty inputs (#13641)
not related to bert grad acc, but still seems to be a bug
2025-12-10 22:03:06 -05:00
nimlgen
51f3c9f615 am: use va_base as base (#13640) 2025-12-10 21:09:35 +03:00
chenyu
5034c6fb37 reenable FREE_INTERMEDIATE for bert (#13639)
* reenable FREE_INTERMEDIATE for bert

* comment
2025-12-10 12:08:09 -05:00
qazal
be6d538351 viz: add kernel walltime to pmc scoreboard (#13638)
* viz: add kernel walltime to pmc scoreboard

* fix typing

* tiny TracingKey refactor

* key on kernel name
2025-12-10 20:16:42 +08:00
qazal
1666c4aaab viz: fix counter names ordering (#13637) 2025-12-10 17:05:27 +08:00
qazal
c801bb7054 viz: show all kernel pmcs (#13635) 2025-12-10 07:16:02 +08:00
wozeparrot
4854a0c02c fix: getattr returns AttributeError not ImportError when missing (#13633) 2025-12-09 14:26:54 -08:00
chenyu
016a59cafa remove contiguous and use where in EmbeddingBert (#13632) 2025-12-09 15:49:21 -05:00
nimlgen
ddecba300f amd: use getattr for autogen (#13630)
* amd: use getattr for autogen

* fi
2025-12-09 20:36:26 +03:00
Nino Risteski
76d465dbc3 optim empty shard #13513 (#13598)
* optim empty shard

* remove tuple

* simplify

* lint

* lint2

* test

* remove original buffer unique id

* new rule

* reset shard

* update

* reset shard
2025-12-09 12:28:36 -05:00
ayanhan
47a170be2e test: enable cummax scalar IndexError test (#13625) 2025-12-09 12:25:56 -05:00
Christopher Milan
9eae9dc3be regen smu_v13 with stdint (#13631)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-12-09 12:20:01 -05:00
nimlgen
7cd8852f60 autogen: do no return tuples (#13629) 2025-12-09 20:08:13 +03:00
nimlgen
9e484b5b1c hcq: check size is None, do not read the whole size for 0s (#13628) 2025-12-09 19:37:44 +03:00
nimlgen
1329033b8c am: fix hot-queue restarts, only dequeue (#13627) 2025-12-09 19:37:21 +03:00
nimlgen
b07839493d proclogs with xccs (#13626) 2025-12-09 16:46:08 +03:00
qazal
2c333818f4 simplify UOp stringifier [pr] (#13618)
* simplify UOp stringifier [pr]

* fix tuple
2025-12-09 05:06:16 +08:00
chenyu
2471b49e45 minor bert / llama change from grad acc branch (#13622)
* minor bert / llama change from grad acc branch

* revert those
2025-12-08 16:04:14 -05:00
Christopher Milan
cb3d756547 NAK compile-only test (#13621) 2025-12-08 15:53:46 -05:00
Christopher Milan
a4c3d48aa9 compile-only test for IR3 actually works (#13619) 2025-12-08 15:07:49 -05:00
Christopher Milan
a17077d1d9 skip test_double_assign in CI LVP (#13620) 2025-12-08 14:54:02 -05:00
Christopher Milan
1c16b6e082 Mesa: freedreno (#12746)
* ir3 init

* got a program

* 1 + 1 works

* use isa_disasm instead of shader_disasm

* wip

* matmul works

* works on py3.14

* fix const loading

* skip QCOM failing tests

* cleanup

* args actually work

* add compile-only tests

* fix typo and install tinymesa

* IR3 NULL backend

* (float32) images work

* autogen fix

* fix compile only test

* typo

* mypy happy

* compile-only uses py3.14

* bump mesa

* unify qcom disassembler

* float16 works

* disasm shows in viz

* save a line

* add real del

* variable workgroup sizes

* simplify diff

* bump line count

* properly set wgsz

* regen mesa

* no preamble

* bump lines
2025-12-08 14:02:08 -05:00
Douglas Nyberg
947c6eefc3 add Swish op (#13541)
* add Swish ONNX operator

* add Swish regression test

* remove trailing whitespace

* upgrade ONNX to 1.20, add excludes for unimplemented ops

* upgrade ONNX to 1.19, add Swish op

* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op

* exclude attention_3d and attention_4d_gqa tests

* exclude attention fp16 tests

* exclude all attention tests

* retrigger CI

* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
nimlgen
dd8a1a10d4 amd: tiny cleanups (#13616) 2025-12-08 13:15:56 +03:00
qazal
2b07336c82 viz server cleanups (#13615)
* depths start at 0

* rename the api path
2025-12-08 17:44:43 +08:00
wozeparrot
89c4206e22 fix: typing (#13614) 2025-12-07 20:10:30 -08:00
qazal
572dfd5506 add static amd program info to viz (#13594)
* llvm-readelf

* amd_readelf + soft_err

* cleanup

* multiple metadata

* max wgp size, may be less
2025-12-08 04:08:14 +08:00