Commit Graph

11849 Commits

Author SHA1 Message Date
chenyu
cb69b7b2b2 comment out fold_where_closure (#14316) 2026-01-24 10:15:42 -05:00
wozeparrot
d74587f16d fa multi fix 2 (#14314) 2026-01-23 23:35:02 -08:00
chenyu
d9f0ad1d87 update return type for Tensor.tolist (#14313)
since sequence is incorrect since it can be list of list, use Any to avoid recursive type
2026-01-23 23:21:49 -05:00
qazal
807bc40931 assembly/amd: dsl and disasm cleanup (#14311)
* rdna4 inst helper

* remove dsl aliases
2026-01-24 11:36:12 +09:00
Christopher Milan
e782d44918 WEBGPU/NIR truncates ints (#14307)
* WEBGPU truncates ints

* nir has this bug too
2026-01-23 19:28:06 -05:00
nimlgen
26220a472e no core_id (#14265)
* no core_id

* kwargs

* est

* linters

* ugh

* revert this

* deps

* glb

* should work?

* nn

* line

* fx

* ym

* z

* d

* um?

* revert

* this one?

* first half

* um p2

* all?

* um

* cleaner

* um
2026-01-23 21:30:12 +03:00
chenyu
e65bc7a7c5 where closure folding (#14304) 2026-01-23 10:55:13 -05:00
chenyu
d5a3b02a9c clean up xpow (#14295)
mostly for `ret * (base < 0).where(adj, ret.const_like(1))` -> `(base < 0).where(neg_base, ret)`, since it's good for NAN neg_base but not generic
2026-01-23 10:19:47 -05:00
qazal
b913c910c5 assembly/amd: rdna4 passing test_roundtrip (#14300)
* test_roundtrip on different archs

* failing tests

* take RDNA4 xml changes from the emu branch

* work

* min diff to disasm flat

* test_add passes, rdna4 first

* correct vgpr field for the multi dword store stuff

* amdllvm

* recompile in roundtrip, get sources from emulator

* amdllvm, 2

* clean clean

* note, don't rely on that os.environ

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2026-01-23 21:33:53 +09:00
qazal
f3b0e42863 remove extra sqtt pickles in gfx1200 (#14302) 2026-01-23 20:13:48 +09:00
George Hotz
d116312b1a get cdna sqtt working (#14301)
* get cdna sqtt working

* cnd aprser

* wavestart/waveend

* names

* cdna

* test that
2026-01-23 18:46:15 +08:00
George Hotz
a5c4fa39d1 RDNA4 support in SQTT (#14299)
* table test

* cleanups

* dead file

* delta short

* tests

* delta test

* work

* l4 tests pass

* l0

* cnda

* print

* reverT

* wave failure

* wave failure

* test

* encs

* no l0 crap

* L4

* rdna4 sqtt

* notes

* linter
2026-01-23 16:16:45 +08:00
wozeparrot
963c59ebdb fix: pull fixes from gradacc branch (#14296) 2026-01-22 23:07:54 -08:00
Christopher Milan
68668b8f28 fix WEBGPU NEG (#14298)
* fix WEBGPU NEG

* add test

* parenthesize
2026-01-23 01:44:52 -05:00
qazal
3b8a7bb8c9 use existing roc.py infra for sqtt tests (#14297)
* add pc, per kernel tracing

* work

* remove those imports

* min diff
2026-01-23 14:07:11 +09:00
chenyu
5f32f7a06b fix winograd padding order (#14294) 2026-01-22 23:00:14 -05:00
George Hotz
52b989c6c8 don't place consts early + fixes from anthropic challenge (#14286)
* don't place consts early

* add anthropic challenge

* with ref

* do we still have to devectorize bools?

* tests pass

* just WHERE

* fine, revert that

* fine, revert

* only index

* z3 validator doesn't support vectorized

* Revert "z3 validator doesn't support vectorized"

This reverts commit 1b7930ecb3.

* z3 not for vec

* no spec

* VLIWRenderer

* loop unrolling

* better comments

* cleanups

* skip cast

* renderer

* cleanups

* prints

* no hack

* hacks

* bump to 11

* reg warning

* lil clean

* cleaner renderer
2026-01-23 10:48:39 +09:00
chenyu
0903782bc0 remove few dead or unneeded codes [pr] (#14275) 2026-01-22 20:05:43 -05:00
chenyu
3eb5cd7d32 stronger test_rand_is_lazy (#14293) 2026-01-22 18:58:53 -05:00
chenyu
c15b6e6709 update test_randn_finite skipped device (#14292) 2026-01-22 18:26:02 -05:00
chenyu
073c6a81b5 raise if Tensor._buffer is called during jit (#14114)
* raise if Tensor._buffer is called during jit

* cleaner
2026-01-22 17:30:18 -05:00
nimlgen
8cd22df2dd amd: alive wgps (#14149)
* amd: disabled wgps

* l

* wgp

* uoops

* mockgpu

* drm

* ad this

* fi

* reg
2026-01-23 00:08:45 +03:00
chenyu
a738c4bb22 test symbolic view broken with jit (#14290) 2026-01-22 13:44:47 -05:00
chenyu
f22fa6a5be test rand is lazy (#14289) 2026-01-22 13:07:55 -05:00
chenyu
1726b884f2 update test_jit_v_nojit_random_regen (#14288)
current behavior is that jit and non-jit consume random seed differently, still the random values are different
2026-01-22 12:21:47 -05:00
chenyu
fbed36fa15 jit graph handle input==output aliasing (#14287)
a position that wasn't an input during capture should never become an input during execution, but graph cannot tell this by jit_cache and input_buffers only
2026-01-22 11:37:41 -05:00
chenyu
8bb61c2490 stronger test_graph_input_output_aliasing (#14282)
* stronger test_graph_input_output_aliasing

* comfirmed failure
2026-01-22 09:59:34 -05:00
qazal
d7afa02085 clean up the extra/sqtt directory (#14284)
* remove legacy test_timing stuff

* remove legacy test_pmc, update active_sqtt_parse
2026-01-22 19:10:59 +09:00
qazal
dff5f361b0 support rendering assembly kernels on the NULL backend (#14283)
* assembly custom kernels in DEV=NULL, use renderer arch

* update mmapeak

* llvm
2026-01-22 15:49:07 +09:00
qazal
dfefeddeed add tflops to cdna gemm custom kernel (#14281) 2026-01-22 12:48:28 +09:00
qazal
18f408a35a custom assembly kernel with variable tests (#14280)
* custom assembly kernel with variable tests

* different threads

* sink

* zeros like / flatten
2026-01-22 11:34:17 +09:00
chenyu
4de107b764 jit graph bug when input is output (#14278)
* jit graph bug when input is output

wrong result in llm

* not just metal
2026-01-21 18:49:52 -05:00
wozeparrot
76a9242a66 fa: merge kv bwd into one kernel (#14277) 2026-01-21 15:24:41 -08:00
chenyu
6279ae4a94 remove llm generate always reset start_pos (#14276)
* remove llm generate always reset start_pos

by itself seems like a bug, also added a test to repro forward_jit.reset() issue

* issue is jit graph, so revert that test
2026-01-21 16:54:30 -05:00
nimlgen
da1fedc3c8 working ioctls (#14272) 2026-01-21 20:29:04 +03:00
chenyu
574d171fa6 fix onnx Pad constant_value=None (#14271)
also removed a dead branch in _resolve_pool_pads
2026-01-21 11:51:34 -05:00
chenyu
a18d34be1e simpler split_store outer range check [pr] (#14273)
also fixed comment
2026-01-21 11:51:14 -05:00
chenyu
e64111ad08 update all_same [pr] (#14270)
add type annotation and unit test
2026-01-21 11:26:15 -05:00
chenyu
9ad3c865ac fix bug in logsumexp keepdim=True (#14268) 2026-01-21 09:49:55 -05:00
George Hotz
41d00a046d add device to local, fix PCONTIG=2 (#14266)
* add device to local, fix PCONTIG=2

* regression test

* remove the device when we render

* viz slowness

* no long
2026-01-21 22:12:18 +09:00
wozeparrot
c1d14ea832 llama8b train fixes (#14264) 2026-01-20 20:34:47 -08:00
qazal
549dbabfcb move ALLOW_DEVICE_USAGE=0 to get_program [pr] (#14263) 2026-01-21 12:56:05 +09:00
qazal
78a28227c6 assembly/amd: cdna4 mfma support (#14206) 2026-01-21 09:12:05 +09:00
George Hotz
1baefed530 assembly/amd: add hw tests from ucode branch (#14259)
* assembly/amd: add hw tests from ucode branch

* fix is per lane
2026-01-21 08:53:54 +09:00
wozeparrot
ba90e1b52e feat: script to run llama8b training (#14239) 2026-01-20 12:44:06 -08:00
Christopher Milan
daf9414bff fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1 (#14256)
* fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1

* ruff
2026-01-20 12:30:07 -05:00
chenyu
e04767e39e run pre-commit in ci (#14253)
* run pre-commit in ci

prevents pre-commit regression

* IGNORE_OOB=1

* pytest

* unit test

* split
2026-01-20 12:24:33 -05:00
nimlgen
22af7132cd fix test_dev_jitter_matrix (#14255) 2026-01-20 20:07:51 +03:00
Robbe Derks
c7fbd177d4 USBGPU: debug script for comma chestnut (#14252)
* initial debug script

* improvements
2026-01-20 18:52:25 +03:00
C T
26f8b12e01 Whisper audio helpers (mel filters in tinygrad) (#13478)
* add whisper audio helpers for stft/mel/resample

* cleanup

* add whisper stft test

* make only stft test explicitly depend on librosa

* extract sinc_window_kernel

* dehardcode device

* use same device argument

* simplify

* type annotate

* ruff format audio_helpers.py

* ruff format test_whisper.py

* add WHISPER_NEW_STFT

* rename

* undo ruff format changes

* use new stft and mel for whisper

* remove stft test that depends on librosa

* remove whitespace

* add Tensor.log10 with test\test_ops.py::TestOps::test_log10

* use Tensor.log10

* fix lint

* future: remove unused STFT class

* future: remove resample code since it isn't used (yet)

* match openai with pad_mode="reflect"

* pad_to

* future: cut resample leftovers

* cleanup

* add mel tests

* future: cut stft

* future: cut non-mel prep_audio changes

* reduce diff

* move audio_helpers.py to examples

* reduce whitespace

* fix imports

* reduce whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-20 10:50:02 -05:00