Commit Graph

4909 Commits

qazal
bf2d9d138f viz: simplify amdgpu cfg (#14326)
* viz: replace llvm disasm with our disasm

* it starts with more code

* then it becomes less

* simpler, cdna disassembles with decimal simm16

* s_branch is upper case, add test

* simm16s and others
2026-01-25 15:21:45 +09:00
chenyu
cb69b7b2b2 comment out fold_where_closure (#14316) 2026-01-24 10:15:42 -05:00
wozeparrot
d74587f16d fa multi fix 2 (#14314) 2026-01-23 23:35:02 -08:00
Christopher Milan
e782d44918 WEBGPU/NIR truncates ints (#14307)
* WEBGPU truncates ints

* nir has this bug too
2026-01-23 19:28:06 -05:00
nimlgen
26220a472e no core_id (#14265)
* no core_id

* kwargs

* est

* linters

* ugh

* revert this

* deps

* glb

* should work?

* nn

* line

* fx

* ym

* z

* d

* um?

* revert

* this one?

* first half

* um p2

* all?

* um

* cleaner

* um
2026-01-23 21:30:12 +03:00
chenyu
e65bc7a7c5 where closure folding (#14304) 2026-01-23 10:55:13 -05:00
Christopher Milan
68668b8f28 fix WEBGPU NEG (#14298)
* fix WEBGPU NEG

* add test

* parenthesize
2026-01-23 01:44:52 -05:00
chenyu
5f32f7a06b fix winograd padding order (#14294) 2026-01-22 23:00:14 -05:00
chenyu
3eb5cd7d32 stronger test_rand_is_lazy (#14293) 2026-01-22 18:58:53 -05:00
chenyu
c15b6e6709 update test_randn_finite skipped device (#14292) 2026-01-22 18:26:02 -05:00
chenyu
073c6a81b5 raise if Tensor._buffer is called during jit (#14114)
* raise if Tensor._buffer is called during jit

* cleaner
2026-01-22 17:30:18 -05:00
nimlgen
8cd22df2dd amd: alive wgps (#14149)
* amd: disabled wgps

* l

* wgp

* uoops

* mockgpu

* drm

* ad this

* fi

* reg
2026-01-23 00:08:45 +03:00
chenyu
a738c4bb22 test symbolic view broken with jit (#14290) 2026-01-22 13:44:47 -05:00
chenyu
f22fa6a5be test rand is lazy (#14289) 2026-01-22 13:07:55 -05:00
chenyu
1726b884f2 update test_jit_v_nojit_random_regen (#14288)
current behavior is that jit and non-jit consume the random seed differently, so the random values still differ
2026-01-22 12:21:47 -05:00
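
A user-level sketch of the behavior under test; this is a hypothetical repro of the seed-consumption mismatch, not the actual test from the PR:

```python
from tinygrad import Tensor, TinyJit

def sample(x: Tensor) -> Tensor:
  # each call draws fresh random numbers
  return (x + Tensor.rand(2)).realize()

jit_sample = TinyJit(sample)
x = Tensor.zeros(2).contiguous().realize()

Tensor.manual_seed(0)
plain = [sample(x).tolist() for _ in range(4)]

Tensor.manual_seed(0)
jitted = [jit_sample(x).tolist() for _ in range(4)]

# jit and non-jit consume the seed differently, so even with the same
# manual_seed the two sequences are not expected to match
print(plain)
print(jitted)
```
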
chenyu
fbed36fa15 jit graph handle input==output aliasing (#14287)
a position that wasn't an input during capture should never become an input during execution, but the graph cannot tell this from jit_cache and input_buffers alone
2026-01-22 11:37:41 -05:00
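
The aliasing case is easiest to picture as a jitted step function fed its own output; a minimal sketch of the pattern (illustrative, not the PR's test):

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
  return (x + 1).realize()

a = Tensor.zeros(4).contiguous().realize()
for _ in range(4):
  # the previous output becomes the next input, so a buffer that was
  # purely an output during capture later shows up as an input
  a = step(a)
print(a.tolist())  # expected [4.0, 4.0, 4.0, 4.0]
```
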
chenyu
8bb61c2490 stronger test_graph_input_output_aliasing (#14282)
* stronger test_graph_input_output_aliasing

* confirmed failure
2026-01-22 09:59:34 -05:00
chenyu
4de107b764 jit graph bug when input is output (#14278)
* jit graph bug when input is output

wrong result in llm

* not just metal
2026-01-21 18:49:52 -05:00
chenyu
6279ae4a94 remove llm generate always reset start_pos (#14276)
* remove llm generate always reset start_pos

by itself this seems like a bug; also added a test to repro the forward_jit.reset() issue

* issue is jit graph, so revert that test
2026-01-21 16:54:30 -05:00
chenyu
574d171fa6 fix onnx Pad constant_value=None (#14271)
also removed a dead branch in _resolve_pool_pads
2026-01-21 11:51:34 -05:00
chenyu
e64111ad08 update all_same [pr] (#14270)
add type annotation and unit test
2026-01-21 11:26:15 -05:00
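
For reference, all_same is a one-line helper in tinygrad.helpers; a sketch of the annotated form with unit-test-style checks (the exact signature merged may differ):

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

def all_same(items: Sequence[T]) -> bool:
  # items[0] is evaluated inside the generator, so an empty
  # sequence never touches it and is vacuously True
  return all(x == items[0] for x in items)

assert all_same([1, 1, 1])
assert not all_same([1, 2, 1])
assert all_same([])
```
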
chenyu
9ad3c865ac fix bug in logsumexp keepdim=True (#14268) 2026-01-21 09:49:55 -05:00
George Hotz
41d00a046d add device to local, fix PCONTIG=2 (#14266)
* add device to local, fix PCONTIG=2

* regression test

* remove the device when we render

* viz slowness

* no long
2026-01-21 22:12:18 +09:00
nimlgen
22af7132cd fix test_dev_jitter_matrix (#14255) 2026-01-20 20:07:51 +03:00
C T
26f8b12e01 Whisper audio helpers (mel filters in tinygrad) (#13478)
* add whisper audio helpers for stft/mel/resample

* cleanup

* add whisper stft test

* make only stft test explicitly depend on librosa

* extract sinc_window_kernel

* dehardcode device

* use same device argument

* simplify

* type annotate

* ruff format audio_helpers.py

* ruff format test_whisper.py

* add WHISPER_NEW_STFT

* rename

* undo ruff format changes

* use new stft and mel for whisper

* remove stft test that depends on librosa

* remove whitespace

* add Tensor.log10 with test\test_ops.py::TestOps::test_log10

* use Tensor.log10

* fix lint

* future: remove unused STFT class

* future: remove resample code since it isn't used (yet)

* match openai with pad_mode="reflect"

* pad_to

* future: cut resample leftovers

* cleanup

* add mel tests

* future: cut stft

* future: cut non-mel prep_audio changes

* reduce diff

* move audio_helpers.py to examples

* reduce whitespace

* fix imports

* reduce whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-20 10:50:02 -05:00
George Hotz
5e24643889 minor import speedups (#14244)
* minor import speedups

* server stuff in server places

* pre-commit

* fix
2026-01-20 15:05:36 +09:00
qazal
b1c5a242b7 Revert "move is_dtype_supported logic to renderer (#14188)" (#14237)
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
chenyu
9ea63d7d52 failed test case for onnx IF with jit (#14235)
silently fails now since onnx treats the IF cond as a const
2026-01-19 18:10:05 -05:00
George Hotz
31bcbed6bb AMD_DISABLE_SDMA for testing with -n12 (#14216) 2026-01-19 16:10:30 +09:00
Christopher Milan
161fee9a48 move is_dtype_supported logic to renderer (#14188)
* move is_dtype_supported logic to renderer

* fix CPU_COUNT

* mypy happy

* early import libclang too with llvm

* run with debug

* skip autogen tests if MTLCompiler or llvm is loaded

* run autogen tests separately in CI

* lint
2026-01-18 22:37:04 -05:00
chenyu
67d9712ef6 jit copy aliased output if it's read later (#14210) 2026-01-18 18:48:59 -05:00
chenyu
97333b1954 jit footguns test case on assign with same buffer outputs (#14209)
related https://github.com/tinygrad/tinygrad/issues/13364
2026-01-18 16:01:09 -05:00
chenyu
e7c2df9113 improve consecutive Tensor indexing (#14208)
* improve consecutive Tensor indexing

instead of O(idx_counts*src_dims), it can just be O(idx_counts)

* test correctness
2026-01-18 15:14:33 -05:00
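
"Consecutive Tensor indexing" means indexing with several Tensor indices in a row; a small illustration of the pattern the commit speeds up (values are illustrative):

```python
from tinygrad import Tensor

x = Tensor.arange(24).reshape(2, 3, 4)
i = Tensor([0, 1, 1])
j = Tensor([2, 0, 1])

# two consecutive Tensor indices: element k of the result gathers x[i[k], j[k]]
y = x[i, j]
print(y.shape)    # (3, 4)
print(y.tolist()) # rows x[0,2], x[1,0], x[1,1]
```
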
chenyu
c7b8f6496f remove dtypes.index_like and dtypes.fields [pr] (#14207)
barely used, so just inline them and use DTYPES_DICT
2026-01-18 11:49:01 -05:00
chenyu
5e6a72c33f new Onnx Gather (#14187)
instead of assuming const indices, check whether they show up as a const
2026-01-16 22:24:07 -05:00
chenyu
ab244c7f81 onnx Gather should not assume indices to be const (#14185)
* onnx Gather should not assume indices to be const

added a failing test case

* just list
2026-01-16 20:55:00 -05:00
wozeparrot
a879b54234 tk: fa jit fix (#14170) 2026-01-16 16:38:45 -08:00
Christopher Milan
a021b84604 autogen: fix enum (#14171) 2026-01-16 01:30:11 -05:00
chenyu
14e9a71a41 move test_assign to unit (#14165)
scheduling these should not depend on the device
2026-01-15 17:10:13 -05:00
Christopher Milan
0cb024a5bb remove ctypes.Structure (#13651) 2026-01-15 05:06:22 -05:00
qazal
164bc678a6 scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules (#14161)
* simplest failing test

* min fix

* same function reuses the cache

* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
qazal
b46da603fe codegen/custom_kernel: do not attach KernelInfo to user program (#14160) 2026-01-15 14:01:48 +09:00
chenyu
add7da268f multiple slice assign test (#14157)
GANing test cases
2026-01-14 21:08:03 -05:00
chenyu
1381daac06 many more failed assign tests (#14153)
assign is quite broken
2026-01-14 16:20:28 -05:00
chenyu
899a56446e failed assign test cases with write before read (#14148)
slice assign with write before read fails now; this is why the kv cache needs a realize
2026-01-14 10:30:50 -05:00
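
The kv-cache pattern alluded to looks roughly like this; a sketch of the failure shape, with hypothetical shapes and names:

```python
from tinygrad import Tensor

cache = Tensor.zeros(8, 4).contiguous().realize()
for pos in range(3):
  v = Tensor.ones(1, 4)
  # write-before-read: without the .realize(), the read of `cache`
  # below may not observe this slice assign
  cache[pos:pos+1].assign(v).realize()
  window = cache[:pos+1]  # read back what was just written
```
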
chenyu
2a2c1eacf6 disable fast_idiv on metal (#14137)
there's a metal compiler bug that was the root cause of keccak needing a contiguous hack
2026-01-13 21:40:40 -05:00
wozeparrot
a92778aa0c tk: fa multi fix (#14134) 2026-01-13 17:22:15 -08:00
chenyu
fe00682502 clean up svd tests (#14133)
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
e610821c52 Tensor.cummin and Tensor.nonzero (#14131) 2026-01-13 15:09:56 -05:00
chenyu
176a934ddd Tensor.diagonal support offset and dims (#14130) 2026-01-13 14:49:06 -05:00