George Hotz
41d00a046d
add device to local, fix PCONTIG=2 ( #14266 )
...
* add device to local, fix PCONTIG=2
* regression test
* remove the device when we render
* viz slowness
* no long
2026-01-21 22:12:18 +09:00
nimlgen
22af7132cd
fix test_dev_jitter_matrix ( #14255 )
2026-01-20 20:07:51 +03:00
C T
26f8b12e01
Whisper audio helpers (mel filters in tinygrad) ( #13478 )
...
* add whisper audio helpers for stft/mel/resample
* cleanup
* add whisper stft test
* make only stft test explicitly depend on librosa
* extract sinc_window_kernel
* dehardcode device
* use same device argument
* simplify
* type annotate
* ruff format audio_helpers.py
* ruff format test_whisper.py
* add WHISPER_NEW_STFT
* rename
* undo ruff format changes
* use new stft and mel for whisper
* remove stft test that depends on librosa
* remove whitespace
* add Tensor.log10 with test\test_ops.py::TestOps::test_log10
* use Tensor.log10
* fix lint
* future: remove unused STFT class
* future: remove resample code since it isn't used (yet)
* match openai with pad_mode="reflect"
* pad_to
* future: cut resample leftovers
* cleanup
* add mel tests
* future: cut stft
* future: cut non-mel prep_audio changes
* reduce diff
* move audio_helpers.py to examples
* reduce whitespace
* fix imports
* reduce whitespace
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2026-01-20 10:50:02 -05:00
George Hotz
5e24643889
minor import speedups ( #14244 )
...
* minor import speedups
* server stuff in server places
* pre-commit
* fix
2026-01-20 15:05:36 +09:00
qazal
b1c5a242b7
Revert "move is_dtype_supported logic to renderer ( #14188 )" ( #14237 )
...
This reverts commit 161fee9a48 .
2026-01-20 12:19:14 +09:00
chenyu
9ea63d7d52
failed test case for onnx IF with jit ( #14235 )
...
silently fails now since onnx treats IF cond as a const
2026-01-19 18:10:05 -05:00
George Hotz
31bcbed6bb
AMD_DISABLE_SDMA for testing with -n12 ( #14216 )
2026-01-19 16:10:30 +09:00
Christopher Milan
161fee9a48
move is_dtype_supported logic to renderer ( #14188 )
...
* move is_dtype_supported logic to renderer
* fix CPU_COUNT
* mypy happy
* early import libclang too with llvm
* run with debug
* skip autogen tests if MTLCompiler or llvm is loaded
* run autogen tests separately in CI
* lint
2026-01-18 22:37:04 -05:00
chenyu
67d9712ef6
jit copy aliased output if it's read later ( #14210 )
2026-01-18 18:48:59 -05:00
chenyu
97333b1954
jit footguns test case on assign with same buffer outputs ( #14209 )
...
related https://github.com/tinygrad/tinygrad/issues/13364
2026-01-18 16:01:09 -05:00
chenyu
e7c2df9113
improve consecutive Tensor indexing ( #14208 )
...
* improve consecutive Tensor indexing
instead of O(idx_counts*src_dims), it can just be O(idx_counts)
* test correctness
2026-01-18 15:14:33 -05:00
chenyu
c7b8f6496f
remove dtypes.index_like and dtypes.fields [pr] ( #14207 )
...
barely used, so just use inline and DTYPES_DICT
2026-01-18 11:49:01 -05:00
chenyu
5e6a72c33f
new Onnx Gather ( #14187 )
...
instead of assuming const indices, check if it showed as a const
2026-01-16 22:24:07 -05:00
chenyu
ab244c7f81
onnx Gather should not assume indices to be const ( #14185 )
...
* onnx Gather should not assume indices to be const
added a failed test case
* just list
2026-01-16 20:55:00 -05:00
wozeparrot
a879b54234
tk: fa jit fix ( #14170 )
2026-01-16 16:38:45 -08:00
Christopher Milan
a021b84604
autogen: fix enum ( #14171 )
2026-01-16 01:30:11 -05:00
chenyu
14e9a71a41
move test_assign to unit ( #14165 )
...
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
Christopher Milan
0cb024a5bb
remove ctypes.Structure ( #13651 )
2026-01-15 05:06:22 -05:00
qazal
164bc678a6
scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules ( #14161 )
...
* simplest failing test
* min fix
* same function reuses the cache
* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
qazal
b46da603fe
codegen/custom_kernel: do not attach KernelInfo to user program ( #14160 )
2026-01-15 14:01:48 +09:00
chenyu
add7da268f
multiple slice assign test ( #14157 )
...
GANing test cases
2026-01-14 21:08:03 -05:00
chenyu
1381daac06
many more failed assign tests ( #14153 )
...
assign is quite broken
2026-01-14 16:20:28 -05:00
chenyu
899a56446e
failed assign test cases with write before read ( #14148 )
...
slice assign write before read fails now. this is why kv cache needs a realize
2026-01-14 10:30:50 -05:00
chenyu
2a2c1eacf6
disable fast_idiv on metal ( #14137 )
...
there's a metal compiler bug which was the root cause that keccak needs a contigous hack
2026-01-13 21:40:40 -05:00
wozeparrot
a92778aa0c
tk: fa multi fix ( #14134 )
2026-01-13 17:22:15 -08:00
chenyu
fe00682502
clean up svd tests ( #14133 )
...
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
e610821c52
Tensor.cummin and Tensor.nonzero ( #14131 )
2026-01-13 15:09:56 -05:00
chenyu
176a934ddd
Tensor.diagonal support offset and dims ( #14130 )
2026-01-13 14:49:06 -05:00
qazal
79d00521f8
viz: fix cfg err when endpgm is in the middle of stream ( #14128 )
...
* kernel from beautiful_mnist
* minimal test
* correct way to do this
* rm that
2026-01-14 02:00:34 +09:00
qazal
fd10fd245a
viz: cfg tokenizer fix and unit tests ( #14121 )
...
* output Ops.BINARY
* failing test for the cfg
* dsl renamed to offset and sz
* add better asserts
* move the note
2026-01-13 15:08:55 +09:00
chenyu
05fcb57696
also return index in Tensor.cummax ( #14117 )
...
* also return index in Tensor.cummax
* fix
2026-01-12 22:42:10 -05:00
wozeparrot
7c967399a4
tk: add failing test for fa multidevice ( #14116 )
2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e
assembly/amd: clean up dsl and make type verification strict ( #14102 )
...
* assembly/amd: start newdsl
* work
* newdsl upd
* Reg is p nice
* cleaner
* work
* getting clean
* all fields
* more BitFields
* redo the pdfs with dsl2 syntax
* no lit
* cleanups
* more defaults
* fix get and remove crap
* aliases
* ugly but kind of works
* NULL, not rawimm
* clean up defaults
* only dsl
* asm fixes
* lit fixup
* more lit
* cleanups
* olddsl
* single pcode dict
* emu sort of works
* trash test
* global is global
* types property
* reg mods
* fix a few tests
* remove monkey patch
* fixes
* less hacks in tests
* less hacks in tests
* 4 test failures
* hw tests all pass
* fix compare emulator
* fix some tests
* 3 more
* fix and shorten sqtt
* handwritten
* fix validation
* test corrections
* all types validate
* fix dsl2 tests
* fix bugs in disasm
* skips on cdna
* work
* repr with reg[]
* fix bitfield tests
* merge pcodes in dsl
* remove override
* disasm uses inst.types
* simpler
2026-01-13 08:52:16 +09:00
C T
a8c821f45e
add Tensor.log10 with test\test_ops.py::TestOps::test_log10 ( #14113 )
2026-01-12 13:45:47 -05:00
chenyu
6b0a9f5ee6
don't strip sink in to_uops_list [pr] ( #14111 )
2026-01-12 11:19:03 -05:00
chenyu
cad7feec02
more onnx ops ( #14104 )
...
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
chenyu
9973a81356
add channels_last to QLinearGlobalAveragePool ( #14094 )
...
and other minor cleanups
2026-01-10 18:38:19 -05:00
chenyu
35c9701df0
update outdated tests and comments ( #14090 )
2026-01-10 01:00:48 -05:00
chenyu
92246ea731
update tests, WEBGPU=1 pytest . passes ( #14089 )
...
* update tests, `WEBGPU=1 pytest .` passes
* minor update
2026-01-10 00:03:02 -05:00
chenyu
c34c6d9468
fix wgsl packed_store can drop valid ( #14088 )
...
* fix wgsl packed_store can drop valid
* fix
2026-01-09 15:22:06 -05:00
chenyu
eacccc5ace
more disk assign tests ( #14087 )
...
covers more edge cases
2026-01-09 14:14:52 -05:00
chenyu
ed295e74dc
don't skip gguf test if ggml is not installed ( #14086 )
...
* don't skip gguf test if ggml is not installed
should just let it fail
* fix
2026-01-09 12:05:58 -05:00
chenyu
cff33c8d78
add some disk assign tests ( #14085 )
2026-01-09 11:50:59 -05:00
chenyu
74fa3c7d09
decomp pow for LVP ( #14084 )
...
test failed due to undefined behavior, so use decomp instead
2026-01-09 10:50:28 -05:00
b1tg
0fbc551622
train bert with fp8 ( #13874 )
...
* fp8 train
* clean
* lint
* test fix from #13439
* skip first/last layer
* rm __init__, restore unroll <=32 check
* tests
* clean test, remove unused
* multi-gpu test, clean quantize_to_fp8
* remove bert contiguous
* run script
* test: better check
* run script search
* add seed in bert data shuffle
* move script to mi350x folder
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2026-01-09 09:21:59 -05:00
chenyu
efcb32f6a9
unique const when requires_grad is set to True ( #14075 )
...
* unique const when requires_grad is set to True
* fix pyrender
2026-01-08 16:30:45 -05:00
chenyu
b34c637767
support bfloat16 for CL ( #14073 )
2026-01-08 14:14:29 -05:00
Garret Castro
16b652302e
skip bf16 test if not supported by device ( #14070 )
2026-01-08 13:37:24 -05:00
wozeparrot
027b935269
tk: fix grouped load store ( #14035 )
2026-01-07 22:38:02 -08:00
chenyu
3caa1e2c98
fix cast HALF with PYTHON backend ( #14058 )
2026-01-07 16:52:05 -05:00