Commit Graph

12034 Commits

Author SHA1 Message Date
nimlgen
ec2b6bbda8 hcq: update signal logic (#14531) 2026-02-04 19:32:56 +03:00
nimlgen
62786d488a am: mi3xx perf (#14529) 2026-02-04 19:32:43 +03:00
chenyu
d57d24c7d4 Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535)
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
chenyu
024f57ecf5 jit input_buffers cleanup [pr] (#14532) 2026-02-04 10:14:38 -05:00
chenyu
67f91e897b UOp.is_contiguous -> UOp.has_buffer_identity [pr] (#14530)
one more confusing buffer related method, but it's definitely not is_contiguous
2026-02-04 09:21:26 -05:00
George Hotz
fb9df1e031 pretty print binary (#14520) 2026-02-04 18:04:35 +08:00
Christopher Milan
8c3c026d86 decomp float16 to float32 (#14417)
* decomp float16 to float32

* denormals arent zero

* add test

* denormals are zero

* fix

* oops

* bitcast works

* fix LOADs

* test_dtype passing

* cleanup

* mypy

* debug print

* only emulate if EMULATED

* very ugly, but passes spec

* add test_dtype_alu tests

* Revert "very ugly, but passes spec"

This reverts commit fdc3999b65.

* bottom up decompositions

* that should have symbolic

* simplify a bit

* SPEC really works

* run with DEBUG

* debug=4

* rm debug
2026-02-04 01:37:47 -05:00
Christopher Milan
ecbce5269e PYTHONREMU properly supports S_PACK_LL_B32_B16 (#14527)
* PYTHONREMU properly supports S_PACK_LL_B32_B16

* default
2026-02-03 23:45:33 -05:00
wozeparrot
720c9597a9 feat: llama uses is_causal on sdpa during training (#14528) 2026-02-03 20:24:30 -08:00
chenyu
9c2fc118ef relax setitem target check (#14526)
old check was too conservative
2026-02-03 22:32:49 -05:00
qazal
d1bfbe9ce3 isolate slow llama gemm (#14525) 2026-02-04 12:20:10 +09:00
nimlgen
2f55005ad9 qcom: sync cpu cache when from_blob (#14518)
* um

* fx

* d

* x

* x

* x

* x

* f

* ren
2026-02-03 21:51:03 +03:00
chenyu
ee9d6a1f36 remove DEFINE_VAR in to_define_global [pr] (#14522)
not needed
2026-02-03 10:12:33 -05:00
Nino Risteski
af4c74bb41 delete extra cast (#14517) 2026-02-03 08:29:04 -05:00
chenyu
9d1e9e643e removed a duplicated remove_bufferize rule [pr] (#14519) 2026-02-03 08:28:07 -05:00
George Hotz
d59e6e7a37 move more tests to test/null, split some existing ones (#14512)
* move more tests to test/null, split some existing ones

* null work

* null work

* move more

* fixes

* move PIL

* PIL in CLIP

* don't move that
2026-02-03 20:20:20 +08:00
qazal
a98c53769a ASM_GEMM=1 runs the UOp gemm on non cdna (#14516)
* ASM_GEMM=1 runs the UOp gemm on non cdna

tests run on mac in 3 seconds

* min diff
2026-02-03 20:42:02 +09:00
qazal
5c1d21349e viz: profiler command line tool (#14515) 2026-02-03 19:51:25 +09:00
George Hotz
dd2de4f838 rename all DEFINE_GLOBAL to PARAM (#14511) 2026-02-03 15:09:38 +08:00
George Hotz
dc77b3318b move files that pass with NULL=1 to test/null (#14508)
* move files that pass with NULL=1 to test/null

* fix windows

* cpu 0

* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
888819ee09 call autodiff gradient (#14510) 2026-02-03 13:51:02 +08:00
wozeparrot
bbcd3d67a3 fa: faster (#14453) 2026-02-02 21:34:17 -08:00
Christopher Milan
e579613b90 IR3 has aux (#14509) 2026-02-02 23:46:41 -05:00
George Hotz
85c7b23160 add pytest -nauto to benchmark for mac (#14458)
* add pytest -nauto to benchmark

* 3 minute timeout

* 3 min

* setup env

* comment

* fresh db

* in the pyenv
2026-02-03 12:26:09 +08:00
Christopher Milan
a5d7eb37db IR3 works on versions earlier than 3.14 (#14507) 2026-02-02 23:10:19 -05:00
George Hotz
33c886cafa disable copyout on NULL backend by default (#14506)
* disable copyout on NULL backend

* gate it

* allow copyout on some tests
2026-02-03 11:57:47 +08:00
chenyu
3c5845e8a5 remove cut_store_range (#14505)
special scheduling for CPU
2026-02-02 21:58:36 -05:00
chenyu
4f2e7aed24 fix multiple REDUCE on same RANGE (#14504)
each RANGE maps to one END, but reduce_to_acc is local and would not know this
2026-02-02 20:42:09 -05:00
chenyu
93c41a78fa clean up NOOP [pr] (#14503)
should not be used as a COPY, started with removing from ALWAYS_RUN_OPS
2026-02-02 19:46:45 -05:00
chenyu
66d2b02f11 delete files that depends on extra.optimization.helpers (#14499) 2026-02-02 13:33:33 -05:00
George Hotz
ec0398fceb test amd gpu crashes (#14459)
* test amd gpu crashes

* cleanup

* less sketch tests
2026-02-02 18:57:47 +03:00
nimlgen
6e4238c016 amd: recovery (#14461)
* rec

* ?

* rv

* cleaner

* post merge

* not used

* um

* clnr

* x

* x

* d

* move
2026-02-02 18:57:35 +03:00
chenyu
61ca19ff24 after with empty src is self [pr] (#14496) 2026-02-02 10:19:05 -05:00
George Hotz
6e958dbfd4 assembly/amd: add RDNA4 support to emulator (#14341)
* start new rdna4

* work

* plus works

* more pass

* rdna4

* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp

* stale

* rev

* rr

* rdna4 emu tests

* cleanup

* cleanup

* simp

* works

* better factorization

* hacks

* fix mockgpu

* guard both

* cleaner

* gate

* bug fix and a few tests

* all test_tiny
2026-02-02 21:35:59 +08:00
chenyu
a908f447d5 remove disk special case in mstack_early_shrink [pr] (#14494) 2026-02-02 08:34:45 -05:00
qazal
965940dd00 sqtt: update examples after event field change (#14493)
* regen sqtt examples

* cdna

* rdna4

* packet counts for rdna3

* sqttmap work
2026-02-02 21:39:48 +09:00
George Hotz
965149a46d assembly/amd: add ds perm instructions (#14486)
* assembly/amd: add ds perm instructions

* NO SKIP

* fix preexisting RDNA3 issues

* pcode

* assert

* asserts

* unify

* simp

* good fix
2026-02-02 16:02:00 +08:00
qazal
1746d1f997 remove SPEC=0 context in custom_kernel tests, pyrender always skips it (#14489) 2026-02-02 16:32:01 +09:00
George Hotz
d4007f36e0 remove DEFINE_GLOBAL (it is PARAM now) (#14488) 2026-02-02 14:56:37 +08:00
qazal
6c487656f9 viz: kernel metadata from rodata entry (#14487) 2026-02-02 15:41:42 +09:00
Robbe Derks
d75a1b0d5a usbgpu: use BOT interface for patch.py (#13644)
* BOT usage

* cleanup

* fix lint

* fix ruff

* fix -7?
2026-02-02 11:54:46 +08:00
Christopher Milan
2931b52875 skip autogen if MTLCompiler is loaded (#14466) 2026-02-01 22:12:27 -05:00
George Hotz
9a32d6e090 add depth limit for SPEC=2 (#14485)
* make SPEC=2 work for everything

* that's a horrible fix

* add depth limit
2026-02-02 10:43:28 +08:00
George Hotz
368a692e1a make SPEC=2 work for everything (#14476)
* make SPEC=2 work for everything

* that's a horrible fix
2026-02-02 10:30:56 +08:00
chenyu
ea1f1d2b9d test_assign_to_bitcast_view (#14483)
currently disk allows assign same size dtype into a bitcasted view
2026-02-01 16:46:04 -05:00
chenyu
6deeccc192 fix RING with single dest (#14482) 2026-02-01 12:14:46 -05:00
chenyu
3ff390159b don't implicitly change dtype in assign (#14481)
broadcast shape is fine, but implicitly cast dtype is hard to find
2026-02-01 11:48:54 -05:00
imaolo
2111762a48 failed test case for RING output device (#14191)
* Add enable/disable scheduler cache ContextVar

* add allreduce ring and naive to() tests

* clearer test comparing native vs ring allreduce

* split tests, add helper

* removing trailing whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-01 11:48:43 -05:00
chenyu
02afae04f4 atol in test_call_gemm (#14480)
flaky
2026-02-01 11:24:58 -05:00
chenyu
5705398a1f assign cleanup [pr] (#14479)
share more code path between disk and non-disk. also raise RuntimeError instead of Assert for mismatches
2026-02-01 09:10:22 -05:00