chenyu
17db43ab46
remove some contiguous calls in frontend (#14772)
these should work without contiguous
2026-02-15 16:33:56 -05:00
nimlgen
26193cbf9a
nv: prof cpu_access for nvd only (#14769)
2026-02-15 21:42:04 +03:00
qazal
33b31d9cd6
tinykittens flash attention dtype fix, add CI (#14770)
* don't hardcode amd device
* add failing tests, ci too
* fix: fix for dtype mixin
* bump to rocm 7.1
---------
Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-02-16 01:15:11 +09:00
chenyu
352845d8cc
update cast to uint tests (#14768)
results in the valid range should work; add an intermediate cast to NIRRenderer since it's UB for [128, 256)
2026-02-15 10:55:13 -05:00
qazal
ceccc8eb86
unskip now passing multi tests [pr] (#14759)
2026-02-15 20:30:00 +09:00
George Hotz
713143a46a
more mixins pt 2 (#14765)
* more mixins pt 2
* lil cleanups
2026-02-15 17:57:04 +08:00
qazal
9da7f5e733
disable process replay for AMD emulator renderer [pr] (#14766)
* disable process replay for AMD emulator renderer [pr]
* line
* skip
2026-02-15 18:52:37 +09:00
George Hotz
9759fd6193
dtype mixin (#14763)
* dtype mixin
* dtype mixin methods
2026-02-15 16:03:48 +08:00
qazal
42b6bf0b7a
fix sdpa causal failing test on multi (#14762)
* simple failing test
* device is from xq
2026-02-15 16:54:33 +09:00
George Hotz
8091661df3
more more to mixins (#14761)
2026-02-15 15:18:37 +08:00
George Hotz
0e215c433d
remove hack from cast (#14760)
* remove hack from cast
* skip tests
* linters to 3.12, another skip
* fix rand
* m_
2026-02-15 13:56:38 +08:00
George Hotz
d176af6269
start outerworld call test, fix gate (#14758)
2026-02-15 12:35:01 +08:00
qazal
9bb6014900
keep existing profile trace in viz cli (#14757)
2026-02-15 13:16:32 +09:00
chenyu
ca68037f26
lazy basic setitem to unrealized Tensor (#14756)
undo the view and make it a mask; this also fuses the setitem with any pending compute.
one behavior change: for targets not backed by a buffer (const and arange), rangeify makes the output contiguous under the hood.
this is strictly better than raising and asking the user to call contiguous, as that would no longer be fusable.
2026-02-14 20:27:03 -05:00
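The lazy setitem change (#14756) describes undoing the slice view into a mask so the assignment becomes one elementwise select that can fuse with pending compute. A minimal pure-Python sketch of that idea, assuming a 1-D target and a non-negative, in-bounds slice; `masked_setitem` is a hypothetical helper for illustration, not tinygrad's implementation:

```python
def masked_setitem(target, start, stop, value):
    # Hypothetical helper: returns a copy of `target` with target[start:stop] = value,
    # expressed as an elementwise select instead of an in-place slice write.
    # Assumes 0 <= start <= stop <= len(target) and len(value) == stop - start.
    n = len(target)
    # The slice "view" undone into a full-size boolean mask.
    mask = [start <= i < stop for i in range(n)]
    # Scatter the slice values back to full length (entries outside the mask are unused).
    full_value = [value[i - start] if mask[i] else None for i in range(n)]
    # where(mask, value, target): a single elementwise op over the whole target.
    return [v if m else t for m, v, t in zip(mask, full_value, target)]

# usage
base = [x * 10 for x in range(6)]  # stands in for a pending (unrealized) computation
out = masked_setitem(base, 2, 4, [-1, -2])
print(out)  # [0, 10, -1, -2, 40, 50]
```

Because the result is a pure function of `base`, nothing forces `base` to be materialized before the assignment, which is what makes fusion possible.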
George Hotz
32980c74d1
hotfix: skip flaky tests, looped many times on tinymac3
2026-02-15 07:46:29 +08:00
chenyu
902dc7c09c
fix test_numpy_parity_and_backward_2d (#14755)
test setup issue, test failed locally with `RUN_SLOW=1`
2026-02-14 17:59:00 -05:00
chenyu
043f5dbfa0
fix write-after-read tracking (#14754)
AFTER-AFTER was silently dropped, which breaks write-after-read
2026-02-14 17:23:05 -05:00
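As a generic illustration of the hazard fixed in #14754 (not tinygrad's AFTER-based implementation), here is a minimal dependency tracker over a hypothetical list of `(name, reads, writes)` kernels. It records read-after-write, write-after-read and write-after-write edges per buffer:

```python
from collections import defaultdict

def schedule_deps(kernels):
    # Hypothetical kernel format: (name, reads, writes); a generic sketch, not tinygrad's scheduler.
    last_write = {}                        # buffer -> kernel that last wrote it
    reads_since_write = defaultdict(list)  # buffer -> kernels that read it after its last write
    deps = {}
    for name, reads, writes in kernels:
        d = set()
        for b in reads:
            if b in last_write:
                d.add(last_write[b])        # read-after-write (RAW)
        for b in writes:
            d.update(reads_since_write[b])  # write-after-read (WAR)
            if b in last_write:
                d.add(last_write[b])        # write-after-write (WAW)
        d.discard(name)
        deps[name] = d
        for b in reads:
            reads_since_write[b].append(name)
        for b in writes:
            last_write[b] = name
            reads_since_write[b] = []       # a new write starts a new read epoch
    return deps

# usage: k2 reads A, then k3 overwrites A
deps = schedule_deps([("k1", [], ["A"]), ("k2", ["A"], ["B"]), ("k3", [], ["A"])])
print(sorted(deps["k3"]))  # ['k1', 'k2']
```

Dropping the WAR edge (`k3` waiting on `k2`) would let `k3` overwrite `A` before `k2` reads it, which is the kind of subtle error a multi-step read/write on the same buffer exposes.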
chenyu
d79c63a0ff
test_multi_step_assign_read_write_same_buffer (#14752)
pattern in LAMB that can subtly go wrong
2026-02-14 16:39:08 -05:00
chenyu
95f4c7e90a
fix limit_bufs to not limit index (#14751)
index is not a real buffer. also made MAX_KERNEL_BUFFERS a ContextVar
2026-02-14 16:00:03 -05:00
chenyu
0ce4a55dad
clean up test_setitem_slice (#14750)
moved to test_setitem_schedule, and use contiguous zeros since the scheduler now handles empty differently
2026-02-14 14:29:16 -05:00
chenyu
8f6772fd8c
more setitem kernel mem tests (#14749)
* more setitem kernel mem tests
test that only the slice is accessed
* update
2026-02-14 11:01:03 -05:00
chenyu
446909fb7a
more setitem kernel tests (#14748)
check where realize happened
2026-02-14 09:57:46 -05:00
nimlgen
4ab51b55bd
stream pma decoder (#14746)
2026-02-14 17:40:18 +03:00
nimlgen
e1a18dadae
fix devices for copies (#14747)
* fix devices for copies
* add test
2026-02-14 17:39:41 +03:00
George Hotz
e35bd960e8
Revert "use zip_extract and tar_extract in torch load (#14734)" (#14745)
This reverts commit 9d9ef81608.
2026-02-14 13:24:01 +08:00
Christopher Milan
eaa9506a00
disallow subnormals in emulated test_dtype (#14744)
2026-02-14 00:11:57 -05:00
Bautista Garcia
9d9ef81608
use zip_extract and tar_extract in torch load (#14734)
* faster zip_extract + usage in torch load
* clean zip in torch load
* working zipextract in torchload
* tar_extract in tar path
* faster tar path
* tests passing, cleanup needed
* faster tar with 1MB buffer
* comments
* unify storage_source with all paths
* use bufferedreader in zip path
* fix ruff
* clean
* removed unnecessary string conversion
2026-02-14 12:57:28 +08:00
qazal
c88bb075f0
hotfix: correct way to get renderer arch (#14743)
2026-02-14 12:38:20 +08:00
George Hotz
f9d2eca91a
clean up amd/elf.py (#14741)
2026-02-14 12:09:05 +08:00
qazal
6dc7ea58fd
make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4 (#14742)
* make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4
* no if CI, this is just the arch
2026-02-14 12:24:37 +09:00
George Hotz
e8bd432bf6
move amd emulator out of tree (#14740)
* move amd emulator out of tree
* move the readme too
2026-02-14 10:32:00 +08:00
chenyu
dca7819f76
more setitem into unrealized tests (#14737)
* more setitem into unrealized tests
into empty, const with alu, and arange
* typo
2026-02-13 20:28:51 -05:00
chenyu
9f607cf84f
disk setitem does not need realize either (#14736)
disk base is a COPY and is_realized is always False for now, disk assign is still eager
2026-02-13 12:57:58 -05:00
chenyu
8b205a007e
lazy setitem for realized target (#14735)
2026-02-13 12:20:14 -05:00
nimlgen
3bee6638e3
external_test_hive_reset (#14729)
* external_test_hive_reset
* add fault
2026-02-13 19:08:36 +03:00
nimlgen
7d88626068
nv: fix pma_bytes to be system memory (#14733)
2026-02-13 17:55:46 +03:00
George Hotz
c0fe78f73b
BUG: metadata is lost with partial assign (#14732)
2026-02-13 21:35:21 +08:00
qazal
d0543063dd
viz: wave color is locally scoped (#14728)
2026-02-13 18:22:20 +09:00
nimlgen
ba67425680
am: reset mi300 with pm4 (#14727)
2026-02-13 11:22:32 +03:00
George Hotz
c0de4f75b1
improve mmapeak, print names with sqtt (#14726)
2026-02-13 16:07:06 +08:00
George Hotz
5289b4e882
renderer/amd: add cdna emulator (#14721)
* renderer/amd: add cdna emulator
* fixes
* no predecode
* no early
* REMU_PATH
* delete that
* round
* Fix cache invalidation check in _compile_smem
2026-02-13 16:06:58 +08:00
Christopher Milan
08a555c875
skip test_expand_buffer_before_cast on WEBGPU metal (#14724)
2026-02-13 00:01:05 -05:00
Christopher Milan
7993f3a277
autogen: use snapshot.debian.org for linux src (#14718)
2026-02-12 23:36:38 -05:00
wozeparrot
0613c0ac0c
hipkittens fa forward (#14692)
2026-02-12 20:16:43 -08:00
chenyu
50cb40be88
clean up test/null/test_indexing.py (#14720)
2026-02-12 22:36:53 -05:00
qazal
5b624b5e93
viz: better error message for out of range timestamps (#14722)
* test_timestamp_out_of_range
* rel_ts helper
* linter
2026-02-13 12:13:40 +09:00
George Hotz
4088d686b2
remove llvm requirement from amd (#14717)
* remove llvm requirement from amd
* tests pass
* test
* sink kernarg_size
* move stuff
* amd_asm_matmul to new style
* default type
* fix tests, simpler
* cu mode is faster and simpler
* darken
2026-02-13 10:50:12 +08:00
chenyu
9e33a08adb
use more pad_to and shrink_to in tensor.py (#14719)
good wins
2026-02-12 20:10:57 -05:00
George Hotz
d3adb8428e
Revert "hotfix: skip test/amd in macpytest" (#14704)
* Revert "hotfix: skip test/amd in macpytest"
This reverts commit b7dade2adf.
* no llvm subprocess
* simpler
* sys.exec
* cleanup
* process safe
* diag
* arm ftz support
* 5 sec
* this one
2026-02-13 08:00:24 +08:00
Christopher Milan
d4bc5ab609
autogen: download linux sources (#14714)
2026-02-12 18:50:50 -05:00