Ananta Ranganathan
8cf902ce0d
check if assert equal works in ci
2026-02-17 20:51:10 -08:00
Ananta Ranganathan
b226626c85
double e2m1 values for mxfp4
2026-02-17 20:47:17 -08:00
George Hotz
d5636fba90
assign after copy shouldn't contig ( #14847 )
...
* assign after copy shouldn't contig
* fix assign copy
2026-02-18 12:23:49 +08:00
George Hotz
ab55e8c6b9
assign should be used as output buffer ( #14845 )
...
* assign should be used as buffer
* late removed
* the fix
* better fix
* backward slice
2026-02-18 09:37:46 +08:00
chenyu
e3c120c8e1
exclude 100 in test_assign_add ( #14846 )
...
this can crash, not sure why. skip 100 to see if it's better
2026-02-17 19:12:47 -05:00
Christopher Milan
7641ed61af
remove doublecast in IMAGE=1 ( #14839 )
2026-02-17 18:22:14 -05:00
Christopher Milan
5b11519d5e
LLVM actually supports ops ( #14843 )
...
LLVM should support e.g. SHL/SHR, but these were never actually rendered
2026-02-17 18:21:33 -05:00
wozeparrot
95e97ec341
separate llama optim ( #14810 )
2026-02-17 13:02:35 -08:00
chenyu
72cf603805
removed if self.buffer.is_allocated() in realized ( #14836 )
...
automatically fixes is_realized issue for empty
2026-02-17 15:35:56 -05:00
chenyu
aec8a6c85b
Revert "one run_schedule for assign realize ( #14835 )" ( #14837 )
...
This reverts commit df7c37f611.
2026-02-17 14:34:26 -05:00
chenyu
df7c37f611
one run_schedule for assign realize ( #14835 )
...
concat schedules. separate out the execution part
2026-02-17 14:01:55 -05:00
chenyu
61867c2f35
TestRealizeIsRealized ( #14834 )
...
test that after calling .realize(), uop.is_realized is True. currently not working for empty (and thus disk tensor) or const
2026-02-17 13:30:35 -05:00
chenyu
f147791105
update test to reset and test kernel_count directly ( #14832 )
2026-02-17 11:48:46 -05:00
chenyu
9d4937ab5e
remove assign test @unittest.skip("this test is crashing!") ( #14831 )
2026-02-17 10:30:58 -05:00
nimlgen
dda5ccf63b
hcq: fix usb<->cpu mappings ( #14827 )
...
* hcq: fix usb<->cpu mappings
* non cpu
* um
2026-02-17 18:04:18 +03:00
nimlgen
801677cf12
am: GCVM_L2_PROTECTION_FAULT_STATUS prints device ( #14830 )
2026-02-17 18:03:52 +03:00
chenyu
f07898c68a
move assign chain fix to rangeify ( #14829 )
2026-02-17 09:40:34 -05:00
nimlgen
a2586e4c70
nv: move reset earlier ( #14824 )
2026-02-17 17:25:49 +03:00
chenyu
f2f039cc0f
fix chained full-buffer assign ( #14828 )
...
this shows the issue that pm_remove_bufferize drops tags, will fix in bufferize next. this also fixed rand being different in jit vs no-jit
2026-02-17 09:11:04 -05:00
chenyu
58fa82eef5
stronger test_assign_add ( #14826 )
...
also test self add 10 and 100 times
2026-02-17 08:36:09 -05:00
George Hotz
ff60dab622
Revert "big sink is on base ( #14819 )" ( #14825 )
...
This reverts commit 5fc3d8109f.
2026-02-17 19:18:06 +08:00
qazal
f8e485ee9e
nvcc/nvdisasm macos shim ( #14822 )
...
* move to backend
* and arch
* setup_nvcc_osx
* blackwell
* min test
* now getting dumb assert is_ptx
* support cubin.
* work
* remove that
* simpler
2026-02-17 20:07:05 +09:00
qazal
d24781f45f
viz: do not, ever, open devices ( #14820 )
...
* viz: do not, ever, open devices
* unwrap
* on the kernel info
2026-02-17 19:42:44 +09:00
George Hotz
5fc3d8109f
big sink is on base ( #14819 )
...
* big sink is on base
* contiguous fixes tests
2026-02-17 18:32:56 +08:00
qazal
99a988b9d2
viz: remove ProgramSpec from trace ( #14818 )
2026-02-17 19:04:58 +09:00
qazal
f590564bf7
gemm multiple is only for cdna4 asm ( #14814 )
...
* gemm multiple is only for cdna4 asm
* move to backend
* and arch
* path
2026-02-17 14:00:02 +09:00
George Hotz
5bd2862d1a
late compile the cdna gemm ( #14783 )
...
* late compile the cdna gemm
* remove old things
* finalize inplace
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2026-02-17 13:04:22 +09:00
Christopher Milan
275319c789
IMAGE=1 2d indexing ( #14809 )
...
* IMAGE=1 2d indexing
* cleanup
* oops
* go back to 'idx'
* fix vals
* fix
* ugh
2026-02-16 22:51:18 -05:00
George Hotz
f081f154ae
parameterize the CDNA asm gemm ( #14813 )
...
* parameterize the CDNA asm gemm
* fix llama test
* fix
* add more gemm tests
* confirm all match
* test these asm gemms
2026-02-17 11:35:18 +08:00
George Hotz
bc3487d607
VIZ display cleanups ( #14811 )
...
* exclude reshape/expand broadcasts from viz
* limit src lines
2026-02-17 10:03:08 +08:00
chenyu
5bca5be2d2
test slice assign twice retains the buffer ( #14807 )
2026-02-16 20:01:47 -05:00
ridoy majumdar
ba39a19114
viz: remove duplicate Ops.PARAM color ( #14808 )
2026-02-17 09:31:47 +09:00
chenyu
9b44fbe0b8
update test_assign_add_twice ( #14806 )
...
failed test case to show that `+=1` twice returns a different buffer
2026-02-16 17:52:11 -05:00
chenyu
f290af6c7d
test_schedule always test with SPLIT_REDUCEOP=0 ( #14802 )
...
* test_schedule always test with SPLIT_REDUCEOP=0
except tests that test SPLIT_REDUCEOP=1
* like that
2026-02-16 15:30:26 -05:00
kevvz
e41da0c396
use relative address for MOCKGPU rdna4 tracing ( #14801 )
...
* rdna3/4 trace separation
* remove comments
2026-02-16 22:59:46 +03:00
nimlgen
131bbbbfd8
am: smu_v13_0_12 ( #14800 )
2026-02-16 22:58:10 +03:00
nimlgen
7ddc888ad5
am: 48bit for gfx950 ( #14799 )
2026-02-16 22:48:07 +03:00
nimlgen
9f8afb518c
viz: sdma gb/s in graph ( #14798 )
...
* viz: sdma gb/s in graph
* f
2026-02-16 16:45:06 +03:00
qazal
db3db476ff
viz: add GB/s to SDMA ( #14795 )
...
* work
* better
* fix that
* no decimal
2026-02-16 20:09:20 +09:00
qazal
2b36708c6d
viz: split all long labels with ... ( #14794 )
2026-02-16 19:18:42 +09:00
qazal
d213fe95a0
viz: integer ticks on the x axis, fix small cycle numbers ( #14792 )
2026-02-16 18:07:40 +09:00
George Hotz
47d39a6b8b
add sqtt support to the emulator ( #14791 )
...
* add sqtt support to the emulator
* more sqtt
* cleanup
* cleanups
* simpler tests
* some decent tests
* test branch
2026-02-16 16:48:26 +08:00
wozeparrot
45aebe1572
hipkittens fa backward ( #14723 )
2026-02-16 00:38:44 -08:00
Nicolas Pinto
20b658b786
fuse MULACC after MUL->SHL ( #14788 )
...
* decompositions: fuse (x << n) + c to MULACC
MUL→SHL converts x*(2^n) to x<<n before MULACC can fuse (x*c)+y.
Add pattern to also fuse (x<<n)+c → MULACC(x, 2^n, c) for backends
that support both MULACC and SHL.
* test: add test_mulacc_shl for SHL->MULACC fusion
* test: relax test_mulacc_unrolled to >= 4
SHL->MULACC fusion now also catches power-of-2 address calculations,
increasing MULACC count from 4 to 6 on PTX. the test's intent is that
each unrolled multiply is individually fused (not grouped), so >= 4
is the correct assertion.
---------
Co-authored-by: Prithvish <deformercoding@gmail.com>
Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com>
Co-authored-by: Nicolas Pinto <npinto@mbp23.local>
2026-02-16 16:26:44 +08:00
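The fusion described in 20b658b786 rests on the identity (x << n) + c == x * 2^n + c. The sketch below checks that identity on plain Python integers. It is an illustration only: `mulacc`, `shl_add`, and `fuse_shl_add` are hypothetical helper names, and none of this is the pattern-matcher code from the commit.

```python
# Illustrative only: plain Python ints, not tinygrad UOps or its rewrite rules.

def mulacc(a: int, b: int, c: int) -> int:
    """Hypothetical fused multiply-accumulate: a * b + c."""
    return a * b + c

def shl_add(x: int, n: int, c: int) -> int:
    """The form left behind once MUL has been turned into SHL: (x << n) + c."""
    return (x << n) + c

def fuse_shl_add(x: int, n: int, c: int) -> int:
    """Rewrite (x << n) + c as MULACC(x, 2**n, c)."""
    return mulacc(x, 1 << n, c)

# Both forms agree for every shift amount and operand tried here.
for x in range(-8, 9):
    for n in range(6):
        for c in (-3, 0, 7):
            assert shl_add(x, n, c) == fuse_shl_add(x, n, c)
```

Per the commit message, the rewrite only applies to backends that support both MULACC and SHL, which is why power-of-2 address calculations now also count toward the MULACC total in test_mulacc_unrolled.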
qazal
ac62d28ddc
viz: amdgpu arch cleanup ( #14790 )
...
* viz: amdgpu arch cleanup
* don't do that
* simpler sqttmap
* work
* self.arch
2026-02-16 16:48:12 +09:00
George Hotz
401095e3e7
emulator barrier tests ( #14789 )
2026-02-16 15:31:01 +08:00
qazal
c7a4dbf918
viz: get program binary from the UOp ( #14787 )
...
* viz: get program binary from the UOp
* remove that
* less
* rename View Program to View Source
* two words
* fix
2026-02-16 15:46:58 +09:00
Bautista Garcia
0f1ca8eb43
torch_load: fix shared storage slicing ( #14771 )
...
* faster zip_extract + usage in torch load
* clean zip in torch load
* working zipextract in torchload
* tar_extract in tar path
* faster tar path
* tests passing, cleanup needed
* faster tar with 1MB buffer
* comments
* unify storage_source with all paths
* use bufferedreader in zip path
* fix ruff
* clean
* removed unnecessary string conversion
* fix for tensors that share storage
* less hacky
* shared storage test
* test comment
* linter
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com >
2026-02-16 14:30:13 +08:00
George Hotz
dff9cf35c2
amd asm emulator fixes + run it in CI ( #14786 )
...
* amd asm fix, try 2
* fix tests
2026-02-16 13:24:21 +08:00
qazal
55a4dfa2e0
cdna4 asm_gemm tests in CI on the null backend ( #14785 )
...
* cdna4 asm_gemm tests in CI on the null backend
* no .numpy() in null
* better
* gemm/asm: device comes from renderer
2026-02-16 14:06:23 +09:00