George Hotz
6eaea3c9d9
RANGEIFY=2 is partial contig
2025-08-21 16:33:33 -07:00
Jordan Chalupka
8de6db15ac
exclude .git from ruff ( #11773 )
2025-08-21 15:37:50 -07:00
George Hotz
5954a0975f
fix some assigns on rangeify ( #11774 )
...
* fix some assigns
* llvm test
* more tests
* upd test
2025-08-21 15:15:54 -07:00
qazal
2e0eb88549
viz: add metadata to UOp tracing ( #11772 )
...
* viz: add metadata to UOp tracing
* place after tag
* optional field
* err, refcount of root must be 0
2025-08-22 00:18:45 +03:00
George Hotz
d6f9606e93
small cleanups to rangeify ( #11769 )
2025-08-21 11:15:09 -07:00
uuuvn
bd4a9473b0
Multihost exception handling ( #11729 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-21 13:51:49 -04:00
George Hotz
a2c7b807e0
don't bufferize 0s ( #11766 )
2025-08-21 10:10:56 -07:00
nimlgen
9eff7cd1d8
am: support 64bit discovery ( #11768 )
2025-08-21 18:28:13 +03:00
b1tg
56cd47a159
fix amd llvm bf16 tc ( #11713 )
...
* fix amd llvm bf16 tc
* is_cdna
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-21 09:33:28 -04:00
George Hotz
a044648111
rangeify load cleanups + multi support ( #11765 )
...
* use the old buf_uop + cleanups
* simpler handling of load
* everything needed for multi too
2025-08-20 20:55:49 -07:00
George Hotz
9f94c25a25
fix symbolic usage. use shrink, not reshape ( #11762 )
...
* fix test_var
* revert those things
* fix the ones in test tiny
* use better syntax
* it's the same, but that's clearer
* fix pad
2025-08-20 18:35:42 -07:00
chenyu
5276fbc9c5
fix gather with inf values ( #11760 )
...
(mask * x) is wrong because 0*inf is nan. i feel we have a lot of those still...
2025-08-20 20:35:40 -04:00
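The note in #11760 about `(mask * x)` can be reproduced without any tensor library. What follows is a minimal plain-Python sketch, not the tinygrad implementation; the helper names `select_by_multiply` and `select_by_where` are hypothetical and exist only for illustration.

```python
import math

def select_by_multiply(mask, x):
    # Masking by multiplication: under IEEE 754, 0 * inf evaluates to nan,
    # so a masked-out inf contaminates the result.
    return [m * v for m, v in zip(mask, x)]

def select_by_where(mask, x):
    # Masking by selection: the masked-out value is never used in arithmetic.
    return [v if m else 0.0 for m, v in zip(mask, x)]

mask = [1, 0, 0]
x = [2.0, math.inf, 3.0]

mul = select_by_multiply(mask, x)
sel = select_by_where(mask, x)

print(mul)  # the second element is nan
print(sel)  # the second element is 0.0
```

Summing `mul` yields nan, while summing `sel` yields the single selected value, which is why a select is the safer way to express a gather.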
wozeparrot
b979162c5d
llama3 eval train ( #11706 )
2025-08-20 19:56:35 -04:00
chenyu
dbd3b67657
clamp GRAD_CLIP_NORM in llama ( #11761 )
2025-08-20 19:55:50 -04:00
George Hotz
9635592141
** rangeify, try 3 ( #11683 )
...
* ** rangeify, try 3
* bring that over
* bufferize, don't use contig tag
* work
* ish
* fix rangeify
* flash attention is back
* fix rangeify tests
* stuff passes
* fix test_log_softmax
* more stuff passes
* progress children
* new endrange solution
* progress
* progress counter
* basic assign
* contigs only
* symbolic in schedule
* unbind_kernel
* late children
* ops fixed
* beautiful mnist is close
* that seems to work
* mnist works
* improve names
* fix bmnist
* no pcontig
* testing backward
* work
* clone movement ops
* new_range helper
* MBLOCK/MERGE
* ops tests pass
* revert mblock stuff
* cleanups...but it breaks ops
* remove reindex
* hack for relu
* disable the hacks
* more hacks
* upd
* mostly works with cleanups disabled
* ndr
* ops tests pass
* terrible hacks for indexing to work
* context mismatch
* pcontig
* split pcontig v contig
* z3 trunc
* null
* no fuse in rangeify
* ops test passes
* lnorm
* fix assign
* nd rangeify
* both should work
* tests for rangeify
* cleanups
* stores pass the pointer through
* disable pcontig for now
* PARTIAL_CONTIG is a flag
2025-08-20 14:22:44 -07:00
chenyu
d7553721d1
clean up test_dtype_alu ( #11757 )
...
remove the check that looks into schedule, only test if output matches
2025-08-20 14:36:18 -04:00
chenyu
5f08a3e928
hotfix: cast half to float in Tensor.tolist ( #11755 )
...
workaround for python < 3.12
2025-08-20 12:18:35 -04:00
qazal
de4cb722a4
viz: add metadata and var_vals tracing ( #11753 )
...
* viz: add metadata and var_vals tracing
* add test_trace_metadata
* set TRACEMETA=1
2025-08-20 18:39:51 +03:00
nimlgen
6589c9e643
hcq: better errors for ifaces ( #11751 )
...
* hcq: better errors for ifaces
* fix linter
* typo
* space
2025-08-20 17:50:51 +03:00
chenyu
be7b0b6970
TRANSCENDENTAL_SUPPORTED_DTYPES->TRANSCENDENTAL_DTYPES ( #11752 )
2025-08-20 10:29:36 -04:00
ttomsa
220a2a88d7
a*(1/b) -> a/b on LLVM, CPU ( #11743 )
...
* add fdiv rewrite
* :)
* use float_lop
* use reciprocal()
* revert
* move to decompositions
2025-08-20 09:35:10 -04:00
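As a side illustration of the rewrite in #11743: in IEEE 754 floating point, `a*(1/b)` rounds twice while `a/b` rounds once, so the two forms are not always bit-identical. The commit does not state its motivation, so this is only a plain-Python sketch of that numerical difference; `first_mismatch` is a hypothetical helper, not tinygrad code.

```python
def first_mismatch(limit):
    """Return the first integer n in [1, limit] where n*(1/n) != n/n, or None."""
    for n in range(1, limit + 1):
        a = b = float(n)
        via_reciprocal = a * (1.0 / b)  # two roundings: 1/b, then the product
        via_division = a / b            # one rounding
        if via_reciprocal != via_division:
            return n
    return None

n = first_mismatch(1000)
print(n)
```

A direct division always returns exactly `1.0` for `n / n`, whereas the reciprocal form can land one unit in the last place away from it.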
George Hotz
12ab3f8b06
correct row_count in process replay ( #11748 )
2025-08-19 22:21:07 -07:00
George Hotz
8af8808c61
cleanup tests, bump caches ( #11746 )
2025-08-19 21:21:07 -07:00
George Hotz
00391db628
no ast for mem estimate ( #11744 )
...
* no ast for mem estimate
* skip for webgpu
2025-08-19 20:18:45 -07:00
chenyu
dd413e1208
remove a Ops.REDUCE check in reduce_collapse [pr] ( #11734 )
2025-08-19 19:21:28 -04:00
ttomsa
70c3f1fb29
x.where(False, True) -> !x ( #11738 )
...
* add pat
* add test
2025-08-19 19:08:16 -04:00
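The rewrite in #11738 rests on a simple identity for boolean conditions: selecting `False` when `x` is true and `True` otherwise is the same as negating `x`. The truth-table check below is a plain-Python sketch; the `where` helper is a hypothetical stand-in for an elementwise select and is not the tinygrad pattern matcher.

```python
def where(cond, if_true, if_false):
    # Scalar stand-in for an elementwise select.
    return if_true if cond else if_false

# Exhaustive check over the two possible boolean inputs.
table = [(x, where(x, False, True), not x) for x in (True, False)]
for x, selected, negated in table:
    print(x, selected, negated)

identity_holds = all(selected == negated for _, selected, negated in table)
```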
George Hotz
1d307f568c
move device tests to test/device + test cleanups ( #11735 )
...
* move device tests to test/device
* test speedups
* test device
* linalg to unit
* upd
* so pytest just works
* more divide and skip
* speed
* test devectorize
* add pillow
2025-08-19 16:02:20 -07:00
wozeparrot
bcc7623025
feat: bump version to 0.11.0 ( #11736 )
v0.11.0
2025-08-19 17:08:56 -04:00
qazal
8c987b3293
DISABLE_FAST_IDIV is a context var [pr] ( #11733 )
2025-08-19 23:30:50 +03:00
George Hotz
bf467c623d
changes from rangeify + better NullRenderer ( #11732 )
...
* changes from rangeify + better NullRenderer
* fix test
2025-08-19 12:51:54 -07:00
chenyu
02353588cb
small getitem cleanup ( #11730 )
2025-08-19 12:25:58 -04:00
chenyu
712a5c651a
minor Tensor.triu cleanup ( #11728 )
...
less confusing dtype
2025-08-19 08:07:38 -04:00
nimlgen
9c9e337c78
amd: parse soc enums ( #11727 )
...
* amd: parse soc enums
* remove from mock
* fix
* minimal amd_gpu
2025-08-19 15:06:09 +03:00
qazal
57ad69160a
viz: inline memory shape spec ( #11725 )
2025-08-19 08:03:29 +03:00
chenyu
c5b52e9321
onnx RotaryEmbedding cleanup ( #11724 )
2025-08-18 23:34:42 -04:00
George Hotz
31619774a9
Revert "Revert "fix the misused cast in amd llvm tc ( #11711 )" ( #11715 )" ( #11723 )
...
This reverts commit ca28db5a97.
2025-08-18 19:44:35 -07:00
George Hotz
2ea54d7337
improve syntax of UPats using f [pr] ( #11717 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-18 20:49:45 -04:00
chenyu
b67345caa3
use truncate in onnx read_int64 [pr] ( #11720 )
2025-08-18 20:49:35 -04:00
qazal
50e789e290
hotfix: add device to decompositions ctx ( #11721 )
...
fast_idiv requires it for checking if a dtype is supported. Without
this, codegen creates non-reproducible output when os.environ is
incomplete, since `is_dtype_supported` will open devices based on the
env var unless the device is specified by the caller.
2025-08-19 03:31:16 +03:00
George Hotz
4b3fcb4064
Revert "REDUCE_AXIS keepdim=False ( #11311 )" ( #11718 )
...
This reverts commit b518a7378a.
2025-08-18 13:28:53 -07:00
George Hotz
67d0ba5bd8
new ops from rangeify ( #11716 )
2025-08-18 13:13:11 -07:00
George Hotz
4afa0b86bb
hotfix: ls -lh on wheel size
2025-08-18 11:52:59 -07:00
George Hotz
ca28db5a97
Revert "fix the misused cast in amd llvm tc ( #11711 )" ( #11715 )
...
This reverts commit 799a637b03.
2025-08-18 11:51:28 -07:00
chenyu
c10e4c4e20
print wheel build size ( #11714 )
2025-08-18 14:29:47 -04:00
b1tg
b518a7378a
REDUCE_AXIS keepdim=False ( #11311 )
...
* progress
* fix tests
* fix tests
* remove hack for test_symfold
* fix test_conv.py on llvm
* hack test_cache_speed
* lint
* remove hack for helper_linearizer_opt
* tests
* fix DSP
* clean up
* remove hack for kernelize.py
* hack for test/test_multitensor.py TestMultiTensor.test_matmul_shard_none
* clean
* uop.r need reshape?
* lower_store cause fail
* fix lower?
* avoid contiguous hack
* 2134
* conv2d count
* remove unused
* hack lower
* reduced and clean up
* fix TestMultiTensor.test_matmul_shard_none
* src sync + fix TestMultiTensor.test_matmul_shard_none
* remove excluded in mop
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-08-18 10:09:17 -07:00
b1tg
61884f2057
add cstyle renderer to the NULL device ( #11709 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-18 09:52:22 -07:00
uuuvn
18db8fa311
Allow choosing leaders in multinode reduce ( #11506 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-18 12:43:20 -04:00
b1tg
799a637b03
fix the misused cast in amd llvm tc ( #11711 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-18 09:15:34 -07:00
qazal
fef97547f9
viz: preset the final timestamp ( #11712 )
2025-08-18 17:51:21 +03:00
chenyu
c30a113b2a
support bf16 and fp8 in Tensor.tolist ( #11704 )
...
memoryview does not support these dtypes, but casting works, so casting is fine
2025-08-17 15:11:13 -04:00