Commit Graph

9878 Commits

George Hotz
6eaea3c9d9 RANGEIFY=2 is partial contig 2025-08-21 16:33:33 -07:00
Jordan Chalupka
8de6db15ac exclude .git from ruff (#11773) 2025-08-21 15:37:50 -07:00
George Hotz
5954a0975f fix some assigns on rangeify (#11774)
* fix some assigns

* llvm test

* more tests

* upd test
2025-08-21 15:15:54 -07:00
qazal
2e0eb88549 viz: add metadata to UOp tracing (#11772)
* viz: add metadata to UOp tracing

* place after tag

* optional field

* err, refcount of root must be 0
2025-08-22 00:18:45 +03:00
George Hotz
d6f9606e93 small cleanups to rangeify (#11769) 2025-08-21 11:15:09 -07:00
uuuvn
bd4a9473b0 Multihost exception handling (#11729)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-21 13:51:49 -04:00
George Hotz
a2c7b807e0 don't bufferize 0s (#11766) 2025-08-21 10:10:56 -07:00
nimlgen
9eff7cd1d8 am: support 64bit discovery (#11768) 2025-08-21 18:28:13 +03:00
b1tg
56cd47a159 fix amd llvm bf16 tc (#11713)
* fix amd llvm bf16 tc

* is_cdna

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-21 09:33:28 -04:00
George Hotz
a044648111 rangeify load cleanups + multi support (#11765)
* use the old buf_uop + cleanups

* simpler handling of load

* everything needed for multi too
2025-08-20 20:55:49 -07:00
George Hotz
9f94c25a25 fix symbolic usage. use shrink, not reshape (#11762)
* fix test_var

* revert those things

* fix the ones in test tiny

* use better syntax

* it's the same, but that's clearer

* fix pad
2025-08-20 18:35:42 -07:00
chenyu
5276fbc9c5 fix gather with inf values (#11760)
(mask * x) is wrong because 0*inf is nan. I feel we have a lot of those still...
2025-08-20 20:35:40 -04:00
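
The reasoning in the fix above (0*inf is nan) is easy to reproduce outside tinygrad. A minimal plain-Python sketch, not the PR's code, showing why a multiplicative mask corrupts inf entries while a select/where-style mask does not:

```python
# Plain-Python repro of the 0*inf problem: IEEE 754 defines 0 * inf as nan.
import math

x    = [1.0, math.inf, 2.0]
mask = [1.0, 0.0, 1.0]          # intent: keep elements 0 and 2, zero out element 1

mul_masked = [m * v for m, v in zip(mask, x)]
print(mul_masked)               # [1.0, nan, 2.0] -- the "dropped" inf became nan

# a where/select-style mask never evaluates 0 * inf, so the masked slot stays 0
where_masked = [v if m else 0.0 for m, v in zip(mask, x)]
print(where_masked)             # [1.0, 0.0, 2.0]
```
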
wozeparrot
b979162c5d llama3 eval train (#11706) 2025-08-20 19:56:35 -04:00
chenyu
dbd3b67657 clamp GRAD_CLIP_NORM in llama (#11761) 2025-08-20 19:55:50 -04:00
George Hotz
9635592141 ** rangeify, try 3 (#11683)
* ** rangeify, try 3

* bring that over

* bufferize, don't use contig tag

* work

* ish

* fix rangeify

* flash attention is back

* fix rangeify tests

* stuff passes

* fix test_log_softmax

* more stuff passes

* progress children

* new endrange solution

* progress

* progress counter

* basic assign

* contigs only

* symbolic in schedule

* unbind_kernel

* late children

* ops fixed

* beautiful mnist is close

* that seems to work

* mnist works

* improve names

* fix bmnist

* no pcontig

* testing backward

* work

* clone movement ops

* new_range helper

* MBLOCK/MERGE

* ops tests pass

* revert mblock stuff

* cleanups...but it breaks ops

* remove reindex

* hack for relu

* disable the hacks

* more hacks

* upd

* mostly works with cleanups disabled

* ndr

* ops tests pass

* terrible hacks for indexing to work

* context mismatch

* pcontig

* split pcontig v contig

* z3 trunc

* null

* no fuse in rangeify

* ops test passes

* lnorm

* fix assign

* nd rangeify

* both should work

* tests for rangeify

* cleanups

* stores pass the pointer through

* disable pcontig for now

* PARTIAL_CONTIG is a flag
2025-08-20 14:22:44 -07:00
chenyu
d7553721d1 clean up test_dtype_alu (#11757)
remove the check that looks into the schedule; only test whether the output matches
2025-08-20 14:36:18 -04:00
chenyu
5f08a3e928 hotfix: cast half to float in Tensor.tolist (#11755)
workaround for Python < 3.12
2025-08-20 12:18:35 -04:00
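
For context on the workaround above: memoryview only gained the half-float ('e') format in Python 3.12, so on older interpreters half data has to be widened before it can become a Python list. A hedged sketch of that limitation (the data and variable names here are illustrative, not tinygrad's):

```python
# Why half needs special handling in tolist on Python < 3.12.
import struct, sys

raw = struct.pack("<3e", 1.0, 2.5, -0.5)   # three float16 values as raw bytes

if sys.version_info >= (3, 12):
    # memoryview supports the half-float format 'e' starting with 3.12
    values = memoryview(raw).cast("e").tolist()
else:
    # workaround: widen each half to a Python float via struct instead
    values = [struct.unpack_from("<e", raw, 2 * i)[0] for i in range(3)]
print(values)  # [1.0, 2.5, -0.5]
```
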
qazal
de4cb722a4 viz: add metadata and var_vals tracing (#11753)
* viz: add metadata and var_vals tracing

* add test_trace_metadata

* set TRACEMETA=1
2025-08-20 18:39:51 +03:00
nimlgen
6589c9e643 hcq: better errors for ifaces (#11751)
* hcq: better errors for ifaces

* fix linter

* typo

* space
2025-08-20 17:50:51 +03:00
chenyu
be7b0b6970 TRANSCENDENTAL_SUPPORTED_DTYPES->TRANSCENDENTAL_DTYPES (#11752) 2025-08-20 10:29:36 -04:00
ttomsa
220a2a88d7 a*(1/b) -> a/b on LLVM, CPU (#11743)
* add fdiv rewrite

* :)

* use float_lop

* use reciprocal()

* revert

* move to decompositions
2025-08-20 09:35:10 -04:00
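
Whatever the exact motivation of the rewrite above (per the "add fdiv rewrite" bullet, it presumably lets the LLVM/CPU renderers emit a plain division), the two forms are not bit-identical: a*(1/b) rounds twice, once for the reciprocal and once for the multiply, while a/b rounds once. A quick plain-Python check, not the PR's code:

```python
# Count (a, b) pairs where the reciprocal-multiply form differs from a single division.
mismatches = [(a, b) for a in range(1, 50) for b in range(1, 50)
              if a * (1.0 / b) != a / b]
a, b = mismatches[0]
print(f"{len(mismatches)} of 2401 pairs differ, e.g.:")
print(f"  {a}*(1/{b}) = {a * (1.0 / b)!r}")
print(f"  {a}/{b}     = {a / b!r}")
```
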
George Hotz
12ab3f8b06 correct row_count in process replay (#11748) 2025-08-19 22:21:07 -07:00
George Hotz
8af8808c61 cleanup tests, bump caches (#11746) 2025-08-19 21:21:07 -07:00
George Hotz
00391db628 no ast for mem estimate (#11744)
* no ast for mem estimate

* skip for webgpu
2025-08-19 20:18:45 -07:00
chenyu
dd413e1208 remove a Ops.REDUCE check in reduce_collapse [pr] (#11734) 2025-08-19 19:21:28 -04:00
ttomsa
70c3f1fb29 x.where(False, True) -> !x (#11738)
* add pat

* add test
2025-08-19 19:08:16 -04:00
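
The identity behind this pattern is simple enough to spell out: choosing False where x is true and True where x is false is logical negation. A trivial truth-table check in plain Python (not the UPat itself):

```python
# Truth-table check of the rewrite's identity: x.where(False, True) == !x.
for x in (True, False):
    where_form = False if x else True   # what x.where(False, True) selects
    not_form = not x                    # what !x computes
    assert where_form == not_form
print("x ? False : True  ==  !x  for both boolean values")
```
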
George Hotz
1d307f568c move device tests to test/device + test cleanups (#11735)
* move device tests to test/device

* test speedups

* test device

* linalg to unit

* upd

* so pytest just works

* more divide and skip

* speed

* test devectorize

* add pillow
2025-08-19 16:02:20 -07:00
wozeparrot
bcc7623025 feat: bump version to 0.11.0 (#11736) v0.11.0 2025-08-19 17:08:56 -04:00
qazal
8c987b3293 DISABLE_FAST_IDIV is a context var [pr] (#11733) 2025-08-19 23:30:50 +03:00
George Hotz
bf467c623d changes from rangeify + better NullRenderer (#11732)
* changes from rangeify + better NullRenderer

* fix test
2025-08-19 12:51:54 -07:00
chenyu
02353588cb small getitem cleanup (#11730) 2025-08-19 12:25:58 -04:00
chenyu
712a5c651a minor Tensor.triu cleanup (#11728)
less confusing dtype
2025-08-19 08:07:38 -04:00
nimlgen
9c9e337c78 amd: parse soc enums (#11727)
* amd: parse soc enums

* remove from mock

* fix

* minimal amd_gpu
2025-08-19 15:06:09 +03:00
qazal
57ad69160a viz: inline memory shape spec (#11725) 2025-08-19 08:03:29 +03:00
chenyu
c5b52e9321 onnx RotaryEmbedding cleanup (#11724) 2025-08-18 23:34:42 -04:00
George Hotz
31619774a9 Revert "Revert "fix the misused cast in amd llvm tc (#11711)" (#11715)" (#11723)
This reverts commit ca28db5a97.
2025-08-18 19:44:35 -07:00
George Hotz
2ea54d7337 improve syntax of UPats using f [pr] (#11717)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-18 20:49:45 -04:00
chenyu
b67345caa3 use truncate in onnx read_int64 [pr] (#11720) 2025-08-18 20:49:35 -04:00
qazal
50e789e290 hotfix: add device to decompositions ctx (#11721)
fast_idiv requires it to check whether a dtype is supported. Without
this, codegen output is not reproducible unless os.environ is complete,
since `is_dtype_supported` will open devices based on the env var
unless the device is specified by the caller.
2025-08-19 03:31:16 +03:00
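
A hypothetical sketch of the failure mode described above; the function name, the `DEV` variable, and the support rule are made up for illustration and are not tinygrad's actual code:

```python
# Hypothetical illustration of the reproducibility issue, not tinygrad code.
import os
from typing import Optional

def is_dtype_supported_sketch(dtype: str, device: Optional[str] = None) -> bool:
    # without an explicit device, the answer depends on the environment
    device = device or os.environ.get("DEV", "CPU")
    return not (device == "WEBGPU" and dtype == "long")   # made-up support rule

print(is_dtype_supported_sketch("long"))             # varies with $DEV
print(is_dtype_supported_sketch("long", "WEBGPU"))   # always False: device pinned
```
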
George Hotz
4b3fcb4064 Revert "REDUCE_AXIS keepdim=False (#11311)" (#11718)
This reverts commit b518a7378a.
2025-08-18 13:28:53 -07:00
George Hotz
67d0ba5bd8 new ops from rangeify (#11716) 2025-08-18 13:13:11 -07:00
George Hotz
4afa0b86bb hotfix: ls -lh on wheel size 2025-08-18 11:52:59 -07:00
George Hotz
ca28db5a97 Revert "fix the misused cast in amd llvm tc (#11711)" (#11715)
This reverts commit 799a637b03.
2025-08-18 11:51:28 -07:00
chenyu
c10e4c4e20 print wheel build size (#11714) 2025-08-18 14:29:47 -04:00
b1tg
b518a7378a REDUCE_AXIS keepdim=False (#11311)
* progress

* fix tests

* fix tests

* remove hack for test_symfold

* fix test_conv.py  on llvm

* hack test_cache_speed

* lint

* remove hack for helper_linearizer_opt

* tests

* fix DSP

* clean up

* remove hack for kernelize.py

* hack for test/test_multitensor.py TestMultiTensor.test_matmul_shard_none

* clean

* uop.r need reshape?

* lower_store cause fail

* fix lower?

* avoid contiguous hack

* 2134

* conv2d count

* remove unused

* hack lower

* reduced and clean up

* fix TestMultiTensor.test_matmul_shard_none

* src sync + fix TestMultiTensor.test_matmul_shard_none

* remove excluded in mop

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-08-18 10:09:17 -07:00
b1tg
61884f2057 add cstyle renderer to the NULL device (#11709)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-18 09:52:22 -07:00
uuuvn
18db8fa311 Allow choosing leaders in multinode reduce (#11506)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-18 12:43:20 -04:00
b1tg
799a637b03 fix the misused cast in amd llvm tc (#11711)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-18 09:15:34 -07:00
qazal
fef97547f9 viz: preset the final timestamp (#11712) 2025-08-18 17:51:21 +03:00
chenyu
c30a113b2a support bf16 and fp8 in Tensor.tolist (#11704)
memoryview does not support these dtypes, but casting to a supported one works fine, so we cast first
2025-08-17 15:11:13 -04:00