Commit Graph

1265 Commits

Shun Usami
34a05b31fe Fix advanced tensor indexing setitem (#12128)
* Add failure test case for advanced tensor indexing setitem

* Fix advanced tensor indexing setitem when permuted

* Reduce line count

* Revert unnecessary change

* Combine two lines into one
2025-09-14 15:22:40 -04:00
Sieds Lykles
2fc0bd150b Arange overflow raises error and one_hot upcast (#11975)
* add error

* to_dtype

* shorten line

* add test

* upcast one_hot dim if it overflows
2025-09-13 00:18:25 +02:00
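
A hedged sketch of the kind of bounds check this commit describes (the helper name and exact rule are assumptions, not tinygrad's actual code): raise instead of silently wrapping when arange's endpoints don't fit the target integer dtype.

```python
# hypothetical helper illustrating the check; not tinygrad's actual code
import math

def check_arange_overflow(start: int, stop: int, step: int, bits: int = 32) -> None:
  lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
  n = max(0, math.ceil((stop - start) / step))          # number of elements
  if n and not (lo <= start <= hi and lo <= start + (n - 1) * step <= hi):
    raise ValueError(f"arange({start}, {stop}, {step}) overflows int{bits}")

check_arange_overflow(0, 10, 1)       # fine
# check_arange_overflow(0, 2**40, 1)  # would raise ValueError
```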
b1tg
14faf7a5c0 AutoCastType tests for fp8s/bf16 (#12084) 2025-09-09 11:33:01 -04:00
nimlgen
9182948951 remove llvm_bf16_cast (#12075) 2025-09-08 20:51:15 +03:00
Sieds Lykles
f326df8ae8 add type: ignore (#12059) 2025-09-06 21:17:35 +02:00
Sieds Lykles
581b2388c2 add dtypes.index (#12015)
* add dtypes.index

* cast shape, stride and mask to dtypes.index in view.create

* move pm_lower_index_dtype to ops

* DEFINE_VAR is dtype.index by default

* merge var_val_using_str

* remove int from commutative

* fix test_rewrite_map

* change that to dtypes.index

* change some int to index

* shorten those

* remove old cast in renderer

* cleanup

* change that back

* add comment

* delete comment

* just delete those

* view doesnt have to cast anymore

* adjust comment
2025-09-06 06:03:44 +02:00
Sieds Lykles
c6c16b2946 var_vals uses str for var (#12011)
* var_vals is str,int

* remove imports

* remove print

* fix test

* change var_vals in hcq

* update test_hcq

* fix multitensor _device_num var

* fix syminfer test

* shorten line

* p.vars stays list[Variable]

* shorten line

* vars is back to tuple[Variable, ...]

* change var_vals in extra

* change var_vals from shapetracker

* var_vals is str:int

* fix signature
2025-09-06 04:16:12 +02:00
George Hotz
870f63d9cc add WARP axistype, fix postopt bugs (#12033)
* postopt is 83% match

* warp is bright CYAN

* beautiful mnist beam works

* fix shutdown bug
2025-09-05 10:36:55 -07:00
chenyu
d0e739453e update many einsum tests (#11981)
correct the exception testing, and raise ValueError instead of assert when checking args
2025-09-03 15:40:20 -04:00
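
A minimal sketch of why the distinction matters (the check itself is illustrative, not the actual einsum code): assert statements vanish under `python -O`, so user-facing argument validation should raise explicitly.

```python
# illustrative only; not the actual einsum validation
def validate_einsum_args(formula: str, n_operands: int) -> None:
  expected = formula.split("->")[0].count(",") + 1
  if expected != n_operands:
    raise ValueError(f"einsum {formula!r} expects {expected} operands, got {n_operands}")

validate_einsum_args("ij,jk->ik", 2)    # ok
# validate_einsum_args("ij,jk->ik", 3)  # raises ValueError even under python -O
```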
chenyu
561318fea7 Tensor.cos in test_stype_alu (#11916)
* Tensor.cos in test_stype_alu

* need this fix anyway
2025-08-29 20:26:36 -04:00
Ben Waldron
ea1be2e4cd [bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape

* Refactor symbolic reshape

* fix small error

* much cleaner + fix more tests

* Can remove this now

* Update test_symbolic_ops and test_tiny

* Couple more tests

* Unused import

* More tests and add EXPAND to Tensor.empty

* Fix test beam search

* all int

* Fix rangeify by adding shrink

* Remove OOB check and so fix test_symbolic_jit

* test_symbolic_jit doesn't need OOB Context anymore either

* Should remove that test now

* Cleanups part 1

* fix linters

* Final cleanups

* Don't reassign inside for loop

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
chenyu
beb5982165 FUSE_ATTENTION (#11884) 2025-08-27 19:59:17 -04:00
chenyu
337e979a59 call dtypes.as_const in Tensor(list) (#11840) 2025-08-25 22:08:26 -04:00
chenyu
7123df3928 Use Tensor.logaddexp to implement Tensor.softplus (#11796)
instead of a piecewise linear approximation, numerical stability is handled by logaddexp. jax does this and i think it's more elegant than torch's approach
2025-08-23 11:52:29 -04:00
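
A minimal sketch of the identity being used, in plain Python rather than tinygrad's actual Tensor code: softplus(x) = log(1 + e^x) = logaddexp(x, 0), and the max-shift inside logaddexp is what provides the numerical stability.

```python
import math

def logaddexp(a: float, b: float) -> float:
  m = max(a, b)                                   # shift by the max so exp never overflows
  return m + math.log(math.exp(a - m) + math.exp(b - m))

def softplus(x: float, beta: float = 1.0) -> float:
  return logaddexp(beta * x, 0.0) / beta          # softplus(x) = logaddexp(x, 0)

print(softplus(700.0))   # ~700.0, no overflow
print(softplus(-700.0))  # ~0.0, no nan
```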
chenyu
fb8ee02424 Tensor.logaddexp (#11793) 2025-08-23 09:15:00 -04:00
chenyu
e39b25cd36 upcast float exp to at least float32 (#11758)
* upcast float exp to at least float32

* unlucky seed
2025-08-22 20:16:34 -04:00
geohotstan
1e679bd789 fix max_unpool2d inf (#11784)
* start

* add regression test for maxunpool2d
2025-08-22 08:31:24 -04:00
George Hotz
bb8de51e5f remove unused early cleanups + contig w range [pr] (#11780)
* remove unused early cleanups [pr]

* contiguous with range

* woah, this works
2025-08-21 20:04:45 -07:00
chenyu
91a4de4ca7 fix getitem with inf in tensor (#11781) 2025-08-21 21:55:32 -04:00
chenyu
5276fbc9c5 fix gather with inf values (#11760)
(mask * x) is wrong because 0*inf is nan. i feel we have a lot of those still...
2025-08-20 20:35:40 -04:00
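
A minimal sketch of the failure mode in plain Python (tinygrad's actual fix lives in the gather lowering): one-hot masking by multiplication poisons the sum when the input holds inf, while a where-style select does not.

```python
inf = float("inf")
x = [1.0, inf, 3.0]
mask = [0.0, 0.0, 1.0]                                # one-hot: select index 2

print(sum(m * v for m, v in zip(mask, x)))            # nan, because 0.0 * inf is nan
print(sum(v if m else 0.0 for m, v in zip(mask, x)))  # 3.0, select never multiplies by inf
```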
George Hotz
9635592141 ** rangeify, try 3 (#11683)
* ** rangeify, try 3

* bring that over

* bufferize, don't use contig tag

* work

* ish

* fix rangeify

* flash attention is back

* fix rangeify tests

* stuff passes

* fix test_log_softmax

* more stuff passes

* progress children

* new endrange solution

* progress

* progress counter

* basic assign

* contigs only

* symbolic in schedule

* unbind_kernel

* late children

* ops fixed

* beautiful mnist is close

* that seems to work

* mnist works

* improve names

* fix bmnist

* no pcontig

* testing backward

* work

* clone movement ops

* new_range helper

* MBLOCK/MERGE

* ops tests pass

* revert mblock stuff

* cleanups...but it breaks ops

* remove reindex

* hack for relu

* disable the hacks

* more hacks

* upd

* mostly works with cleanups disabled

* ndr

* ops tests pass

* terrible hacks for indexing to work

* context mismatch

* pcontig

* split pcontig v contig

* z3 trunc

* null

* no fuse in rangeify

* ops test passes

* lnorm

* fix assign

* nd rangeify

* both should work

* tests for rangeify

* cleanups

* stores pass the pointer through

* disable pcontig for now

* PARTIAL_CONTIG is a flag
2025-08-20 14:22:44 -07:00
chenyu
5f08a3e928 hotfix: cast half to float in Tensor.tolist (#11755)
workaround for python < 3.12
2025-08-20 12:18:35 -04:00
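
For context (a sketch, not the tinygrad code): CPython's memoryview only gained the half-float "e" format in 3.12, so on older interpreters half values have to be widened to float before going through memoryview.

```python
import struct, sys

buf = struct.pack("<2e", 1.5, -0.25)         # two packed float16 values
if sys.version_info >= (3, 12):
  print(memoryview(buf).cast("e").tolist())  # works: [1.5, -0.25]
else:
  print(list(struct.unpack("<2e", buf)))     # pre-3.12 fallback: unpack to python floats
```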
chenyu
02353588cb small getitem cleanup (#11730) 2025-08-19 12:25:58 -04:00
chenyu
712a5c651a minor Tensor.triu cleanup (#11728)
less confusing dtype
2025-08-19 08:07:38 -04:00
George Hotz
4b3fcb4064 Revert "REDUCE_AXIS keepdim=False (#11311)" (#11718)
This reverts commit b518a7378a.
2025-08-18 13:28:53 -07:00
b1tg
b518a7378a REDUCE_AXIS keepdim=False (#11311)
* progress

* fix tests

* fix tests

* remove hack for test_symfold

* fix test_conv.py  on llvm

* hack test_cache_speed

* lint

* remove hack for helper_linearizer_opt

* tests

* fix DSP

* clean up

* remove hack for kernelize.py

* hack for test/test_multitensor.py TestMultiTensor.test_matmul_shard_none

* clean

* uop.r need reshape?

* lower_store cause fail

* fix lower?

* avoid contiguous hack

* 2134

* conv2d count

* remove unused

* hack lower

* reduced and clean up

* fix TestMultiTensor.test_matmul_shard_none

* src sync + fix TestMultiTensor.test_matmul_shard_none

* remove excluded in mop

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-08-18 10:09:17 -07:00
chenyu
c30a113b2a support bf16 and fp8 in Tensor.tolist (#11704)
memoryview does not support them, but casting works fine, so we cast first
2025-08-17 15:11:13 -04:00
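
A minimal sketch of why a cast is needed at all: struct/memoryview have no format codes for bfloat16 or the fp8 types, but widening is cheap, e.g. a bf16 bit pattern becomes its float32 value by shifting into the high 16 bits.

```python
import struct

def bf16_bits_to_float(bits: int) -> float:
  # bf16 is the top 16 bits of a float32, so widen by shifting left
  return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

print(bf16_bits_to_float(0x3FC0))  # 1.5
print(bf16_bits_to_float(0xC000))  # -2.0
```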
George Hotz
9366a23eb0 test backward in test_tiny (#11697)
* test backward in test_tiny

* empty
2025-08-16 20:29:39 -07:00
chenyu
4fe19eec72 Ops.TRUNC (#11659) 2025-08-13 18:40:48 -04:00
George Hotz
22bdf48cdd render ranges in viz, name gbufs with sizes. changes from rangeify (#11656)
* render ranges in viz, name gbufs with sizes. changes from rangeify

* fix unit test dtypes
2025-08-13 12:46:16 -07:00
kevvz
e2873a3a41 [bounty] Muon optim (#11414)
* newton schulz

* add muon + move newton schulz to tensor

* compact newton schulz

* better tests

* cleanup

* add comments for muon

* cleanup

* add export with tests

* match muon optim with test optim

* cleanup

* unused import

* correct comment

* whitespace

* move export

* muon test fix

* match reference impl + tests

* remove export by moving muon device

* add credit

* cleanup

* remove print

* spacing

* spacing

* comma

* cleanup

* removal

* fix tests + optim momentum

* consistent is not/ not

* more consistency

* fix test

* cleanup

* fix the nones

* remove comment

* cast

* comment

* comment

* muon teeny test

* muon flag beautiful mnist

* set steps

* steps as hyperparam

* match default test steps

* name

* large cleanup

* dont care about steps

* nesterov false default

* match each other impl

* steps

* switch nest

* swap defaults

* update docstring

* add no nesterov test

* ban fuse_optim

* prints

* classical momentum

* alternative condition

* recon

* pre + post wd

* false default

* detach

* signature changes

* context

* swap order

* big cleanup

* 0 step instead

* parity

* remove fuse

* remove fused

* better paper

* assert message

* correct shape check + eps

* multidim

* add eps

* cleanup

* correct assert message

* lint

* better tests

* naming

* ns_steps,ns_params

* update docstring

* docstring

* match sgd and muon together

* sandwich

* add back fused

* parity

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-13 14:27:55 -04:00
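
For reference, a hedged sketch of the Newton-Schulz orthogonalization at the heart of Muon, with the coefficients from Keller Jordan's reference implementation (not necessarily tinygrad's exact code):

```python
import numpy as np

def newton_schulz(G: np.ndarray, steps: int = 5) -> np.ndarray:
  a, b, c = 3.4445, -4.7750, 2.0315       # tuned quintic coefficients
  X = G / (np.linalg.norm(G) + 1e-7)      # normalize so singular values are <= 1
  transposed = X.shape[0] > X.shape[1]
  if transposed: X = X.T                  # iterate on the wide orientation
  for _ in range(steps):
    A = X @ X.T
    X = a * X + (b * A + c * A @ A) @ X   # push singular values toward 1
  return X.T if transposed else X

O = newton_schulz(np.random.randn(4, 6))
print(np.linalg.svd(O, compute_uv=False))  # all roughly 1: G is near-orthogonalized
```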
chenyu
94e6d84e32 rewrite Tensor.round to not use cast int (#11654) 2025-08-13 13:51:08 -04:00
chenyu
0d7075f2de assign should broadcast input tensor (#11629)
fixed test_assign_broadcast
2025-08-11 23:36:35 -04:00
chenyu
0c97d6de1b don't round pow output for int pow int (#11625)
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
chenyu
d623f6d850 support int Tensor pow to const non-negative int (#11624)
matches torch
2025-08-11 19:50:19 -04:00
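
A hedged sketch of one standard way to evaluate an int base raised to a constant non-negative int exponent without going through float pow, here exponentiation by squaring on plain ints (not necessarily the rewrite tinygrad uses):

```python
def int_pow(base: int, exp: int) -> int:
  if exp < 0: raise ValueError("only non-negative exponents, matching the commit")
  out = 1
  while exp:
    if exp & 1: out *= base   # multiply in this bit's power of the base
    base *= base
    exp >>= 1
  return out

print(int_pow(3, 13))  # 1594323, exact with no rounding step
print(int_pow(7, 0))   # 1
```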
chenyu
0806677b51 rewrite sort idx (#11613) 2025-08-11 16:20:56 -04:00
George Hotz
700c11597b switch contextvars.ContextVar to _ContextVar (#11621) 2025-08-11 12:20:09 -07:00
chenyu
a67e0917c3 list indexing can normalize in python (#11609)
* list indexing can normalize in python

list index does not need to be normalized in tensor

* update those
2025-08-10 20:02:38 -04:00
chenyu
f7aa1b85fe minor sort cleanups (#11602) 2025-08-10 01:51:23 -04:00
chenyu
dfb702ef33 fix sort for small dim (#11601)
* fix sort for small dim

* fixed test_sort_empty
2025-08-10 01:17:41 -04:00
chenyu
aa1a6f2132 support threshold in Tensor.softplus (#11564)
fix gradient for large input
2025-08-07 13:43:18 -04:00
b1tg
8b8bd6c534 make einsum generate same kernels (#11508)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-05 11:12:52 -04:00
chenyu
8a11af01ed remove broken paperswithcode links in doc (#11497) 2025-08-04 13:12:33 -04:00
chenyu
823f1a01db move cast around expand backward to tensor.py (#11483) 2025-08-02 23:03:54 -04:00
chenyu
66be747908 few more dtype cast convenience methods (#11480) 2025-08-02 15:47:09 -04:00
kevvz
ef7e01cadf Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case

* fix

* better test

* space
2025-08-02 09:47:41 -07:00
wozeparrot
24dd0d52ed feat: test remove to cpu (#11444) 2025-07-30 20:18:56 -07:00
chenyu
88c338bfcc add kernelize to keccak for each data block (#11370)
* add kernelize to keccak for each data block

test_long works now. this prevents internal uops from growing proportionally to data length and eventually getting too deep

* this?

* hash stuff

* gate test

* mv
2025-07-25 16:07:20 -04:00
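
A hedged sketch of the pattern (the loop body is a stand-in, not the actual keccak rounds): kernelize the state once per absorbed block so the pending uop graph stays bounded instead of growing with the input length.

```python
from tinygrad import Tensor, dtypes

state = Tensor.zeros(25, dtype=dtypes.uint64)     # sponge state placeholder
for block_idx in range(4):                        # one permutation per data block
  state = (state + (block_idx + 1)).contiguous()  # stand-in for absorb + keccak-f
  state = state.kernelize()                       # cut the graph here each iteration
print(state.tolist())
```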
chenyu
cc795c6656 simplify keccak pad mask code (#11362) 2025-07-24 19:24:10 -04:00
chenyu
c0c4bc9d7c use int32 for keccak reorder_indexes (#11360)
it's used for tensor indexing, so int32 instead of uint64 is slightly faster
2025-07-24 15:54:50 -04:00