Commit Graph

623 Commits

chenyu
8dfa0024f0 raise in scatter if self and src have different dtype [pr] (#9109)
raise a RuntimeError that matches torch instead of implicitly casting
2025-02-15 11:21:34 -05:00
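
A minimal sketch of the guard this commit describes, assuming a torch-style error message (the exact wording in the real check may differ):

```python
from tinygrad import Tensor

# hypothetical wrapper showing the dtype guard; the real check lives in Tensor.scatter
def checked_scatter(self: Tensor, dim: int, index: Tensor, src: Tensor) -> Tensor:
  if self.dtype != src.dtype:
    # match torch: raise instead of implicitly casting src to self.dtype
    raise RuntimeError(f"expect self dtype {self.dtype} to match src dtype {src.dtype}")
  return self.scatter(dim, index, src)
```
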
Marcello Fuschi
8824f7e9df Make logcumsumexp numerically stable (#9050)
* Make logcumsumexp numerically stable

* Refactor

* Refactor for special case ndim=0

* Refactor

* Use the correct device for mask

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-14 19:25:17 -05:00
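
The standard trick behind this kind of fix is to factor a running maximum out of the sum so `exp` never sees a large positive argument. A minimal scalar sketch of the idea (not the tinygrad tensor implementation):

```python
import math

def logcumsumexp(xs):
  # out[i] = log(sum_{j<=i} exp(xs[j])), computed without overflowing exp
  out, m, s = [], -math.inf, 0.0
  for x in xs:
    if x > m:
      s = s * math.exp(m - x) + 1.0  # rescale the running sum to the new max
      m = x
    else:
      s += math.exp(x - m)
    out.append(m + math.log(s))
  return out

print(logcumsumexp([1000.0, 1000.5]))  # finite, unlike naive log(exp(1000) + exp(1000.5))
```
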
chenyu
73af42aeab fix pow backward when base is 0 (#9075) 2025-02-13 21:06:01 -05:00
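
Illustration of the usual failure mode for pow backward at base 0 (not necessarily the exact bug fixed here): writing the derivative of `x**y` as `y * x**y / x` yields 0/0 = nan at x = 0, while the direct form stays finite:

```python
import numpy as np

x, y = np.float32(0.0), np.float32(3.0)
with np.errstate(divide="ignore", invalid="ignore"):
  print(y * x**y / x)  # nan: 0/0
print(y * x**(y - 1))  # 0.0, the true d/dx x**3 at x = 0
```
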
chenyu
5ef48bbe0a swap order in rsqrt (#9069)
fixed backward for 0
2025-02-13 16:51:21 -05:00
chenyu
e02e3b94c3 remove SQRT hack in llvm (#9067)
replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward
2025-02-13 15:42:34 -05:00
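
Context for these 0-input backward fixes: the local gradient of sqrt at 0 is infinite, and any zero factor in the chain rule then produces nan, the same `0*inf` issue the #9035 test cases further down this list poke at:

```python
import math

# d/dx sqrt(x) = 0.5 / sqrt(x) -> inf at x = 0, and 0 * inf is nan,
# so a zero factor anywhere in the chain rule poisons the gradient
print(0.0 * math.inf)  # nan
```
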
chenyu
2573d0621a Tensor.scatter_reduce touchup [pr] (#9060) 2025-02-13 10:01:14 -05:00
Josh Moore
1f9d2442b9 Add Tensor.scatter_reduce (#8947)
* pytorch scatter -> scatter_reduce

* WIP scatter_reduce implementation

* _pre_scatter return type hint

* split out src, mask to satisfy linter

* Add src cast back in

* dict of lambdas instead of ifs

* sum and prod reduction ops with include_self

* add reduce arg error message

* add amax and amin reduction ops

* Fix include_self for higher dims

* Simplify

* Simplify amax and amin too

* Pull include_self logic out into _inv_mask function

* reduce arg cannot be None for scatter_reduce

* Fix self-mask issue

* Add mean reduce op

* Add tests

* any() not needed here

* remove comment

* End support for Tensor src with reduce arg in tinygrad scatter

* Process index, dim inside actual functions

* Add scatter_reduce to onnx

* Add excluded onnx ScatterElements reduction tests back in

* Save 2 lines on the mask helpers

* Update docs

* Add include_self=False tests

* cleanup

* Remove unneeded helper function

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-13 09:08:54 -05:00
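
A usage sketch of the new API, assuming it mirrors torch's `scatter_reduce` semantics as the PR intends (outputs hand-computed from those semantics):

```python
from tinygrad import Tensor

t     = Tensor([1.0, 2.0, 3.0, 4.0])
src   = Tensor([10.0, 20.0, 30.0])
index = Tensor([0, 0, 2])

# include_self=True (the default): self's values join the reduction
print(t.scatter_reduce(0, index, src, reduce="sum").tolist())
# [31.0, 2.0, 33.0, 4.0]

# include_self=False: reduce over the scattered src values only;
# positions that receive nothing keep their original values
print(t.scatter_reduce(0, index, src, reduce="amax", include_self=False).tolist())
# [20.0, 2.0, 30.0, 4.0]
```
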
Josh Moore
0c97c10814 TestOps: silence pytorch std()/var() degrees of freedom warnings (#9034) 2025-02-12 14:49:18 +08:00
chenyu
2845f8797a failed test cases for rsqrt at 0 and similar ones (#9035)
* failed test cases for rsqrt at 0 and similar ones

related to 0*inf

* this failed
2025-02-11 17:50:16 -05:00
chenyu
586e48d696 a few more backward tests now pass (#9010) 2025-02-10 12:46:21 -05:00
chenyu
25fa5e4d5f enable backward tests in test_std_one_in_axis [pr] (#9007)
one correction=0 case is still broken

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-10 10:44:05 -05:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
Bhavya Gada
3b67712892 [bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937)
* fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple

* remove expectedFailure

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 10:07:54 +08:00
George Hotz
f54242849d failing test for the devectorize [pr] (#8940)
* failing test for the devectorize [pr]

* add DEVECTORIZE to method_cache
2025-02-07 09:44:54 +08:00
chenyu
189bfa164e enable backward test for pow(neg const ** x) (#8912)
backward works now. 0**x still does not work because it's handled as a special case in transcendental
2025-02-05 15:35:21 -05:00
eliotgolding
bb5ded85cc Don't rewrite idiv to rshift when numerator is negative (#8885)
* more conditions for shift rewrite mul/idiv

* make ptx test uint so the new condition is true

* delete idiv test

* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division

* mul/div by 2**(large count) is unsupported anyway
2025-02-05 07:47:33 +08:00
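
The arithmetic fact behind this (general two's-complement behavior, not tinygrad code): an arithmetic right shift floors toward -inf, while C-style integer division truncates toward zero, so `x // 2**k -> x >> k` only agrees for non-negative numerators:

```python
x = -3
print(x >> 2)          # -1: shift floors, like floor(-3/4)
print(int(x / 4))      # 0: truncation, what C's -3/4 gives
print(7 >> 2, 7 // 4)  # 1 1: the rewrite is safe for non-negative x
```
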
chenyu
73ee2d74c0 raise RuntimeError for int base pow (#8852)
the current implementation is not precise and blocks other simplification changes
2025-02-01 12:11:57 -05:00
Sieds Lykles
78c0455c7a Better stable sigmoid (#8806)
Uses `1/(x*x) -> 1/x * 1/x`  together with `x/(1+x) -> 1-1/(1+x)` to
rewrite sigmoid instead of `x/((x+1)(x+1)) -> 1/(x+1)*(1-1/(x+1))`

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-29 16:08:53 -05:00
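
A numeric illustration of why the factored form is more stable (this shows the failure mode, not the exact graph rewrite): for extreme inputs `exp(-x)` overflows to inf, so `e/((1+e)*(1+e))` becomes inf/inf = nan, while `s*(1-s)` with `s = 1/(1+e)` stays finite:

```python
import numpy as np

x = np.float32(-90.0)
with np.errstate(over="ignore", invalid="ignore"):
  e = np.exp(-x)                  # exp(90) overflows float32 -> inf
  print(e / ((1 + e) * (1 + e)))  # nan: inf / inf
  s = 1 / (1 + e)                 # 0.0
  print(s * (1 - s))              # 0.0, the correct limiting value
```
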
George Hotz
b4bf6a7dea switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]

* set device correctly, dedup

* why does that fail?

* add noop cast

* simple backward

* fix beautiful_mnist

* touchups

* set in compute_gradient

* uop_count

* uop_count was wrong

* collections

* no note

* skip that test

* update sched kernel counts

* train mnist is 65

* fix metadata and gc

* fixes

* materialize_grads

* no pathlib stuff

* add contiguous_backward, fix bugs

* add some realize

* fix multi
2025-01-26 09:12:16 +09:00
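
A usage sketch of the `Tensor.gradient` API that backward now routes through (assuming the signature introduced by this PR):

```python
from tinygrad import Tensor

x = Tensor([2.0, 3.0], requires_grad=True)
loss = (x * x).sum()
# gradient returns d(loss)/d(target) for each requested target tensor
(dx,) = loss.gradient(x)
print(dx.tolist())  # [4.0, 6.0] == 2*x
```
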
chenyu
2d0842386d fix parse_valid for float uop (#8681)
x < c -> x <= c-1 only works for int (for floats, e.g. x = 0.5 satisfies x < 1 but not x <= 0)
2025-01-19 18:15:49 -05:00
chenyu
5842ee56c6 raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675)
matches torch; also fixed incorrect usage in tests
2025-01-19 12:55:04 -05:00
geohotstan
9229867fec Support asymmetrical pads for all pooling functions (#8109)
* implemented in tensor

* apply onnx tests to asymmetrical pads

* better onnx op ordering

* correct ceil_mode asymmetrical

* fix onnx_ops comments

* a few more TODOs and fix some stupidity

* fix some typing

* fix test

* mypy still a little messed up

* refactor out pad struct transformation

* add simple docs for now

* add whatever tests possible

* add tests for _resolve_pool_pads

* better err msg

* whoops didn't mean to include this

* retry CI

* enable asymmetric pads onnx tests

* better docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-05 16:01:08 -05:00
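
A usage sketch, assuming the padding sequence follows torch's `F.pad` ordering (last dimension first, i.e. (left, right, top, bottom) for 2D pools):

```python
from tinygrad import Tensor

t = Tensor.randn(1, 1, 5, 5)
# asymmetric padding: one extra column/row on only one side of each axis
out = t.max_pool2d(kernel_size=2, stride=2, padding=(0, 1, 0, 1))
print(out.shape)  # (1, 1, 3, 3): each spatial dim pads 5 -> 6, pools to 3
```
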
geohotstan
3dfc8e1706 Share a _resolve_pool_pads function for pool ops in Tensor (#8485)
* _padding2d -> _resolve_pool_pads

* rephrase err msg

* even better error msg

* check asymmetric first so people don't hit the error twice

* test against torch
2025-01-03 23:54:11 -05:00
chenyu
f3fdec940d Tensor.mod (#8458)
it's a Python-style mod; possibly it can be made cleaner with a floor div

relaxed the vmin for MOD slightly for C-style negative mod; it's more correct and might fix other bugs
2024-12-31 11:31:42 -05:00
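
The distinction being implemented (a general language fact, not tinygrad-specific): Python's `%` takes the sign of the divisor and pairs with floor division, while C's `%` takes the sign of the dividend and pairs with truncating division:

```python
import math

print(-7 % 3, -7 // 3)   # 2 -3: Python mod takes the divisor's sign
# in C, -7 % 3 == -1 and -7 / 3 == -2; math.fmod matches C's convention
print(math.fmod(-7, 3))  # -1.0
```
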
chenyu
de3705168e update idiv doc and test cases (#8398)
test more cases where the numerator or denominator is negative, with and without a remainder
2024-12-24 17:03:18 -05:00
chenyu
2c93f27652 remove explicit np.array and np.int32 in test_div_int [pr] (#8395)
vals default loads as int32 now in test_ops
2024-12-24 13:09:30 -05:00
geohotstan
78cb47dfc5 docs and tests clean ups (#8383) 2024-12-23 11:12:13 -05:00
chenyu
a556adf028 add test for Tensor silu and swish (#8381)
was only tested in onnx, added to test_ops for completeness
2024-12-22 21:08:59 -05:00
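
For reference, SiLU and swish are the same function, `x * sigmoid(x)`; a quick check against the definition:

```python
from tinygrad import Tensor

x = Tensor([-2.0, 0.0, 3.0])
print(x.silu().tolist())           # [-0.238..., 0.0, 2.857...]
print((x * x.sigmoid()).tolist())  # same values: silu(x) == x * sigmoid(x)
```
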
geohotstan
423d823c50 add GatherND and ScatterND to onnx ops (#8241)
* implemented

* this implementation is now correct

* this is fine I guess

* better variable names

* finally correct gathernd

* add a note

* eh just leave it at this for now

* teeny adjustment
2024-12-19 00:35:04 -05:00
chenyu
4c1733440d failed test case for stable sigmoid (#8245)
it should also work if implemented differently
2024-12-14 15:19:41 -05:00
chenyu
3eb952f537 fix some sigmoid extreme (#8238)
* fix some sigmoid extreme

quite brittle... the problem is that it has 3 terms and mul might evaluate them in a bad order

* test_tanh_extreme

* just sigmoid gradient
2024-12-14 14:37:06 -05:00
George Hotz
e2f87ecf36 start work on new gradient (#7838)
* start work on new gradient

* more correct

* working tests

* more tests

* work

* add (failing) gradient test

* add view and reduce gradient

* test_add works, many failing test_ops

* add max and reduce max

* add max and reduce max

* 129 failing

* 108 failed

* better view drawing

* 101 failed

* i got 99 failures

* 94 failures

* it's tons of terrible code, but only 50 tests fail

* only 19 failures

* same 19 but shorter

* minimal doesn't matter

* shorter

* lil simpler

* simpler

* simpler

* simpler

* 13 test failures

* nine tests fail

* all ops tests pass

* add contiguous gradient + fix sched tests

* faster by removing toposort calls

* missed one

* add jax to testing
2024-12-13 16:45:53 -08:00
chenyu
c4be1529cf update test for Tensor.softplus (#8150)
test beta and extreme inputs.
to pass big inputs, it needs to support `threshold`, which needs a backward fix that we punt on until the new gradient API
2024-12-10 17:48:02 -05:00
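
The `threshold` referred to is torch's softplus trick: above it, `log1p(exp(x))` overflows even though softplus(x) is already indistinguishable from x, so the function reverts to linear. An illustrative numpy sketch (the threshold value 20.0 matches torch's default; the tinygrad version may differ):

```python
import numpy as np

def softplus(x, threshold=20.0):
  # above the threshold, log1p(exp(x)) would overflow, but softplus(x) ~= x
  return np.where(x > threshold, x, np.log1p(np.exp(np.minimum(x, threshold))))

x = np.float32(100.0)
with np.errstate(over="ignore"):
  print(np.log1p(np.exp(x)))  # inf: exp(100) overflows float32
print(softplus(x))            # 100.0
```
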
chenyu
286fec115e fix Tensor.minimum for int (#8145)
use invert instead of just neg. consolidate min, argmin, and minimum

also update maximum to not apply the mid point for int
2024-12-10 13:34:41 -05:00
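
Why invert works where negation does not (a general two's-complement fact): `-INT32_MIN` is not representable, but `~x = -x - 1` is an order-reversing bijection on int32 with no overflow, so `min(x, y) = ~max(~x, ~y)`:

```python
import numpy as np

x, y = np.int32(-2**31), np.int32(5)
# the neg trick min(x, y) = -max(-x, -y) overflows at INT32_MIN;
# the invert trick is safe: ~ maps INT32_MIN <-> INT32_MAX
print(~np.maximum(~x, ~y))  # -2147483648, the correct minimum
```
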
chenyu
917deb88a4 make //0 return 0 in python_alu (#8131)
on master it raises because it cannot truncate inf to int, which crashes a valid expression like `(t > 0).where(1//t, t)`.
2024-12-09 19:32:06 -05:00
chenyu
358287959b fix pow of int to negative const int (#8129)
it should return an int
2024-12-09 17:20:18 -05:00
chenyu
12f7d284e0 failed test case for int pow (#8128)
also updated test_ops so that non-float dtypes compare with `assert_equal`; removed `test_multinomial`, which is tested better in test_randomness
2024-12-09 16:15:09 -05:00
qazal
80de06c8b9 scheduler ops_folding from delete_lazy (#8124)
* scheduler diff from delete_lazy

* test_std_mean

* late fold copy of CONST

* clang const is fine
2024-12-10 00:36:01 +08:00
chenyu
ccf54c2375 fix argmax/min on int32 min (#8118) 2024-12-09 02:29:23 -05:00
chenyu
c814de2dd4 fix bitwise_not for signed int (#8117)
-1 is correct because 2**32-1 is not within int32 range, so in some cases clang casts the whole expression to uint32
2024-12-09 02:02:51 -05:00
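
Context for the -1 (a general C/Python fact): `~x` equals `x ^ -1` for signed ints, but XORing with the literal `2**32 - 1` doesn't fit in int32, so a C compiler promotes the whole expression to unsigned:

```python
x = 5
print(x ^ -1)  # -6: XOR with all-ones flips every bit, same as ~x
print(~x)      # -6
# in C, `x ^ 0xFFFFFFFFu` would instead promote the expression to uint32
```
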
qazal
69e48da961 set NOOPT in test_avg_pool3d_failure (#8112)
* set NOOPT=0 in test_avg_pool3d_failure

* noopt should still pass
2024-12-08 10:48:29 -05:00
geohotstan
f8294b3bda add avg pool 3d failure test (#8105)
* add test

* try simplify test case

* add TODO comment
2024-12-07 16:34:38 -05:00
chenyu
2d321646b8 default tensors to int32 in test_ops (#8097)
torch defaults to int64, but we care more about int32 anyway; removed tests that were skipped because int64 is not supported
2024-12-06 20:33:36 -05:00
chenyu
d000c08f04 fix return type of Tensor.pow (#8091)
int to the power of int should return int, etc.; it hints that we would like to have Ops.POW
2024-12-06 13:38:29 -05:00
geohotstan
0b7c44677d Fix uint8 cast underflow (#6305)
* hacky fix for cast

* only float to uint8

* limit to float -> uint8

* touchup alu cast test

* improve tests and support more float to unsigned casts

* del one repeated test

* del 1 more repeated test

* try removing expected failure test

* hmmm try 1 more

* skip tests for flakiness

* uint64 super flaky

* clean up

* grammar

* just match numpy

* why is CI numpy different from local numpy

* increase verbosity

* try

* try2

* try3

* try4

* yeah idk

* new direction

* try again

* just don't support uint32 and uint64

* done?

* oops

* comment

* documentation

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-06 10:25:03 -05:00
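
The portability hazard this PR fought with (visible in its CI-vs-local commits): casting an out-of-range float to an unsigned integer is undefined behavior in C, so different platforms, and even different numpy builds, can disagree:

```python
import numpy as np

v = np.float32(-1.0)
# commonly wraps to 255 on x86, but the result is platform-dependent,
# which is why the same cast behaved differently in CI and locally
print(v.astype(np.uint8))
```
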
geohotstan
a684d72e55 add ceil_mode for avg_pool and max_pool (#7579)
* wip pool

* check CI for remove alternative implementation

* Revert "check CI for remove alternative implementation"

This reverts commit 7b1bb900e5.

* fix test

* tests tests tests

* slap a resolve on it

* fix comment

* a little simpler pool

* check CI for removal again

* Revert "check CI for removal again"

This reverts commit be798b7857.

* small

* update

* some ez tests

* english

* clean up code

* fix ruff

* how did I +25 lines?

* small clean ups

* moar clean ups

* try test_avgpool2d_failure2 in CI

* final clean up

* exclude bug fix

* avg underscore pool

* no more edge case stuff

* add better comments for explanation

* add test cases for decreasing end padding

* address feedback

* improve test coverage

* tiny more polish as we wait for lines :D

* more readable code ordering

* add to documentation

* oops

* set to False instead

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-06 08:34:14 -05:00
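
A usage sketch of the new `ceil_mode` flag; the output shape is the point here (with ceil_mode=True, the output size rounds up, adding a partial window at the edge):

```python
from tinygrad import Tensor

t = Tensor.arange(25).reshape(1, 1, 5, 5).float()
print(t.avg_pool2d(kernel_size=2, stride=2).shape)                  # (1, 1, 2, 2)
print(t.avg_pool2d(kernel_size=2, stride=2, ceil_mode=True).shape)  # (1, 1, 3, 3)
```
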
Ahmed Harmouche
13eedd373b Run WebGPU tests on ubuntu (#8033) 2024-12-04 12:42:04 +01:00
Ahmed Harmouche
db330a3110 Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
geohotstan
0a2e10be1d add SELU to Tensor (#7993)
* add selu

* more clean ups
2024-12-02 10:04:01 -05:00
geohotstan
765096fe7d fix Tensor._pool edge case (#7581)
* split into another branch

* polish

* try this

* Revert "try this"

This reverts commit 84f711b13e.

* try

* Revert "try"

This reverts commit 89c7a7649b.

* idk anymore

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-28 23:17:13 -05:00