Commit Graph

500 Commits

Author SHA1 Message Date
chenyu
f3fdec940d Tensor.mod (#8458)
it's a python style mod. possibily can be cleaner with a floor div

relaxed the vmin for MOD slightly for cstyle negatives mod, it's more correct and might fix other bugs
2024-12-31 11:31:42 -05:00
chenyu
de3705168e update idiv doc and test cases (#8398)
test more cases when either numerator and denominator is negative and has remainder or not
2024-12-24 17:03:18 -05:00
chenyu
2c93f27652 remove explicit np.array and np.int32 in test_div_int [pr] (#8395)
vals default loads as int32 now in test_ops
2024-12-24 13:09:30 -05:00
geohotstan
78cb47dfc5 docs and tests clean ups (#8383) 2024-12-23 11:12:13 -05:00
chenyu
a556adf028 add test for Tensor silu and swish (#8381)
was only tested in onnx, added to test_ops for completeness
2024-12-22 21:08:59 -05:00
geohotstan
423d823c50 add GatherND and ScatterND to onnx ops (#8241)
* implemented

* this implementation is now correct

* this is fine I guess

* better variable names

* finally correct gathernd

* add a note

* eh just leave it at this for now

* teeny adjustment
2024-12-19 00:35:04 -05:00
chenyu
4c1733440d failed test case for stable sigmoid (#8245)
it should also work if implemented differently
2024-12-14 15:19:41 -05:00
chenyu
3eb952f537 fix some sigmoid extreme (#8238)
* fix some sigmoid extreme

quite brittle... the problem is it has 3 terms and mul might have bad order

* test_tanh_extreme

* just sigmoid gradient
2024-12-14 14:37:06 -05:00
George Hotz
e2f87ecf36 start work on new gradient (#7838)
* start work on new gradient

* more correct

* working tests

* more tests

* work

* add (faliing) gradient test

* add view and reduce gradient

* test_add works, many failing test_ops

* add max and reduce max

* add max and reduce max

* 129 failing

* 108 failed

* better view drawing

* 101 failed

* i got 99 failures

* 94 failures

* it's tons of terrible code, but only 50 tests fail

* only 19 failures

* same 19 but shorter

* minimal doesn't matter

* shorter

* lil simpler

* simpler

* simpler

* simpler

* 13 test failures

* nine tests fail

* all ops tests pass

* add contiguous gradient + fix sched tests

* faster by removing toposort calls

* missed one

* add jax to testing
2024-12-13 16:45:53 -08:00
chenyu
c4be1529cf update test for Tensor.softplus (#8150)
test beta and extreme inputs.
to pass big input, it needs to support `threshold`, which needs fix on backward that we punt until new gradient api
2024-12-10 17:48:02 -05:00
chenyu
286fec115e fix Tensor.minimum for int (#8145)
use invert instead of just neg. consolidate min, argmin, and minimum

also update maximum to not apply the mid point for int
2024-12-10 13:34:41 -05:00
chenyu
917deb88a4 make //0 return 0 in python_alu (#8131)
on master it raises because it cannot truncate inf to int, which crashes valid expression like `(t > 0).where(1//t, t)`.
2024-12-09 19:32:06 -05:00
chenyu
358287959b fix pow of int to negative const int (#8129)
it should return in int
2024-12-09 17:20:18 -05:00
chenyu
12f7d284e0 failed test case for int pow (#8128)
also updated test_ops so that non-float compares with `assert_equal`. removed `test_multinomial` which is tested better in test_randomness
2024-12-09 16:15:09 -05:00
qazal
80de06c8b9 scheduler ops_folding from delete_lazy (#8124)
* scheduler diff from delete_lazy

* test_std_mean

* late fold copy of CONST

* clang const is fine
2024-12-10 00:36:01 +08:00
chenyu
ccf54c2375 fix argmax/min on int32 min (#8118) 2024-12-09 02:29:23 -05:00
chenyu
c814de2dd4 fix bitwise_not for signed int (#8117)
-1 is correct because 2**32-1 is not within int32 range, so in some case clang casts the whole thing into uint32
2024-12-09 02:02:51 -05:00
qazal
69e48da961 set NOOPT in test_avg_pool3d_failure (#8112)
* set NOOPT=0 in test_avg_pool3d_failure

* noopt should still pass
2024-12-08 10:48:29 -05:00
geohotstan
f8294b3bda add avg pool 3d failure test (#8105)
* add test

* try simplify test case

* add TODO comment
2024-12-07 16:34:38 -05:00
chenyu
2d321646b8 default tensors to int32 in test_ops (#8097)
torch defaults to int64 but we care more about int32 anyway. remove skipped tests due to int64 not supported
2024-12-06 20:33:36 -05:00
chenyu
d000c08f04 fix return type of Tensor.pow (#8091)
int to power of int should return int etc, it hints that we would like to have Ops.POW
2024-12-06 13:38:29 -05:00
geohotstan
0b7c44677d Fix uint8 cast underflow (#6305)
* hacky fix for cast

* only float to uint8

* limit to float -> uint8

* touchup alu cast test

* improve tests and support more float to unsigned casts

* del one repeated test

* del 1 more repeated test

* try removing expected failure test

* hmmm try 1 more

* skip tests for flakiness

* uint64 super flaky

* clean up

* grammar

* just match numpy

* why is CI numpy different from local numpy

* increase verbosity

* try

* try2

* try3

* try4

* yeah idk

* new direction

* try again

* just don't support uint32 and uint64

* done?

* oops

* comment

* documentation

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-06 10:25:03 -05:00
geohotstan
a684d72e55 add ceil_mode for avg_pool and max_pool (#7579)
* wip pool

* check CI for remove alternative implementation

* Revert "check CI for remove alternative implementation"

This reverts commit 7b1bb900e5.

* fix test

* tests tests tests

* slap a resolve on it

* fix comment

* a little simpler pool

* check CI for removal again

* Revert "check CI for removal again"

This reverts commit be798b7857.

* small

* update

* some ez tests

* english

* clean up code

* fix ruff

* how did I +25 lines?

* small clean ups

* moar clean ups

* try test_avgpool2d_failure2 in CI

* final clean up

* exclude bug fix

* avg underscore pool

* no more edge case stuff

* add better comments for explanation

* add test cases for decreasing end padding

* address feedback

* improve test coverage

* tiny more polish as we wait for lines :D

* more readable code ordering

* add to documentation

* oops

* set to False instead

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-06 08:34:14 -05:00
Ahmed Harmouche
13eedd373b Run WebGPU tests on ubuntu (#8033) 2024-12-04 12:42:04 +01:00
Ahmed Harmouche
db330a3110 Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
geohotstan
0a2e10be1d add SELU to Tensor (#7993)
* add selu

* more clean ups
2024-12-02 10:04:01 -05:00
geohotstan
765096fe7d fix Tensor._pool edge case (#7581)
* split into another branch

* polish

* try this

* Revert "try this"

This reverts commit 84f711b13e.

* try

* Revert "try"

This reverts commit 89c7a7649b.

* idk anymore

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-28 23:17:13 -05:00
geohotstan
cea5853cfa add Tensor.scatter (#7737)
* working I think

* where are my onnx scatter tests??

* forward_only for now

* try if nan hack fix NV

* looks like issue is different... CUDA WHY

* oops that was wrong. Try if this fixes CUDA

* simpler multiply

* actually finish this up tmrw morning :x

* fix tests?

* improve tests

* improve test and implementation

* fix ruff

* complete but lots of expected failure...

* reviewed tests

* add onnx tests

* is this a processing op?

* add return type to indicate that it's not in-place

* final cleanups

* use or and improve tests a little

* add masked_index_select

* call it masked_setitem instead

* try

* FIXED

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:52:04 -05:00
geohotstan
753f07e193 add circular pad mode to Tensor.pad (#7918)
* start

* send it

* no more neg circular pads

* quick fix onnx too

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:30:51 -05:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnis now that we have uint8

* Fix pyling

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
chenyu
3b26e51fce Tensor.cummax (#7854)
generalized the existing cumsum and take Ops.MAX in addition to Ops.ADD
2024-11-22 15:55:02 -05:00
geohotstan
cf1ec90ad4 add inverse trig functions to Tensor (#7805)
* implement inverse trig functions

* guess we should still test nans?

* magnitude as variable name :D

* reorder onnx_ops ops

* approximation -> x for consistency

* address feedback

* simpler acos

* improvement?

* actually just have asin depend on atan

* actually this is nicer

* remove a comment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-21 09:13:36 -05:00
geohotstan
66a069ee25 add replicate mode to Tensor.pad (#7802)
* base implementation

* add tests

* actually remove the assertionerror test

* good
2024-11-20 08:39:58 -05:00
geohotstan
8100109c9d Add replicate mode to Tensor.pad (#7608)
* base implementation

* add tests

* actually remove the assertionerror test

* actually only have reflect for this pr

* change the 4 if-else one liner

* maybe use a lambda

* fix

* maybe a lil cleaner

* fix tests

* complete

* small change

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-18 10:55:38 -05:00
chenyu
df817297b6 fix passing acc_dtype="" to Tensor.prod should fail (#7750)
similar to sum
2024-11-17 11:38:13 -05:00
chenyu
55707fd00d fix passing sum_acc_dtype="" to Tensor.sum should fail (#7748) 2024-11-17 10:58:41 -05:00
chenyu
a15a900415 fix Tensor.meshgrid for 1D input and check indexing (#7740) 2024-11-16 23:39:30 -05:00
geohotstan
72a41095bc add Tensor.meshgrid (#7714)
* initial implementation and test

* some other places that can use meshgrid

* revert the onnx_ops change

* add to docs

* revert interpolate too

* update

* improve edge case test

* might as well test grad

* add to test can improve docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
chenyu
f1efd84c92 fix repeat_interleave with negative dim (#7734) 2024-11-16 10:15:29 -05:00
chenyu
22da31b223 clean up Tensor.dot (#7728)
more docs (similar to numpy) and removed many confusing  `-min(n2, 2)`
2024-11-15 18:21:15 -05:00
chenyu
4338c450ac fix max_pool2d for int tensor with padding (#7726)
padding inf messed output dtype
2024-11-15 16:22:11 -05:00
chenyu
9fb396f660 test_ops maxpool2d -> max_pool2d (#7696)
and avgpool2d -> avg_pool2d for better grepping the tests
2024-11-14 10:39:12 -05:00
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
chenyu
333f5f9f8b Tensor.bitwise_not (#7688)
implemented with xor in tensor for now to not add another op. also used it in Tensor.min to fix dtype int on -2**31
2024-11-13 16:31:52 -05:00
chenyu
fb933b79a6 add test case for nll_loss with input > 2D (#7685)
* failed test case for nll_loss with input > 2D

* fixed

* add more
2024-11-13 14:34:07 -05:00
geohotstan
9c41c376d3 add Tensor.nll_loss (#7683)
* move nll_loss to new branch

* make nll_loss examples practical

* self *is*

* add to docs

* small
2024-11-13 13:12:13 -05:00
chenyu
3c6fe4b79a fix Tensor.bitwise_and and Tensor.bitwise_or to support bool (#7684) 2024-11-13 13:10:39 -05:00
James
d4e4a084a1 fix: Tensor min function for unsigned ints (#7675)
* add failing tests for uint8 `min()`

* fix unsigned data type min()

* fix test data

* fix whitespace

---------

Co-authored-by: rezaarezvan <reza@rezvan.xyz>
Co-authored-by: Jamesb <experimentallearning0@gmail.com>
2024-11-13 11:04:27 -05:00
Reza Rezvan
23363dee55 Add: failing tests for uint8 min() (#7669)
* add failing tests for uint8 `min()`

* mark as expected failure
2024-11-13 22:12:53 +08:00
chenyu
c06a5a9c72 Tensor.linspace raises for dtype.bool (#7649)
also fixed an assert when passing str dtype to randint
2024-11-11 23:05:14 -05:00