Commit Graph

338 Commits

Author SHA1 Message Date
chenyu
d89e3c4e08 enable METAL tests now runner is M1 and no fast-math (#3523) 2024-02-28 14:14:23 -05:00
David Friehs
2fe98b64bb fix Tensor.split not passing dim to Tensor.chunk (#3490) 2024-02-24 07:53:11 -05:00
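The fix matters whenever split is used on a non-default axis. A minimal sketch of the corrected behavior (shapes illustrative):

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(12).reshape(3, 4)
a, b = t.split(2, dim=1)     # dim is now forwarded to Tensor.chunk
print(a.shape, b.shape)      # (3, 2) (3, 2)
```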
chenyu
1eb24af63b fix softmax and log_softmax for 0d tensor (#3463)
matched torch to take axis ∈ [-1, 0] and used axis=None internally
2024-02-21 11:30:30 -05:00
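Softmax over a 0-d tensor has nothing to normalize against but itself, so the result is 1 (and log_softmax is 0). A quick sketch of the fixed behavior:

```python
from tinygrad.tensor import Tensor

t = Tensor(2.0)                  # 0-d tensor
print(t.softmax().numpy())       # 1.0
print(t.log_softmax().numpy())   # 0.0
```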
George Hotz
871ba73e65 _reduce_op is axis based now (#3462)
* _reduce_op is axis based now

* axis_

* update lin failures

* disable that

* fix shape
2024-02-21 16:36:31 +01:00
geohotstan
5eb4c902f6 correct division dtype casting (#3405)
* Happy New Year (新年快乐)

* fix: exclude floordiv onnx tests

* fix: less weird if statements in div

* Good fortune in the Year of the Dragon (龙年大吉)

* fix: tempfix onnx div

* fix: use reference impl for div
2024-02-15 19:34:40 -05:00
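With the reference implementation, true division of integer tensors yields a float result. A sketch, assuming the default float dtype is float32:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

a = Tensor([1, 2, 3], dtype=dtypes.int32)
b = Tensor([2, 2, 2], dtype=dtypes.int32)
print((a / b).dtype)    # float32: int/int true division upcasts
print((a / b).numpy())  # [0.5 1.  1.5]
```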
Obada Khalili
18bb6a22e0 make tensors sizes smaller in maxpool2d tests (#3417) 2024-02-15 15:53:52 +01:00
chenyu
1156a27619 cleanup atol in test_ops (#3368)
removed explicitly set atol values that match the default 1e-6, or that are higher but can fall back to the default.
2024-02-10 19:44:44 -05:00
George Hotz
c32ea95d7d Python uop emulator (#3327)
* start uop emu

* tiny_add passes

* more ops

* emulate the whole warp

* test_gemm passes

* metal gemm test pass

* works on big gemm

* works on big gemm

* more tests pass

* touch ups

* fix mypy

* cleanups

* exp2 mypy

* arch is where it belongs

* actually emulate tensor cores

* fix test

* new style
2024-02-08 19:24:55 +01:00
chenyu
b110c4a7b8 explicitly set input low and high in test_ops (#3347)
easier to set `(low, high)` than to figure out a, b for `(x+a)*b`. this PR kept the same input ranges
2024-02-08 04:11:45 -05:00
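A minimal sketch of what such a helper might look like; the name, parameters, and defaults here are illustrative, not tinygrad's actual test harness:

```python
import numpy as np
from tinygrad.tensor import Tensor

def helper_test_op(shape, tiny_fxn, ref_fxn, low=-1.5, high=1.5, atol=1e-6, rtol=1e-3):
  # draw inputs uniformly from [low, high) instead of shifting/scaling via (x+a)*b
  x = np.random.uniform(low, high, size=shape).astype(np.float32)
  np.testing.assert_allclose(tiny_fxn(Tensor(x)).numpy(), ref_fxn(x), atol=atol, rtol=rtol)

helper_test_op((4, 5), lambda t: t.relu(), lambda x: np.maximum(x, 0))
```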
chenyu
0d2dacb549 test intermediate tensors created by function have same device as input (#3338)
run on TORCH since it's the fastest one on CI.
caught a bug in multinomial, and updated the behavior of fancy index and gather to move the indices Tensor to the same device as self.
2024-02-07 09:24:36 -05:00
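A sketch of the updated fancy-index behavior, assuming the CPU backend is available; the indices no longer need to be created on the same device:

```python
from tinygrad.tensor import Tensor

x = Tensor([[1, 2, 3], [4, 5, 6]], device="CPU")
idx = Tensor([1, 0])         # may live on the default device
y = x[idx]                   # indices are moved to x.device internally
print(y.device, y.numpy())   # CPU [[4 5 6] [1 2 3]]
```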
chenyu
ca66be6a70 add failed Tensor.pow test cases (#3334)
tried refactoring pow and found some bugs
2024-02-07 04:28:24 -05:00
chenyu
d9ef8e25b3 fix Tensor.var with 0 in reduce dim. (#3324)
fix the case when correction is too big; it seems to only matter when the input size is 0 though.
torch can output -inf in var when correction is too big, which does not make sense.
2024-02-05 20:59:13 -05:00
Obada Khalili
ee25f73283 Fix Tensor.mean to compute the mean correctly when 0-length axes are selected (#3318)
* fix Tensor.mean to compute the mean correctly when 0-length axes are selected

* add a regression test

* rename sum variable to sum_t to avoid conflict with the built-in function

* refactor Tensor.mean to have fewer lines
2024-02-05 01:40:37 -05:00
Obada Khalili
b4ea0e18e3 Fix dot product on buffers with zero strides (#3303)
* skip mulacc opt if all src buffers of the mul op are const buffers

* add noqa directive for long test

* unskip MULACC opt

* ensure that a_axes at least includes summation axes in order to perform np.einsum correctly

* add regression test for mulacc op

* compute a_slices using a_axes

* refactor the helper function to retrieve axes and slices for nonzero strides as well as summation axes

* include a regression test that exercises the behaviour indirectly
2024-02-04 05:15:06 -05:00
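The bug surfaced when one matmul operand was an expanded (zero-stride) view. A minimal reproduction sketch:

```python
from tinygrad.tensor import Tensor

x = Tensor([[1.0, 2.0], [3.0, 4.0]])
w = Tensor([[5.0, 6.0]]).expand(2, 2)   # broadcasted buffer with a zero stride
print((x @ w).numpy())                  # [[15. 18.] [35. 42.]]
```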
chenyu
9196b11dfb test_ops sinh/cosh/asinh/acosh/atanh (#3294)
some have numerical issues at large inputs, similar to sigmoid
2024-02-01 03:10:11 -05:00
chenyu
a3652e6ddc minor cleanups to test_ops (#3290)
- removed noop a=0
- fixed integer div test
- added test for both python expression and Tensor method call
- reordered for consistency and added some spaces
2024-01-31 19:01:25 -05:00
chenyu
7816c3b692 onnx update for trilu and argmax (#3283)
* support 0 in shape for tril and triu

* select_last_index for ArgMax and ArgMin

* pass **kwargs
2024-01-30 18:39:16 -05:00
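On the tensor side, tril/triu now tolerate a 0 in the shape. A quick sketch:

```python
from tinygrad.tensor import Tensor

t = Tensor.zeros(0, 4)
print(t.tril().shape, t.triu().shape)   # (0, 4) (0, 4)
```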
geohotstan
d0e116c6d6 fix maximum/where Scalar casting (#3194)
* init

* test: added dtype tests for maximum

* fix: separate maximum const and maximum tensors

* fix: del useless line

* fix: some dtypes

* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys

* fix: add lil helper function

* fix: some test refactoring

* done

* sike: not done yet lol

* wtf I missed an assert, am I drunk

* yeah idk

* fix: line save from redundant check

* revert: line save

* fix: simplify test_broadcast cuz I'm stumped

* change some test name

* fix: bool max bool works

* test: add a maximum bool test

* test: make sure minimum also works with bool

* fix: something like this? :s

* fix: maybe this?

* fix: how about this? tighter check

* fix: this.

* revert: nvm mul(0.5) and div(2) has the same kernel for backward

* fix: .is_floating_point() xD

* revert: maximum and minimum and add cast

* fix: cover negative const case in test

* fix: use eq because I don't understand clang :D

* WHOOOOPS
2024-01-25 12:26:04 -05:00
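The intent, sketched below, is that a Python scalar follows normal dtype promotion instead of silently casting to the tensor's dtype; the exact promotion rules are tinygrad's, this only illustrates the cases the tests cover:

```python
from tinygrad.tensor import Tensor

a = Tensor([1, 2, 3])
print(a.maximum(2).dtype)      # int const vs int tensor: dtype preserved
print(a.maximum(2.5).dtype)    # float const promotes the result to float

b = Tensor([True, False])
print(b.maximum(True).dtype)   # bool maximum bool stays bool
print(b.minimum(False).dtype)  # minimum goes through the same path
```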
geohotstan
3628bea910 fix: big round even rounder round (#3242)
* fix: big round even rounder round

* fix: variable name lol

* feat: 1 less potential cast

* consistent naming (I'm just spamming commits now)

* LOL MISSED ONNX ANOTHER COMMIT

* test: fix test_ops and remove _round

* test: tensor methods oops
2024-01-25 12:24:15 -05:00
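Round-half-to-even (banker's rounding) matches numpy and torch: ties go to the nearest even integer. A quick check:

```python
from tinygrad.tensor import Tensor

print(Tensor([0.5, 1.5, 2.5, 3.5, -0.5]).round().numpy())
# [ 0.  2.  2.  4. -0.]: halfway cases round to the nearest even integer
```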
chenyu
da5e27968c failed test cases for Tensor.round (#3240)
it should round to even
2024-01-25 02:12:50 -05:00
chenyu
afeadbedc9 touch up Tensor.round and Tensor.neg (#3228) 2024-01-24 12:29:37 -05:00
Obada Khalili
0e103b4aa0 implement Tensor.round (#3225) 2024-01-24 11:49:17 -05:00
geohotstan
842053873d fix neg logical_not inconsistencies (#3222)
* try

* test: add logical_not tests

* gah im retarded, but this doesn't match types for const()

* fix: can't we just do this?

* big change: I don't actually know what I'm doing

* WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later

* BYE BYE noqa: E501

* fix: less lines and add test

* fix: rm 2 redundant tests

* fix: eq with False so we don't unintentionally implicitly upcast, but it's bool anyway so w/e
2024-01-24 11:48:40 -05:00
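After the fix, numeric negation and boolean negation stay distinct. A sketch:

```python
from tinygrad.tensor import Tensor

b = Tensor([True, False])
print(b.logical_not().numpy())   # [False  True], dtype stays bool
x = Tensor([1, -2, 0])
print((-x).numpy())              # [-1  2  0], numeric negation is unchanged
```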
chenyu
b9d27636aa cleanup test_ops.py (#3192)
- removed exact duplicated tests
- only kept one function if torch_fxn is the same as tinygrad_fxn
- used tensor method instead of class method style
- replaced unneeded `lambda f: f(x)` with just `f`
- re-enabled commented tests that work now
- removed some forward_only now 0 shape tensor can backward
2024-01-20 20:08:56 -05:00
chenyu
fdb1c2b1d9 move reduce over 0 len axis logic to lazy.py (#3188)
* move reduce over 0 len axis logic to lazy.py

this fixes the uneven shard reduce case when the uneven shard has length 0

* fix interpreted backends

* fix backwards for 0 shape tensors too
2024-01-20 00:13:03 -05:00
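Reducing over a 0-length axis is well defined for sum: the result is the identity. A sketch of the behavior the move preserves:

```python
from tinygrad.tensor import Tensor

t = Tensor.zeros(3, 0)
s = t.sum(axis=1)
print(s.shape, s.numpy())   # (3,) [0. 0. 0.]: the empty sum is 0
```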
geohotstan
efbe4788d1 indexing: Final cleanup (#3156)
* init

* feat: add _to_const_val to getitem

* doc: changed docs

* docs: updated more docs

* merge: improved/fancy

* better error msg, minor cleanups

* feat: added index_put to test_indexing

* clean: test_indexing

* revert: gather changes lol

* refactor: use dict for tracking tensor indexing, also asserts for type

* oooooooooops

* ugh

* will revert this commit xD

* fix: removed asserts

* improvement: made in-line if statement clearer

* improved err message and improved slice_int tests

* fix: recover accidentally deleted line

* finishing touches

* reword some docs and del torch device tests in test_indexing

* del some redundant tests

* revert: gather asserts, do it in a separate PR

* fix some data_ptr stuff

* done

* done done
2024-01-18 14:08:03 -05:00
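Fancy (tensor) indexing follows numpy/torch semantics: integer index tensors are broadcast together and gather elements pointwise. For example:

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(12).reshape(3, 4)
rows, cols = Tensor([0, 2]), Tensor([1, 3])
print(t[rows, cols].numpy())   # [ 1 11]: elements (0,1) and (2,3)
```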
Guy Leroy
0dba34b81c Fix backward fn for < and == (#3037)
* fix no grad fn for < and ==

* remove 2 line breaks

* Remove deprecated autograd variable

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-14 20:39:52 -08:00
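Comparisons are non-differentiable, so they should contribute no gradient; the fix lets backward run through comparison-gated expressions instead of erroring. A sketch (gradient values assume the usual where semantics):

```python
from tinygrad.tensor import Tensor

x = Tensor([1.0, 2.0], requires_grad=True)
y = (x < 1.5).where(x * 2, x).sum()   # the comparison only selects; it carries no grad
y.backward()
print(x.grad.numpy())                 # [2. 1.]
```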
chenyu
a313e63a9b add Tensor.var (#3114)
also updated MeanVarianceNormalization and made test_ops test tensors of var and std smaller
2024-01-14 01:11:08 -05:00
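Tensor.var follows the torch convention, var = Σ(x − mean)² / (N − correction) with correction defaulting to 1 (Bessel's correction). A sketch:

```python
from tinygrad.tensor import Tensor

t = Tensor([1.0, 2.0, 4.0])
print(t.var().numpy())               # 2.333...: sample variance (correction=1)
print(t.var(correction=0).numpy())   # 1.555...: population variance
```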
chenyu
f3a50b4e40 fix broadcasted logic if there's 0 in shapes (#3097)
* fix broadcasted logic if there's 0 in shapes

should always expand into 0, not the other way around. fixed matmul with 0 in input shapes.
forwards only for now; backward is more involved and would need to change the 0-size shortcuts

* fix tests
2024-01-12 13:32:43 -05:00
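A size-1 dim expands into a size-0 dim (never the reverse), and matmul with a 0-length inner dim yields zeros. A sketch:

```python
from tinygrad.tensor import Tensor

print((Tensor.zeros(0, 3) + Tensor.ones(1, 3)).shape)      # (0, 3)
print((Tensor.zeros(2, 0) @ Tensor.zeros(0, 3)).numpy())   # 2x3 of zeros
```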
SnakeOnex
025fbf4e80 One hot in tensor.py (#3093)
* onehot in Tensor.py

* one_hot tests

* works for all shapes, not just 1

* pylint

* not a static method

* moved around, num_classes mandatory

* pylint

* pylint

* space & moving

* formatting

* moved tests
2024-01-12 13:31:18 -05:00
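num_classes is mandatory, and the method works for any input shape by appending a class axis. A sketch:

```python
from tinygrad.tensor import Tensor

print(Tensor([0, 2, 1]).one_hot(3).numpy())
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]
print(Tensor([[0, 1], [2, 0]]).one_hot(3).shape)   # (2, 2, 3)
```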
chenyu
55ac2a2cf7 Tensor.cat with 0 shape tensors (#3062)
* Tensor.cat with 0 shape tensors

supports 0 in the cat axis (for a subset of inputs), or 0 in a non-cat axis (then all inputs need to be 0)

* no shp
2024-01-09 16:54:06 -05:00
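A sketch of the two supported cases:

```python
from tinygrad.tensor import Tensor

print(Tensor.ones(2, 3).cat(Tensor.zeros(0, 3), dim=0).shape)   # (2, 3): 0 in the cat axis
print(Tensor.ones(2, 0).cat(Tensor.ones(3, 0), dim=0).shape)    # (5, 0): non-cat axis is 0 everywhere
```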
George Hotz
c5a941d466 webgl backend in extra (#3041)
* WebGL WIP

* 84% of ops passing test

* tests passing 100%

* Cleanup, refactor

* Shave off some lines

* Work on dtypes

* TestOps at 100% again

* Efficient net shaders compile in browser webgl2

* Compile all efficientnet shaders in browser

* Create empty textures for tensor buffers

* Run program. Up next weight loading

* Exported WebGL model working

* Add tests, refactor

* Explicit cast alu for GLSL

* Fix CI tests

* WebGL efficientnet demo

* Compile and run yolov8 in browser

* Fix imports

* Simplify yolo compile

* Fix bool*bool and cast cmplt to float

* More tests

* Do std tests pass on CI?

* Skip std tests on CI

* Remove explicit_cast_alu hack, and solve it in code_for_op

* Move to new dtype-less alloc api

* Remove local size hack: optimize local_size only if device has local

* Remove glsl.py, and move content to cstyle

* dont_use_locals in opts

* Fix dtype tests

* type_map in CStyleLanguage

* Make core changes smaller, cleaner, refactor export_model and demo

* Skip pad_slice

* Simplify: render_const, render_conditional

* solve bool alu for other binops, cleaner ops_webgl

* Fix noopt hack

* Remove some skipIfs

* WebGL image hack

* type_names is a better name

* global_max

* Fix dtype import

* Fix type_names -> type_map

* Fix lint

* Remove webgpu, back to 5k lines (#3040)

* remove webgpu

* max 5000 lines

* revert those to master

* retain that cstyle

---------

Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
2024-01-08 09:29:13 -08:00
chenyu
ef5f545fd8 add more Tensor.clip test cases (#3034)
* add more Tensor.clip test cases

add cases for same low/high and both negative etc

* case min > max
2024-01-07 13:08:59 -05:00
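The edge cases the tests cover, sketched:

```python
from tinygrad.tensor import Tensor

t = Tensor([-2.0, 0.0, 2.0])
print(t.clip(-1, 1).numpy())    # [-1.  0.  1.]
print(t.clip(1, 1).numpy())     # [1. 1. 1.]: same low/high collapses to that value
print(t.clip(-3, -1).numpy())   # [-2. -1. -1.]: both bounds negative
```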
chenyu
138c17c094 enable argmax tests for METAL/WEBGPU in CI (#3027)
not sure why it was skipped but works now in CI
2024-01-05 21:43:00 -05:00
chenyu
520406cf3a add Tensor.unflatten and Tensor.flatten(end_dim) (#3023)
simplified cases of splitting a dim, or merging dims in a prefix
2024-01-05 17:55:29 -05:00
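unflatten splits one dim into several; flatten(end_dim=...) merges a prefix of dims. A sketch:

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(24).reshape(2, 12)
print(t.unflatten(1, (3, 4)).shape)                  # (2, 3, 4): split dim 1 into 3x4
print(t.reshape(2, 3, 4).flatten(end_dim=1).shape)   # (6, 4): merge dims 0..1
```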
chenyu
4465ef28c5 add test_softmax to test_ops (#3020)
* add test_softmax to test_ops

somehow it was not tested

* too many buffers in softmax backward for WEBGPU
2024-01-05 11:19:49 -05:00
chenyu
ae112c9dbe fix some long lines in tests (#3006)
* fix some long lines in tests

* better
2024-01-03 23:53:33 -05:00
Kevin Herro
bd6a0c90a0 add Tensor.split (#2750)
* add Tensor.split (#2677)

* fix mypy errors

* add list support for Tensor.split

* fix ruff comments

* match tensor.split api

* simplify split and test_split

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-01 22:09:04 -08:00
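With list support, uneven splits are expressed directly. A sketch:

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(10)
a, b, c = t.split([3, 3, 4])
print(a.shape, b.shape, c.shape)   # (3,) (3,) (4,)
```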
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
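After the move, dtypes import from their own module:

```python
from tinygrad.dtype import dtypes   # dtypes now live in dtype.py

print(dtypes.float32, dtypes.int32)
```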
Isalia20
8de1fc2539 Einsum space fix (#2927)
* space removal in formula and a single test to cover it

* space in torch einsum as well

* replace spaces in the formula variable so all spaces are stripped
2023-12-24 01:23:27 -05:00
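Spaces anywhere in the formula are now ignored. A sketch:

```python
from tinygrad.tensor import Tensor

a, b = Tensor.ones(2, 3), Tensor.ones(3, 4)
print(Tensor.einsum("i j , j k -> i k", a, b).shape)   # (2, 4), same as "ij,jk->ik"
```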
George Hotz
1765849937 new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
geohotstan
fec8e9060c Add simple fancy indexing exceptions (#2706)
* fancy indexing raise error

* updated error message

* improved error check

* oops

* fixed onnx

* oops typo

* merge

* add full_flatten

* try

* merged and updated some tests

* more cleaning

* done

* temp fix onnx

* try

* add todo in onnx_test

* reword

* gah
2023-12-19 11:23:51 -05:00
chenyu
220abcd8ff fix squeeze of 0-dim Tensor with negative dim (#2821)
if ndim=0, the only accepted dims are 0, -1, and None; any other negative dim results in IndexError
2023-12-17 22:02:07 -05:00
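For a 0-dim tensor, dims 0, -1, and None are accepted (all no-ops); other negative dims raise IndexError, matching torch. A sketch:

```python
from tinygrad.tensor import Tensor

t = Tensor(3.0)              # ndim == 0
print(t.squeeze().shape)     # ()
print(t.squeeze(0).shape)    # ()
print(t.squeeze(-1).shape)   # ()
# t.squeeze(-2)              # IndexError, matching torch
```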
chenyu
85c6250a3e support Tensor.einsum with no "->" in formula (#2807)
the output subscripts are the input letters in sorted order if there's no "->"
2023-12-17 00:46:24 -05:00
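Without "->" the output subscripts default to the non-repeated letters in sorted order, as in numpy's implicit mode. A sketch:

```python
from tinygrad.tensor import Tensor

a, b = Tensor.ones(2, 3), Tensor.ones(3, 4)
print(Tensor.einsum("ij,jk", a, b).shape)   # (2, 4): implicit output "ik"
```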
George Hotz
051402625e remove pushing contig + fix linearizer bug (#2798)
* remove that logic

* fix test, move LOADs

* fix repeat issue on LLVM

* with_phi
2023-12-16 09:36:31 -08:00
chenyu
765f8b05e5 TernaryOps.WHERE has vin[0] as bool and BinaryOps.CMPLT always outputs bool (#2782)
* vin[0] to where is always bool

* due to better hack

* update test

* fix test_uops
2023-12-15 14:51:51 -05:00
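At the Tensor level this shows up as comparisons yielding bool tensors that feed where directly. A sketch:

```python
from tinygrad.tensor import Tensor

cond = Tensor([1.0, 3.0]) < 2.0   # CMPLT output is now bool
print(cond.dtype)
print(cond.where(Tensor([10.0, 10.0]), Tensor([20.0, 20.0])).numpy())   # [10. 20.]
```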
chenyu
81a747fc63 more test cases in test_slice_fancy_indexing_with_idx (#2751) 2023-12-13 17:52:26 -05:00
George Hotz
7e5b3e53fe changes to prep for new lazy (#2748)
* changes to prep for new lazy

* put those back
2023-12-13 10:28:22 -08:00
chenyu
aa4a0de287 simpler Tensor.pow to integer (#2746) 2023-12-13 11:39:20 -05:00
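Raising to a small integer exponent can be computed by repeated multiplication instead of exp/log, which is exact. A sketch of the idea, not the exact implementation:

```python
from tinygrad.tensor import Tensor

t = Tensor([2.0, 3.0])
print((t ** 3).numpy())      # [ 8. 27.]
print((t * t * t).numpy())   # same result: the decomposition integer pow can use
```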
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00