moved Sign logic to function.py; its backward always returns 0 to match torch.
rewrite abs as `self * self.sign()` so its backward also matches torch.
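A minimal pure-Python sketch of the rule above (hypothetical helpers `sign` and `abs_via_sign`, not tinygrad's actual Function classes):

```python
def sign(x):
    # sign(0) == 0, and its derivative is taken to be 0 everywhere --
    # that is what "backward always returns 0" means above
    return (x > 0) - (x < 0)

def abs_via_sign(x):
    # abs rewritten as x * sign(x); treating sign as constant under
    # differentiation, the gradient is sign(x), which is 0 at x == 0
    # just like torch.abs's backward
    return x * sign(x)
```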
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* not run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugh, not supported
* ci
* remove ticket from description
* better descs
* handle reshape with remainder in _reshape_mask
* remove trailing whitespace
* use helper_test_op to generate tensors from shapes
* test in shapetracker too
* remove whitespace
* revert property name in other class tests
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* test tolist
* simple fix for onnx test failures (#4186)
* bump line count to 7500
* simplest fix
* safenumpy tolist for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
---------
Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
* initial version
* heh gimme grrrreen
* version 2
* clean ups
* some test confusion
* fix onnx
* rename to _broadcast_tensors
* improved errors and test
* fixed?
* some test fixup
* version 3 lol
* comments
* cleaner
* add failure test for expand to 0
* 1 more assertRaises test
* make err msg better
* also rewrite the expand onnx op? :s
* Fix permutation of result indices in einsum.
* Delete stray line used for breaking tests
* Fix linter error by renaming twice-used variable
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* It works?
* Clamp correctly
* Refactor
* Make code better
* Undo some stuff
* First step to trying to make floats work
* Floats work in the Python op but not Metal because int div is different
Python integer division was implemented as //, which rounds towards
negative infinity, but C integer division rounds towards 0, so there
is an off-by-1 division error
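The off-by-one can be reproduced in plain Python; `c_div` is a hypothetical helper emulating C's truncating division:

```python
def c_div(a, b):
    # C integer division truncates toward zero
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

# Python's // floors toward negative infinity, so the two disagree
# by exactly 1 whenever the exact quotient is negative and inexact:
assert -7 // 2 == -4       # Python floor division
assert c_div(-7, 2) == -3  # C-style truncating division
```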
* arange does cumsum with ints and then multiplies by step
This is so loop optimization can remain int only
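A sketch of that scheme (hypothetical `arange_via_cumsum`, not the actual kernel): the loop accumulates plain ints, and the possibly-float `step` only enters in a final multiply:

```python
import math

def arange_via_cumsum(start, stop, step):
    n = max(0, math.ceil((stop - start) / step))
    out, acc = [], 0
    for _ in range(n):
        acc += 1                              # int-only cumsum: 1, 2, ..., n
        out.append(start + (acc - 1) * step)  # scale by step only at the end
    return out
```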
* Undo a lot of symbolic changes
* Final check
* Cleanup
* There can be multiple phis
* Fix multiple phi op removal
* const sets dtype correctly
* Fix bugs
* Fix a couple bugs and add loop vars to resolve
* missed one
* Don't trim too many ops
* Fix symbolic test
* Use ones instead of full
* Delete test
* Lint passes
* max node error
* Small updates to loop logic
* Remove unnecessary changes
* We are getting somewhere
* Simple case
* Fix
* rm, prn
* Better
* If NumNode doesn't work then continue
* clamp is needed for arange(256)
* Move everything into the optim fn
* Replace correctly
* Order optimizations better
* Delete
* mypy
* Test for simplification
* Rename
* Fix test
* update test description
* Undo more
* Cleanup
* No replaced_ops map
* Fix lint
* AssertionError
* back again
* Reinstate assertion
* Return true and make diff not as big
* Bigger range for test
* Change cumsum impl
* fix bug
* make big cumsum work
* lint
* Undo cumsum 2-stage removal
* No while helper
* optional min/max clamping
* floats work
* rm giant arange test
* fix python cast None
* Check phi parents
* one phi allowed per where
* Fix one phi per where
* Rework iteration
* Delete assertions
* convert to int
* Try mul -1 instead of neg for hip..?
* Remove one phi per where requirements
* one accum only
* Lint
* should simplify a loop at a time
* Don't get rid of loop explicitly
* Need to iterate backwards
* lint
* unary neg
* Make optim work for onnx and sum_pad_collapse
* Better message
* filter alu ops correctly
* Fix the limiter
* lint and simplify
* Add it back
* off by one error
* test wheres and phis
* test max ops and non-if stuff
* <=
* cast_scalar
* Oops
* Change test
* Pass loop uops instead of a modified map
* Cut param transfer between linearizer and uops
* Fix issues
* Fix lint
* fix efficientnet python 3.8 invalid syntax
* distinct vars in seen_vars
* accurate var names
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* start uop emu
* tiny_add passes
* more ops
* emulate the whole warp
* test_gemm passes
* metal gemm test pass
* works on big gemm
* works on big gemm
* more tests pass
* touch ups
* fix mypy
* cleanups
* exp2 mypy
* arch is where it belongs
* actually emulate tensor cores
* fix test
* new style
run on TORCH since it's the fastest one on CI.
caught a bug in multinomial, and updated the behavior of fancy index and gather to move the indices Tensor to the same device as self.
fix var when correction is too big; it seems to only work when the input size is 0, though.
torch can output -inf in var when correction is too big, which does not make sense.
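A standalone sketch of the issue (hypothetical `var` helper with a torch-style `correction` parameter): when `correction >= n` the Bessel divisor is non-positive, and returning nan is saner than torch's -inf:

```python
def var(xs, correction=1):
    n = len(xs)
    m = sum(xs) / n
    denom = n - correction
    if denom <= 0:
        # correction too big: the divisor would be <= 0, which is where
        # torch can produce -inf; return nan instead
        return float("nan")
    return sum((x - m) ** 2 for x in xs) / denom
```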
* fix Tensor.mean to compute the mean correctly when 0-length axes are selected
* add a regression test
* rename sum variable to sum_t to avoid conflict with the built-in function
* refactor Tensor.mean to have fewer lines
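A minimal sketch of the 0-length-axis fix (hypothetical standalone `mean`, not Tensor.mean itself): with zero elements the divisor is 0, so return nan instead of dividing:

```python
def mean(xs):
    n = len(xs)
    # reducing over a 0-length axis gives n == 0; nan avoids ZeroDivisionError
    return sum(xs) / n if n else float("nan")
```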
* skip mulacc opt if all the src buffers of the mul op are const buffers
* add noqa directive for long test
* unskip MULACC opt
* ensure that a_axes at least includes summation axes in order to perform np.einsum correctly
* add regression test for mulacc op
* compute a_slices using a_axes
* refactor helper of function to retrieve axes and slices for nonzero strides as well as summation axes
* include a regression test that uses and to test the behaviour indirectly
- removed noop a=0
- fixed integer div test
- added test for both python expression and Tensor method call
- reordered for consistency and added some spaces
* init
* test: added dtype tests for maximum
* fix: separate maximum const and maximum tensors
* fix: del useless line
* fix: some dtypes
* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys
* fix: add lil helper function
* fix: some test refactoring
* done
* sike: not done yet lol
* wtf I missed an assert, am I drunk
* yeah idk
* fix: line save from redundant check
* revert: line save
* fix: simplify test_broadcast cuz I'm stumped
* change some test name
* fix: bool max bool works
* test: add a maximum bool test
* test: make sure minimum also works with bool
* fix: something like this? :s
* fix: maybe this?
* fix: how about this? tighter check
* fix: this.
* revert: nvm, mul(0.5) and div(2) have the same kernel for backward
* fix: .is_floating_point() xD
* revert: maximum and minimum and add cast
* fix: cover negative const case in test
* fix: use eq because I don't understand clang :D
* WHOOOOPS