* hacky fix for cast
* only float to uint8
* limit to float -> uint8
* touchup alu cast test
* improve tests and support more float to unsigned casts
* del one repeated test
* del 1 more repeated test
* try removing expected failure test
* hmmm try 1 more
* skip tests for flakiness
* uint64 super flaky
* clean up
* grammar
* just match numpy
* why is CI numpy different from local numpy
* increase verbosity
* try
* try2
* try3
* try4
* yeah idk
* new direction
* try again
* just don't support uint32 and uint64
* done?
* oops
* comment
* documentation
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* wip pool
* check CI for remove alternative implementation
* Revert "check CI for remove alternative implementation"
This reverts commit 7b1bb900e5.
* fix test
* tests tests tests
* slap a resolve on it
* fix comment
* a little simpler pool
* check CI for removal again
* Revert "check CI for removal again"
This reverts commit be798b7857.
* small
* update
* some ez tests
* english
* clean up code
* fix ruff
* how did I +25 lines?
* small clean ups
* moar clean ups
* try test_avgpool2d_failure2 in CI
* final clean up
* exclude bug fix
* avg underscore pool
* no more edge case stuff
* add better comments for explanation
* add test cases for decreasing end padding
* address feedback
* improve test coverage
* tiny more polish as we wait for lines :D
* more readable code ordering
* add to documentation
* oops
* set to False instead
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* split into another branch
* polish
* try this
* Revert "try this"
This reverts commit 84f711b13e.
* try
* Revert "try"
This reverts commit 89c7a7649b.
* idk anymore
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* working I think
* where are my onnx scatter tests??
* forward_only for now
* try if nan hack fix NV
* looks like issue is different... CUDA WHY
* oops that was wrong. Try if this fixes CUDA
* simpler multiply
* actually finish this up tmrw morning :x
* fix tests?
* improve tests
* improve test and implementation
* fix ruff
* complete but lots of expected failure...
* reviewed tests
* add onnx tests
* is this a processing op?
* add return type to indicate that it's not in-place
* final cleanups
* use or and improve tests a little
* add masked_index_select
* call it masked_setitem instead
* try
* FIXED
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* Start from andredaprato:webgpu-clean
* Fix infs
* inf wgsl function is not needed
* Emulated ulong for threefry, more tests passing
* Randomness tests passing
* Update model export to support new changes in webgpu, efficientnet export works again
* Simplify shift emulation in wgsl
* Delete test file
* Fix bigger than u32 u32 literal
* Why was skip copies added here?
* Python3.12 for webgpu tests
* Fix model export syntax error
* Get test ops passing with some skips
* Fix lint
* Much simpler shift
* Run more tests
* Timestamp queries are not supported in CI, so skip search tests
* All fancy indexing passing
* r is ctx
* Run more dtype tests by using is_dtype_supported
* Cleanup ulong shift rendering
* UPat -> Pat, UOps -> Ops
* Pat -> UPat
* Refactor render_ushift if-else
* Pattern to avoid ulong mul
* Remove vals_dtype
* is_nan trick + rewrite, test_isnan passing
* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)
* No arg, just op
* Support char, uchar, short, ushort
* Run test_index_mnist now that we have uint8
* Fix pylint
* Save 3 lines by using base Compiler
* No more long emulation
* Remove fixup_binops
* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm
* Simpler, faster copyin/out
* Skip some new tests that use long
* Fix typo
* copyout touchup
* Save lines by using render_cast
* WebGL is not supported in core, delete it from is_dtype_supported
* More narrow test skips for some unary tests
* TernaryOps, UnaryOps -> Ops
* TinyGrad supports WebGPU
* StableDiffusion demo: f16tof32 gpu is a lib, update UI
* Packed load/store, no more scale_size, no core tinygrad changes
* Rename copyin, copyout
* Device -> dev
* Fix lint
* Pattern matcher rule for packed load/store
* Refactor
* Shorter packed load/store
* this should fix lint
* Fix mypy
* SD compile script working
* New SD webgpu UI
* New default prompt
* New SD weights
* Fix title when webgpu not available
* Run symbolic tests, simplify is_nan, use round_up
* Show step time on UI
* Bump minimum wgpu version to v0.19
* Fix latent
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
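
A small sketch of why the `a * select(1, nan, gate) -> select(a, nan, gate)` rewrite listed above should preserve results. The `select` helper here is hypothetical: it only mimics the WGSL builtin's argument order (false value, true value, condition) and is not tinygrad code.

```python
import math

def select(f, t, cond):
    # Mirrors WGSL select(f, t, cond): returns t when cond is true, else f.
    return t if cond else f

def before(a, gate):
    # Original form: multiply by 1 (no-op) or by NaN (poisons the result).
    return a * select(1.0, math.nan, gate)

def after(a, gate):
    # Rewritten form: pick a or NaN directly, skipping the multiply.
    return select(a, math.nan, gate)

def same(x, y):
    # NaN never compares equal to itself, so treat two NaNs as matching.
    return (math.isnan(x) and math.isnan(y)) or x == y
```

Both forms agree because `a * 1` is `a` and `a * nan` is always NaN, so the multiply can be dropped.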
* implement inverse trig functions
* guess we should still test nans?
* magnitude as variable name :D
* reorder onnx_ops ops
* approximation -> x for consistency
* address feedback
* simpler acos
* improvement?
* actually just have asin depend on atan
* actually this is nicer
* remove a comment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
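
A minimal sketch of the "asin depends on atan" idea from the list above, using the standard identity asin(x) = atan(x / sqrt(1 - x^2)). The function names are hypothetical and `math.atan` stands in for whatever atan implementation the PR actually uses; this is not the code that was merged.

```python
import math

def asin_via_atan(x: float) -> float:
    # Outside [-1, 1] asin is undefined; return NaN.
    if abs(x) > 1.0:
        return math.nan
    # At the endpoints the denominator is zero, so return +/- pi/2 directly.
    if abs(x) == 1.0:
        return math.copysign(math.pi / 2, x)
    # Identity valid for |x| < 1.
    return math.atan(x / math.sqrt(1.0 - x * x))

def acos_via_asin(x: float) -> float:
    # acos(x) = pi/2 - asin(x) on [-1, 1].
    return math.pi / 2 - asin_via_atan(x)
```

Building acos on top of asin keeps a single approximation (atan) as the only source of numerical error.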
* base implementation
* add tests
* actually remove the assertionerror test
* actually only have reflect for this pr
* change the 4 if-else one liner
* maybe use a lambda
* fix
* maybe a lil cleaner
* fix tests
* complete
* small change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* initial implementation and test
* some other places that can use meshgrid
* revert the onnx_ops change
* add to docs
* revert interpolate too
* update
* improve edge case test
* might as well test grad
* add to test can improve docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* move isinf and isnan to new branch
* sneak a roll documentation fix in
* add to docs
* update test coverage for detect_positive and detect_negative
* add types to isinf args
* move hardsigmoid to new branch
* add to test
* add NOTE to mention differing values for alpha and beta that match torch
* shift from relu6
* correct shift implementation
* or we just use relu? no more 666
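
A rough sketch of a hardsigmoid built from relu only, as the last few bullets suggest. The defaults `alpha=1/6, beta=0.5` are the torch-matching values (torch defines hardsigmoid as relu6(x + 3) / 6); ONNX's HardSigmoid defaults to `alpha=0.2` instead. The helper names are hypothetical and this is not the merged implementation.

```python
def relu(x: float) -> float:
    return max(x, 0.0)

def hardsigmoid(x: float, alpha: float = 1 / 6, beta: float = 0.5) -> float:
    # clamp(alpha * x + beta, 0, 1) expressed with relu only:
    # relu(y) - relu(y - 1) is 0 for y < 0, y for 0 <= y <= 1, and 1 for y > 1.
    y = alpha * x + beta
    return relu(y) - relu(y - 1.0)

def hardsigmoid_relu6(x: float) -> float:
    # Reference form via relu6 for comparison: relu6(x + 3) / 6.
    return min(max(x + 3.0, 0.0), 6.0) / 6.0
```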