* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
* truncate fp8
* fix
* maybe like that?
* fix linters
* ruff
* move from extra and add ml_types to tests
* minor changes
* str to dtypes and nan support
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
* fp8s part 1
* prettier
* fixes
* fixes
* remove stuff that should be in next pr
* revert
* add creation
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
* least_upper_float is at least default_float
en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too.
* fix bert acc
* Solve dims too large errors on webgpu
* Simplify divisor find
* Test square root divisor
* Fix lint
* Refactor into group_dims and split_dims
* Refactor
* Fix lint
* Add back max check in _group_dims
* Prefer grouping over split
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendetal support
* log2 nan location mismatch on Vulkan
* Nan skips
* assert to prepare for grad uop [pr]
* fix test_nn
* fix most of test_tensor
* few more tests
* fix multi
* uniform gradient
* acc_dtype
* any for multi
* fix typing
* fix assert, CAST_BEFORE_VIEW is still the issue
* explict test for CAST_BEFORE_VIEW
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
* Start from andredaprato:webgpu-clean
* Fix infs
* inf wgsl function is not needed
* Emulated ulong for threefry, more tests passing
* Randomness tests passing
* Update model export to support new changes in webgpu, efficientnet export works again
* Simplify shift emulation in wgsl
* Delete test file
* Fix bigger than u32 u32 literal
* Why was skip copies added here?
* Python3.12 for webgpu tests
* Fix model export syntax error
* Get test ops passing with some skips
* Fix lint
* Much simpler shift
* Run more tests
* Timestamp queries are not supported in CI, so skip search tests
* All fancy indexing passing
* r is ctx
* Run more dtype tests by using is_dtype_supported
* Cleanup ulong shift rendering
* UPat -> Pat, UOps -> Ops
* Pat -> UPat
* Refactor render_ushift if-else
* Pattern to avoid ulong mul
* Remove vals_dtype
* is_nan trick + rewrite, test_isnan passing
* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)
* No arg, just op
* Support char, uchar, short, ushort
* Run test_index_mnis now that we have uint8
* Fix pyling
* Save 3 lines by using base Compiler
* No more long emulation
* Remove fixup_binops
* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm
* Simpler, faster copyin/out
* Skip some new tests that use long
* Fix typo
* copyout touchup
* Save lines by using render_cast
* WebGL is not supported in core, delete it from is_dtype_supported
* More narrow test skips for some unary tests
* TernaryOps, UnaryOps -> Ops
* TinyGrad supports WebGPU
* StableDiffusion demo: f16tof32 gpu is a lib, update UI
* Packed load/store, no more scale_size, no core tinygrad changes
* Rename copyin, copyout
* Device -> dev
* Fix lint
* Pattern matcher rule for packed load/store
* Refactor
* Shorter packed load/store
* this should fix lint
* Fix mypy
* SD compile script working
* New SD webgpu UI
* New default prompt
* New SD weights
* Fix title when webgpu not available
* Run symbolic tests, simplify is_nan, use round_up
* Show step time on UI
* Bump minimum wgpu version to v0.19
* Fix latent
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>