* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendetal support
* log2 nan location mismatch on Vulkan
* Nan skips
* Start from andredaprato:webgpu-clean
* Fix infs
* inf wgsl function is not needed
* Emulated ulong for threefry, more tests passing
* Randomness tests passing
* Update model export to support new changes in webgpu, efficientnet export works again
* Simplify shift emulation in wgsl
* Delete test file
* Fix bigger than u32 u32 literal
* Why was skip copies added here?
* Python3.12 for webgpu tests
* Fix model export syntax error
* Get test ops passing with some skips
* Fix lint
* Much simpler shift
* Run more tests
* Timestamp queries are not supported in CI, so skip search tests
* All fancy indexing passing
* r is ctx
* Run more dtype tests by using is_dtype_supported
* Cleanup ulong shift rendering
* UPat -> Pat, UOps -> Ops
* Pat -> UPat
* Refactor render_ushift if-else
* Pattern to avoid ulong mul
* Remove vals_dtype
* is_nan trick + rewrite, test_isnan passing
* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)
* No arg, just op
* Support char, uchar, short, ushort
* Run test_index_mnis now that we have uint8
* Fix pyling
* Save 3 lines by using base Compiler
* No more long emulation
* Remove fixup_binops
* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm
* Simpler, faster copyin/out
* Skip some new tests that use long
* Fix typo
* copyout touchup
* Save lines by using render_cast
* WebGL is not supported in core, delete it from is_dtype_supported
* More narrow test skips for some unary tests
* TernaryOps, UnaryOps -> Ops
* TinyGrad supports WebGPU
* StableDiffusion demo: f16tof32 gpu is a lib, update UI
* Packed load/store, no more scale_size, no core tinygrad changes
* Rename copyin, copyout
* Device -> dev
* Fix lint
* Pattern matcher rule for packed load/store
* Refactor
* Shorter packed load/store
* this should fix lint
* Fix mypy
* SD compile script working
* New SD webgpu UI
* New default prompt
* New SD weights
* Fix title when webgpu not available
* Run symbolic tests, simplify is_nan, use round_up
* Show step time on UI
* Bump minimum wgpu version to v0.19
* Fix latent
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* add name support
* use fetch in gpt2
* remove requests from main lib, networkx also optional
* umm, keep that assert
* updates to fetch
* i love the walrus so much
* stop bundling mnist with tinygrad
* err, https
* download cache names
* add DOWNLOAD_CACHE_VERSION
* need env.
* ugh, wrong path
* replace get_child
* WIP: Stable diffusion WebGPU port
* Load whole model: split safetensor to avoid Chrome allocation limit
* Gitignore .DS_Store, remove debug print
* Clip tokenizer in JS
* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS
* e2e stable diffusion flow
* Create initial random latent tensor in JS
* SD working e2e
* Log if some weights were not loaded properly
* Remove latent_tensor.npy used for debugging
* Cleanup, remove useless logs
* Improve UI
* Add progress bar
* Remove .npy files used for debugging
* Add clip tokenizer as external dependency
* Remove alphas_cumprod.js and load it from safetensors
* Refactor
* Simplify a lot
* Dedup base when limiting elementwise merge (webgpu)
* Add return type to safe_load_metadata
* Do not allow run when webgpu is not supported
* Add progress bar, refactor, fix special names
* Add option to chose from local vs huggingface weights
* lowercase tinygrad :)
* fp16 model dl, decompression client side
* Cache f16 model in browser, better progress
* Cache miss recovery
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* demo somewhy doesn't work on my device and throw eror "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside setupNet func
* because of that, JS halts the execution of the rest of the code below and on the screen we see "loading..." forever
* added try catch here to communicate about the error in a proper way
* initial commit
* 81 passing
* 105 passing tests
* 148 passing
* CI tests
* install dep on ci
* try opencl pkgs
* try using vulkan
* down to only 6 failing
* refactor
* cleaning up
* another test skipped due to buffer limit
* linter
* segfault
* indent fix
* another segfault found
* small touchups
* Fix max and maxpool tests
* Add constant folding
* Add javascript export script
* better asserts in codegen
* manual upcasting
* reverted token type change
* skip safetensor test due to unsupported type
* FIx efficientnet and all other model tests
* Remove np copy
* fixed indent and missing import
* manually destroy the buffer
* revert back to length
* linter errors
* removed extra val
* skip broken tests
* skipping more tests
* Make the page pretty
* Save model weights as safetensor
* Fix imagenet to c test
* Fix second imagenet to c bug
* Async and paralel kernel compilation
* workgroup support
* reversed local size
* fixed non local bug
* correct local groups
* ci experiment
* removed typo
* Fix define local by using shared memory
* Refactor
* try running on mac
* match metal tests
* add more workers
* scope down tests
* trying windows runner
* fixed windows env
* see how many it can do
* merged master
* refactor
* missed refactor
* increase test suite coverage
* missing import
* whitespace in test_efficientnet.py
* getting there
* fixed reset
* fixed bufs
* switched to cstyle
* cleanup
* min/max rename
* one more linter issue
* fixed demo
* linter
* testing ci chrome
* add unsafe webgpu arg
* add build step
* remove WEBGPU from cmd line
* use module
* try forcing directx
* trying forced metal backend
* temp disable conv2d for CI
* disable conv_trasnpose2d
---------
Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>