* WIP: Stable diffusion WebGPU port
* Load whole model: split safetensor to avoid Chrome allocation limit
* Gitignore .DS_Store, remove debug print
* Clip tokenizer in JS
* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS
* e2e stable diffusion flow
* Create initial random latent tensor in JS
* SD working e2e
* Log if some weights were not loaded properly
* Remove latent_tensor.npy used for debugging
* Cleanup, remove useless logs
* Improve UI
* Add progress bar
* Remove .npy files used for debugging
* Add clip tokenizer as external dependency
* Remove alphas_cumprod.js and load it from safetensors
* Refactor
* Simplify a lot
* Dedup base when limiting elementwise merge (webgpu)
* Add return type to safe_load_metadata
* Do not allow run when webgpu is not supported
* Add progress bar, refactor, fix special names
* Add option to chose from local vs huggingface weights
* lowercase tinygrad :)
* fp16 model dl, decompression client side
* Cache f16 model in browser, better progress
* Cache miss recovery
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>