* demo doesn't work on my device for some reason and throws the error "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside the setupNet func
* because of that, JS halts execution of the rest of the code below and the screen shows "loading..." forever
* added a try/catch here to report the error properly
* [WIP]: implementation of VITS TTS model
* Implemented VITS model, moved all code to examples/vits.py
* Added support for vctk model, auto download, and cleanups
* Invoke tensor.realize() before measuring inference time
* Added support for mmts-tts model, extracted TextMapper class, cleanups
* Removed IPY dep, added argument parser, cleanups
* Tiny fixes to wav writing
* Simplified the code in a few places, set different log levels for some prints
* Some refactoring, added support for uma_trilingual model (anime girls)
* Fixed bug where embeddings are loaded with same backing tensor, oops
* Added emotional embed support, added cjks + voistock models
- voistock is a multilingual model with over 2k anime characters
- cjks is a multilingual model with 24 speakers
both are kinda bad for English though :c
* Removed `Tensor.Training=False` (not needed, and wrong, oops)
* Changed default model and speaker to vctk with speaker 6
* Ported the rational_quadratic_spline function to use only tinygrad ops, no numpy
* Removed accidentally pushed test/spline.py
* Some slight refactors
* Replaced masked_fill with tensor.where
* Added y_length estimation, installation instructions, and some cleanups
* Fix overestimation log message.
* Changed default value of `--estimate_max_y_length` to False
This is only useful for larger inputs.
* Removed printing of the phonemes
* Changed default value of `--text_to_synthesize`
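Why the `tensor.realize()` call matters for the timing: tinygrad evaluates lazily, so a timer stopped before the result is forced mostly measures graph construction. The sketch below uses a hypothetical `LazyValue` class (not tinygrad's `Tensor`) to show that the forcing call has to sit inside the timed region.

```python
import time
from typing import Callable, Optional


class LazyValue:
    """Hypothetical stand-in for a lazy tensor: the work runs only on realize()."""

    def __init__(self, thunk: Callable[[], float]):
        self._thunk = thunk
        self._value: Optional[float] = None
        self.evaluations = 0  # how many times the deferred work actually ran

    def realize(self) -> "LazyValue":
        if self._value is None:
            self._value = self._thunk()
            self.evaluations += 1
        return self

    @property
    def value(self) -> float:
        return self.realize()._value


def slow_sum(n: int) -> float:
    return float(sum(i * i for i in range(n)))


# Misleading: only builds the lazy object, no real work happens inside the timer
start = time.perf_counter()
out = LazyValue(lambda: slow_sum(200_000))
lazy_only = time.perf_counter() - start

# Correct: force evaluation before stopping the timer
start = time.perf_counter()
out2 = LazyValue(lambda: slow_sum(200_000)).realize()
realized = time.perf_counter() - start

print(f"without realize: {lazy_only:.6f}s, with realize: {realized:.6f}s")
```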
* fix CUDAProgram __init__ with DEBUG>=6 on Linux
Replace the path built with an f-string with os.path.join
* import os instead of os.path.join
* move import up
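A minimal sketch of the path change, assuming the debug dump goes to the system temp directory. The helper `debug_dump_path` and the `.cubin` file name are hypothetical, not taken from the tinygrad source. `os.path.join` inserts exactly one platform separator between components, so the result does not depend on whether the directory string already ends with one.

```python
import os
import tempfile


def debug_dump_path(name: str, ext: str = "cubin") -> str:
    """Hypothetical helper: where a DEBUG dump of a compiled kernel could be written."""
    # os.path.join uses the platform separator and avoids doubled or missing slashes
    return os.path.join(tempfile.gettempdir(), f"{name}.{ext}")


print(os.path.join("/tmp", "kernel.cubin"))   # on POSIX: /tmp/kernel.cubin
print(os.path.join("/tmp/", "kernel.cubin"))  # on POSIX: /tmp/kernel.cubin
print(debug_dump_path("kernel"))
```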
* Add additional kernel when reducing multiple dimensions at once.
* Faster for smaller inputs
* Whitespace and naming
* Cleaner, guard for Metal only, and max 1 split rather than N
* Draft of different approach
* One additional kernel call for this test (as expected)
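The idea behind the extra kernel, sketched in plain Python rather than tinygrad's codegen: when several axes are reduced at once, a first pass can reduce one axis into a buffer of partials and a second pass reduces those partials. Both function names are hypothetical; the sketch only shows that the split preserves the result.

```python
from typing import List

Matrix = List[List[float]]


def reduce_all_single_pass(x: Matrix) -> float:
    # one "kernel": a single loop nest walks every element
    total = 0.0
    for row in x:
        for v in row:
            total += v
    return total


def reduce_all_two_pass(x: Matrix) -> float:
    # first "kernel": reduce the inner axis, producing one partial per row
    partials = [sum(row) for row in x]
    # second "kernel": reduce the much smaller buffer of partials
    return sum(partials)


x = [[float(i * 4 + j) for j in range(4)] for i in range(3)]
print(reduce_all_single_pass(x), reduce_all_two_pass(x))  # 66.0 66.0
```

The partials buffer corresponds to the one additional kernel launch the test above expects; each row's partial is independent, which is what lets a GPU compute them in parallel.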
* Fuzz test symbolic and shapetracker
This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.
* mess again
* no tail
* test shapetracker too
* Revert mess and enable all tests
* removed leftover
* new version
* fix abstractions
* try remove test
* Revert "try remove test"
This reverts commit 2fc18a9f8e.
* assert_allclose
* minimize the test
* minimize the test
* minimize the test
* minimize the test
* Revert "minimize the test"
This reverts commit e0c0929596.
* Revert "minimize the test"
This reverts commit 88240551b1.
* Revert "minimize the test"
This reverts commit 78328a7ce2.
* Revert "minimize the test"
This reverts commit 989523fded.
* skip test inside body
* oops
* oops
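A minimal fuzz test in the spirit of these commits: draw random shapes and permutations from a seeded RNG, read elements through permuted strides (roughly what a shape tracker does for a permute view), and compare against an independently nested copy of the same buffer. This is a self-contained illustration with hypothetical helpers, not tinygrad's test code.

```python
import functools
import itertools
import math
import random
from typing import List, Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))


def nest(flat: List[int], shape: Sequence[int]):
    # independent reference: build nested lists without using strides
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [nest(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def fuzz_permute(trials: int = 200, seed: int = 0) -> int:
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        ndim = rng.randint(1, 4)
        shape = [rng.randint(1, 4) for _ in range(ndim)]
        perm = list(range(ndim))
        rng.shuffle(perm)
        flat = list(range(math.prod(shape)))
        nested = nest(flat, shape)
        strides = contiguous_strides(shape)
        # a permute view only reorders shape and strides; the buffer stays put
        new_shape = [shape[p] for p in perm]
        new_strides = [strides[p] for p in perm]
        for new_idx in itertools.product(*(range(d) for d in new_shape)):
            orig_idx = [0] * ndim
            for axis, p in enumerate(perm):
                orig_idx[p] = new_idx[axis]
            via_view = flat[sum(i * s for i, s in zip(new_idx, new_strides))]
            via_nested = functools.reduce(lambda acc, i: acc[i], orig_idx, nested)
            assert via_view == via_nested, (shape, perm, new_idx)
            checked += 1
    return checked


print(fuzz_permute())
```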
* Rename FusedOps to TernaryOps
* Support ternary broadcast
* Add where llop and mlop
* Make where op work in cstyle codegen
* Don't skip test_inf_where
* Add backward path to where op
* Use bool in cstyle codegen
* Add LLVM where op
* Add numpy where op
* Add torch where op
* Simplify where mlop
* Update documentation
* Forgot a rename
* Merged relevant changes from PR #1195 onto PR #1196
* Add test to cover changes to linearizer.ast_parse for WHERE op
Without this, METAL will try to use the ternary op on float4 and fail
* Make where op work in wgsl backend
* Allow ternary ops to be merged
* Make mypy happy
---------
Co-authored-by: Francis Lam <flam@alum.mit.edu>
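A plain-Python sketch of the semantics the WHERE commits describe: a ternary select whose scalar arguments broadcast against the condition, and a backward pass that routes the incoming gradient to whichever branch was selected. The helper names are hypothetical and lists stand in for tensors; this is not tinygrad's mlop code.

```python
from typing import List, Sequence, Tuple, Union

Num = Union[int, float]
Arg = Union[Num, Sequence[Num]]


def _broadcast(a: Arg, n: int) -> List[Num]:
    # scalars broadcast to length n; sequences must already have length n
    if isinstance(a, (int, float)):
        return [a] * n
    if len(a) != n:
        raise ValueError(f"cannot broadcast length {len(a)} to {n}")
    return list(a)


def where(cond: Sequence[bool], x: Arg, y: Arg) -> List[Num]:
    n = len(cond)
    xs, ys = _broadcast(x, n), _broadcast(y, n)
    return [xi if c else yi for c, xi, yi in zip(cond, xs, ys)]


def where_backward(cond: Sequence[bool], grad: Sequence[float]) -> Tuple[List[Num], List[Num]]:
    # x receives the gradient where cond is true, y where it is false;
    # the condition itself gets no gradient
    gx = where(cond, list(grad), 0.0)
    gy = where(cond, 0.0, list(grad))
    return gx, gy


cond = [True, False, True]
print(where(cond, [1.0, 2.0, 3.0], 0.0))      # [1.0, 0.0, 3.0]
print(where_backward(cond, [0.5, 0.5, 0.5]))  # ([0.5, 0.0, 0.5], [0.0, 0.5, 0.0])
```

If `x` or `y` was a broadcast scalar, its gradient would be the sum of the corresponding entries rather than the full list.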
* skip nvcc compile target cubin when using PTX
* actually we should generate sass for both ptx and cuda code
* Fixed formatting, should print the error anyway
* ensure subprocess.run throws exception
* fixed linting errors and checked before committing this time
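By default `subprocess.run` does not raise on a non-zero exit status; passing `check=True` makes it raise `subprocess.CalledProcessError`, which is presumably what "ensure subprocess.run throws exception" relies on. A minimal sketch, using a throwaway Python child process in place of the real `nvcc` invocation (whose flags are not reproduced here); `run_checked` is a hypothetical helper.

```python
import subprocess
import sys


def run_checked(args):
    """Run a command, raising CalledProcessError on failure and capturing its output."""
    return subprocess.run(args, check=True, capture_output=True, text=True)


caught = None
try:
    # stand-in for a failing compiler call: the child writes to stderr and exits with 1
    run_checked([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"])
except subprocess.CalledProcessError as e:
    caught = e
    # print the error anyway so the failure is visible
    print(f"command failed with code {e.returncode}: {e.stderr}")
```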
* WIP: `tensor.squeeze` function
* Added `test_except` param to `helper_test_op` to avoid false positives
* Extracted new method `helper_test_exception` for testing exceptions
* Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch
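A shape-only sketch of the squeeze rule these commits target, assuming PyTorch's convention that a 0-d tensor accepts `dim` in `[-1, 0]` and that squeezing a dimension whose size is not 1 is a no-op. `squeeze_shape` is a hypothetical helper, not tinygrad's `Tensor.squeeze`.

```python
from typing import Optional, Tuple


def squeeze_shape(shape: Tuple[int, ...], dim: Optional[int] = None) -> Tuple[int, ...]:
    if dim is None:
        # no dim given: drop every size-1 axis
        return tuple(s for s in shape if s != 1)
    ndim = len(shape)
    # a 0-d shape still accepts dim in [-1, 0], as in PyTorch
    bound = max(ndim, 1)
    if not -bound <= dim < bound:
        raise IndexError(f"Dimension out of range (expected to be in range of [{-bound}, {bound - 1}], but got {dim})")
    if ndim == 0:
        return shape
    dim %= ndim  # normalize negative dims
    return shape if shape[dim] != 1 else shape[:dim] + shape[dim + 1:]


print(squeeze_shape((1, 3, 1)))      # (3,)
print(squeeze_shape((1, 3, 1), -1))  # (1, 3)
print(squeeze_shape((), 0))          # ()
```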
* initial commit
* 81 passing
* 105 passing tests
* 148 passing
* CI tests
* install dep on ci
* try opencl pkgs
* try using vulkan
* down to only 6 failing
* refactor
* cleaning up
* another test skipped due to buffer limit
* linter
* segfault
* indent fix
* another segfault found
* small touchups
* Fix max and maxpool tests
* Add constant folding
* Add javascript export script
* better asserts in codegen
* manual upcasting
* reverted token type change
* skip safetensor test due to unsupported type
* Fix efficientnet and all other model tests
* Remove np copy
* fixed indent and missing import
* manually destroy the buffer
* revert back to length
* linter errors
* removed extra val
* skip broken tests
* skipping more tests
* Make the page pretty
* Save model weights as safetensor
* Fix imagenet to c test
* Fix second imagenet to c bug
* Async and parallel kernel compilation
* workgroup support
* reversed local size
* fixed non local bug
* correct local groups
* ci experiment
* removed typo
* Fix define local by using shared memory
* Refactor
* try running on mac
* match metal tests
* add more workers
* scope down tests
* trying windows runner
* fixed windows env
* see how many it can do
* merged master
* refactor
* missed refactor
* increase test suite coverage
* missing import
* whitespace in test_efficientnet.py
* getting there
* fixed reset
* fixed bufs
* switched to cstyle
* cleanup
* min/max rename
* one more linter issue
* fixed demo
* linter
* testing ci chrome
* add unsafe webgpu arg
* add build step
* remove WEBGPU from cmd line
* use module
* try forcing directx
* trying forced metal backend
* temp disable conv2d for CI
* disable conv_transpose2d
---------
Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
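For context on "Save model weights as safetensor", a minimal writer/reader for the safetensors layout: an 8-byte little-endian header length, a JSON header mapping each tensor name to its `dtype`, `shape` and `data_offsets` (relative to the start of the data buffer), then the raw little-endian bytes. The sketch handles only `F32` and its helper names are hypothetical; real exports should use a maintained implementation.

```python
import json
import struct
from typing import Dict, List, Tuple

Tensors = Dict[str, Tuple[List[int], List[float]]]  # name -> (shape, flat float32 values)


def save_safetensors_f32(tensors: Tensors) -> bytes:
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {"dtype": "F32", "shape": shape, "data_offsets": [offset, offset + len(data)]}
        chunks.append(data)
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)  # pad with spaces to an 8-byte boundary
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def load_safetensors_f32(blob: bytes) -> Tensors:
    (n,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + n])
    body = blob[8 + n:]
    out: Tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        if info["dtype"] != "F32":
            raise ValueError(f"{name}: only F32 is handled in this sketch")
        start, end = info["data_offsets"]
        count = (end - start) // 4
        out[name] = (info["shape"], list(struct.unpack(f"<{count}f", body[start:end])))
    return out


weights = {"fc.weight": ([2, 2], [1.0, 2.0, 3.0, 4.0]), "fc.bias": ([2], [0.5, -0.5])}
restored = load_safetensors_f32(save_safetensors_f32(weights))
print(restored == weights)  # True
```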