* flake8: Ignore frequent violations, correct infrequent ones
* Ignore some rules in test
* Reorder test ignores
* Lint test + main
* EOF indent
* Include all E71,E72 errors
* Test the failing case in CI
* Revert "Test the failing case in CI"
This reverts commit 110add0a70.
* Push to test!
This reverts commit f317532779.
* ok back to passing
This reverts commit ba5052685f.
* Prove that CI fails when formatting is incorrect.
* Fix formatting
* Remove duplicitous E117 rule
* Use flake8 config for precommit
---------
Co-authored-by: waifairer <waifairer@gmail.com>
* Fix max nan
* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests
* Fix max nan
* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests
* Turned off due to the need for granularity
* models matrix
* fix typo and install gpu deps
* install llvm deps if needed
* fix
* testops with cuda
* remove pip cache since not work
* cuda env
* install cuda deps
* maybe it will work now
* i can't read
* all tests in matrix
* trim down more
* opencl stuff in matrix
* opencl pip cache
* test split
* change cuda test exclusion
* test
* fix cuda maybe
* add models
* add more n=auto
* third thing
* fix bug
* cache pip more
* change name
* update tests
* try again cause why not
* balance
* try again...
* try apt cache for cuda
* try on gpu:
* try cuda again
* update packages step
* replace libz-dev with zlib1g-dev
* only cache cuda
* why error
* fix gpuocelot bug
* apt cache err
* apt cache to slow?
* opt and image in single runner
* add a couple n=autos
* remove test matrix
* try cuda apt cache again
* libz-dev -> zlib1g-dev
* remove -s since not supported by xdist
* the cache takes too long and doesn't work
* combine webgpu and metal tests
* combine imagenet to c and cpu tests
* torch tests with linters
* torch back by itself
* small windows clang test with torch tests
* fix a goofy windows bug
* im dumb
* bro
* clang with linters
* fix pylint error
* linter not work on windows
* try with clang again
* clang and imagenet?
* install deps
* fix
* fix quote
* clang by itself (windows too slow)
* env vars for imagenet
* cache pip for metal and webgpu tests
* try torch with metal and webgpu
* doesn't work, too long
* remove -v
* try -n=logical
* don't use logical
* revert accidental thing
* remove some prints unless CI
* fix print unless CI
* ignore speed tests for slow tests
* clang windows in matrix (ubuntu being tested in imagenet->c test)
* try manual pip cache
* fix windows pip cache path
* all manual pip cache
* fix pip cache dir for macos
* print_ci function in helpers
* CI as variable, no print_ci
* missed one
* cuda tests with docker image
* remove setup-python action for cuda
* python->python3?
* remove -s -v
* try fix pip cache
* maybe fix
* try to fix pip cache
* is this the path?
* maybe cache pip
* try again
* create wheels dir
* ?
* cuda pip deps in dockerfile
* disable pip cache for clang
* image from ghcr instead of docker hub
* why is clang like this
* fast deps
* try use different caches
* remove the fast thing
* try with lighter image
* remove setup python for cuda
* small docker and cuda fast deps
* ignore a few more tests
* cool docker thing (maybe)
* oops
* quotes
* fix docker command
* fix bug
* ignore train efficientnet test
* remove dockerfile (docker stuff takes too long)
* remove docker stuff and normal cuda
* oops
* ignore the tests for cuda
* does this work
* ignore test_train on slow backends
* add space
* llvm ignore same tests as cuda
* nvm
* ignore lr scheduler tests
* get some stats
* fix ignore bug
* remove extra '
* remove and
* ignore test for llvm
* change ignored tests and durationon all backends
* fix
* and -> or
* ignore some more cuda tests
* finally?
* does this fix it
* remove durations=0
* add some more tests to llvm
* make last pytest more readable
* fix
* don't train efficientnet on cpu
* try w/out pip cache
* pip cache seems to be generally better
* pytest file markers
* try apt fast for cuda
* use quick install for apt-fast
* apt-fast not worth
* apt-get to apt
* fix typo
* suppress warnings
* register markers
* disable debug on fuzz tests
* change marker names
* apt update and apt install in one command
* update marker names in test.yml
* webgpu pytest marker
* demo somewhy doesn't work on my device and throw eror "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside setupNet func
* because of that, JS halts the execution of the rest of the code below and on the screen we see "loading..." forever
* added try catch here to communicate about the error in a proper way
* [WIP]: implementation of VITS TTS model
* Implemented VITS model, moved all code to examples/vits.py
* Added support for vctk model, auto download, and cleanups
* Invoke tensor.realize() before measuring inference time
* Added support for mmts-tts model, extracted TextMapper class, cleanups
* Removed IPY dep, added argument parser, cleanups
* Tiny fixes to wav writing
* Simplified the code in a few places, set diff log level for some prints
* Some refactoring, added support for uma_trilingual model (anime girls)
* Fixed bug where embeddings are loaded with same backing tensor, oops
* Added emotional embed support, added cjks + voistock models
- voistock is multilingual model with over 2k anime characters
- cjks is multilingual model with 24 speakers
both are kinda bad for english though :c
* Removed `Tensor.Training=False` (not needed and wrong oop)
* Changed default model and speaker to vctk with speaker 6
* Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy
* Removed accidentally pushed test/spline.py
* Some slight refactors
* Replaced masked_fill with tensor.where
* Added y_length estimating, plus installation instructions, plus some cleanups
* Fix overestimation log message.
* Changed default value of `--estimate_max_y_length` to False
This is only useful for larger inputs.
* Removed printing of the phonemes
* Changed default value of `--text_to_synthesize`
* fix CUDAProgram __init__ with DEBUG>=6 on Linux
Replace path generated in f-string by os.path.join
* import os instead of os.path.join
* move import up
* Add additional kernel when reducing multiple dimensions at once.
* Faster for smaller inputs
* Whitespace and naming
* Cleaner, guard for Metal only, and max 1 split rather than N
* Draft of different approach
* One additional kernel call for this test (as expected)
* Fuzz test symbolic and shapetracker
This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.
* mess again
* no tail
* test shapetracker too
* Revert mess and enable all tests
* removed leftover
* new version
* fix abstractions
* try remove test
* Revert "try remove test"
This reverts commit 2fc18a9f8e.
* assert_allclose
* minimize the test
* minimize the test
* minimize the test
* minimize the test
* Revert "minimize the test"
This reverts commit e0c0929596.
* Revert "minimize the test"
This reverts commit 88240551b1.
* Revert "minimize the test"
This reverts commit 78328a7ce2.
* Revert "minimize the test"
This reverts commit 989523fded.
* skip test inside body
* oops
* oops
* Rename FusedOps to TernaryOps
* Support ternary broadcast
* Add where llop and mlop
* Make where op work in cstyle codegen
* Don't skip test_inf_where
* Add backward path to where op
* Use bool in cstyle codegen
* Add LLVM where op
* Add numpy where op
* Add torch where op
* Simplify where mlop
* Update documentation
* Forgot a rename
* Merged relevant changes from PR #1195 onto PR #1196
* Add test to cover changes to linearizer.ast_parse for WHERE op
Without this METAL will try to use ternary op on float4 and fail
* Make where op work in wgsl backend
* Allow ternary ops to be merged
* Make mypy happy
---------
Co-authored-by: Francis Lam <flam@alum.mit.edu>