* testing new memops
* better debugging
* testing padded conv
* branching with load
* refactoring a bit
* first try
* fixing bugs
* fixing some
* eq
* eq2
* do not use x's
* working
* fixing imm
* getting things working
* refactor
* pow not working
* working except one
* refactor: one store mem
* refactor: global load
* refactor: imm
* refactor: cleaning
* fixing big offsets
* refactor with ci
* try ci
* typo
* another typo
* ubuntu default
* forgot git
* do I need git?
* missing packages
* adding python-dev
* with cache?
* buildx action
* buildx name issue?
* maybe now?
* python3
* newline warning
* maybe now
* i actually need this
* ci should work now
* improved caching
* fixing cache
* maybe now it will cache
* this
* testing cache
* trying again
* load
* missing platform
* caching gha
* testing cache
* full testing
* typo
* now?
* why
* adding checkout back
* bad formatting
* fixing convention issues
* supporting python
* adding CI flag
* testing all
* better comments
* adding debugging
* takes 12x longer
* does it output progress now?
* ignore models for speed
* fixing merge
* excluding conv_transpose2d
* only 2 tests because it's too slow
* another approach
* let's see
* faster duh
* my bad
* T_T
* typo
* sup
* with output?
* comment test
* comment test
* comment test
* :?
* no comment
* with cache
* back to normal
* testing that ci works
* back to passing
* trying again
* does it create another entry
* does it create another entry?
* build local
* hey
* Revert "excluding conv_transpose2d"
This reverts commit cc7348de03.
* does it cache if done before?
* does it cache?
* done
* adding test ops
* bad formatting
* no need for this
* working static mem
* sum 1d
* add ndim
* better reg import
* fix stack
* back to np
* working except for softmax
* 5 failing
* no progress
* remove keystone
* remove keystone
* testops passing
* cleanups
* more cleanup
* typo
* ci
* ci2
* cond import
* ci3
* ci4
* ci4
* ci5
* ci5
* ci6
* alignment
* test all
* correct test
* err read_unmapped
* passing test
* ignore for speed
* ignore for speed
* ci7
* cleanup
* remove docker
* fixing merge
* fixing bugs
* add skipload for const ops
* comments
* First merge to master: Renderer
* fix emulation
* passing all tests on arm64
* cleaning
* fix handcoded binary
* cleaning
* fix errs
* fix runtime arg binary
* clean git diff
* fix and clean
* fixing metal test
* cleaning
* fix metal test
* ci ~8 min
* fix pylint and clang
* cache the files in ops_clang
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* do reshaping without merge_views and reshape masks
* added tests
* properly do reshaping of zero or negative masks
* replace while loop with single expression
* remove old condition
* add more tests and comments
* remove empty file
* [WIP]: implementation of SoftVC VITS SVC model
* fix typo
* fix whitespace
* Fully implement Generator & Synthesizer
- implement SineGen & SourceHnNSF to reconstruct source signal from F0
- source signal is added in the Generator
- fix various typos
- start loading state dict for synthesizer
* Load Synthesizer weights
- Fix typos in Synthesizer
- Slightly modify vits::load_checkpoint to skip a specified layer
- Test with Saul Goodman model because Drake weights are hosted on MEGA
* start work on ContentVec
- implement ConvFeatureExtractionModel for ContentVec
- start work on TransformerEncoder for ContentVec:
  - this transformer probably needs its own MultiheadAttention implementation
- fix various typos in synthesizer
- add helpers to mimic the behavior of torch's ~ and % operators
* use normal and kaiming_normal
* Implement ContentVec
- load ContentVec weights and config from fairseq hyperparams
- use MultiHeadAttention from whisper.py
- TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing
- redid tilde()
- some cleanup
* rename the file so it can be imported
* forgot to lint
* use float() instead of cast()
* add contentvec256l9 and cleanup
* Implement SoVITS fully and run it
- Fully run sovits with .wav file
- Drake weights need to be manually downloaded for now
- Fix bugs
- Add examples/sovits_helpers
- Big TODO: INVALID Kernel for recordings > 4.5 secs
* temp fix for longer audio recordings
* Upsample without torch
* cleanup & detailed inference time measuring
* Completely remove torch(audio)
- Implement sinc resample in tinygrad
- Load audio via Soundfile
- Some cleanups
* move stuff to helper files
* Cleanup
* fix invalid kernel
* Cleanup & add more models
* Metal sounds good after master merge
- But Synthesizer pass became much slower
* drake weights now marked safe
* do load/store in numpy
* no commas needed here
* remove extra newline
* call Tensor::where on object
* use Tensor::cat instead of numpy
* pull out first iteration
* remove Sequential, Dropout, GELU, TransposeLast
* cast during loading
* clean up attention
* remove SamePad
* Major cleanup / line reduction
- Finish implementation of GroupNormMasked
- Simplify parts of TransformerEncoder
- Simplify parts of Generator
- Move all helpers to common section
- Only use repeat_expand_left for interp after SpeechEncoder
- Moved SVC-specific ContentVec impls up (canonically)
- Proper annotations for get_encoder
- Finished all TODOs
- Squashed some whitespace
* clean up preprocess as well
* more straightforward bool expr
* add demo mode
Fixes issue:
```
loss_cpu = loss.detach().numpy()[0]
~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
```
Signed-off-by: David Heidelberg <david@ixit.cz>
* use scaled attn from Tensor
* add a test for bert
* linter
* no more tokenizer
* without loading weights
* remove prints
* tribute to linter lords
* smaller input and less runs
* small bert
* feat: world
* feat: tests
* feat: no more backwards
* feat: recv into
* feat: whoops
* feat: test in ci
* feat: some debug logging
* feat: workflow naming
* feat: need to set pythonpath
* feat: just send to same device
* feat: allreduce
* feat: test
* feat: need contiguous
* feat: test in ci
* feat: exit with correct code
* feat: don't need that
* feat: opencl wait_for just doesn't work
* feat: synchronize on out
* feat: try?
* feat: try again?
* feat: add extra realizes
* feat: print
* feat: seed
* feat: tol
* feat: test ones and zeros
* feat: remove print
* feat: are you just flaky
* feat: separate scatter and gather?
* feat: just try synchronizing
* feat: remove print again
* feat: bring back difference
* feat: no sync
* feat: revert that
* feat: back to wait_for
* fix: typo
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code
* Reshape and transpose in loop
* Bugfix on attention mask
Co-authored-by: Jacky Lee <39754370+jla524@users.noreply.github.com>
* feat: world
* feat: tests
* feat: no more backwards
* feat: recv into
* feat: whoops
* feat: test in ci
* feat: some debug logging
* feat: workflow naming
* feat: need to set pythonpath
* feat: just send to same device
* Implement scaled_dot_product_attention and test
* Support attn_mask
* Support is_causal too
* Use in llama
* Don't forget to reshape
* Set requires_grad=False for causal
* Remove staticmethod
* Remove extra spaces
* add disk_tensor
* fix jit
* new baseline before whitening
* whitening through torch
* whitening done, currently at 91.65%
* 91.99%
* clean up mixup and 92.3%
* clean up 92.30%
* 92.49% before searching for new hyper-parameters
* fix CI
* fix white space
* add whitening init in test
* refactor, update hyperparams, 92.72%
* converting whitening to tinygrad operations
* update CI kernels count for CIFAR
* add pad reflect
* add random crop 92.53%
* update hyperparams, 93%
* 93.15% in Docker container; need to refactor hyperparameter assignment
* print out weights and biases so they can be separated
* bias/non-bias params separated
* fix whitespace
* clean up
* refactor hyper-param with dict
* refactor lr scheduler params
* fix whitespace
* fix cross entropy loss
* fix whitespace
* move opt hyp to hyp dict
* minor fixup
* adjust model, loss scaling
* 92.74% while using half of compute as before
* update hyp for cutmix
* random shuffle during batches
* clean up
* updating the model
* update ConvGroup
* disable gradients for batchnorm layer weights
* whitespace
* 93.92%
* clean up
* finally 94%!
* rewrite whitening to remove dependency on torch
* whitespace
* remove dependency on torch, 93.91%
* back to 94.03%
* clean up
* update test_real_world