Commit Graph

1207 Commits

Author SHA1 Message Date
George Hotz
b9feb1b743 fp16 support in stable diffusion 2023-08-20 05:37:21 +00:00
chenyu
ae39cf84ab Symbolic Shape JIT main PR (#1353)
* Symbolic Shape JIT

update tests

2 variables symbolic ops, adding more tests

test passing

cleanup

* more test cases

* single flag

* review update

* jit attention one piece

* realize

* symbolic_jit test for cuda

* old artifact

* works with cuda gpu but failed ci

* CUDACPU
2023-08-18 14:39:55 -07:00
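
Why symbolic shapes matter for a JIT: a kernel cache keyed on concrete shapes recompiles for every new sequence length, while a symbolic key compiles once for the whole range. A minimal illustrative sketch in plain Python (hypothetical caches, not tinygrad's actual JIT internals):

```python
# Illustrative only: contrast a per-shape kernel cache with a symbolic one.
from typing import Callable, Dict, Tuple

concrete_cache: Dict[Tuple[int, ...], Callable] = {}

def run_concrete(shape: Tuple[int, ...], compile_fn: Callable) -> Callable:
    # one compilation per distinct concrete shape: every new seq length recompiles
    if shape not in concrete_cache:
        concrete_cache[shape] = compile_fn(shape)
    return concrete_cache[shape]

symbolic_cache: Dict[str, Callable] = {}

def run_symbolic(shape: Tuple[int, ...], compile_fn: Callable) -> Callable:
    # one compilation for the whole range: the kernel takes the dim as a variable
    key = "i:1..max"  # symbolic key, independent of the concrete value of i
    if key not in symbolic_cache:
        symbolic_cache[key] = compile_fn(key)
    return symbolic_cache[key]
```
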
wozeparrot
50decf0d45 train cifar using multigpu (#1529)
* feat: train cifar using multigpu

* feat: split eval batch across 5

* feat: cleaner allreduce

* feat: 93.88%

* feat: cleaner batch chunking from bert

* feat: cleaner grad sync

* feat: tinygrad argmax

* feat: make it work with different gpu counts

* feat: move some stuff into the normal __init__

* feat: autodetect gpu count

* feat: move import inside
2023-08-18 09:35:44 -07:00
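
The "cleaner allreduce" and "cleaner grad sync" steps amount to averaging each parameter's gradient across replicas and handing the same result back to every GPU. A naive numpy sketch of that reduction (assumed semantics, not the example's actual tinygrad code):

```python
import numpy as np

def allreduce_mean(grads_per_gpu: list) -> list:
    """Average one parameter's gradients across all replicas, then broadcast."""
    mean = np.mean(grads_per_gpu, axis=0)
    return [mean.copy() for _ in grads_per_gpu]  # every GPU gets the same grad

# usage: two "GPUs" holding different gradients for the same weight
g0, g1 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
synced = allreduce_mean([g0, g1])  # both entries are [2.0, 3.0]
```
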
wozeparrot
55d95d1658 llama 70b (#1558)
* feat: llama 70b

* feat: llama 70b but simpler
2023-08-16 11:36:12 -07:00
JaSpa99
2fd7004980 Implementation of SoftVC VITS SVC model (#1371)
* [WIP]: implementation of SoftVC VITS SVC model

* fix typo

* fix whitespace

* Fully implement Generator & Synthesizer

- implement SineGen & SourceHnNSF to reconstruct source signal from F0
- source signal is added during Generator
- fix various typos
- start loading state dict for synthesizer

* Load Synthesizer weights

- Fix typos in Synthesizer
- Slightly modify vits::load_checkpoint to skip a specified layer
- Test with Saul Goodman model because Drake weights are on mega

* start work on ContentVec

- implement ConvFeatureExtractionModel for ContentVec
- start work on TransformerEncoder for ContentVec:
- this transformer probably needs its own MultiheadAttention implementation
- fix various typos in synthesizer
- add helpers to match the behavior of torch's ~ and % operators

* use normal and kaiming_normal

* Implement ContentVec

- load ContentVec weights and config from fairseq hyperparams
- use MultiHeadAttention from whisper.py
- TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing
- redid tilde()
- some cleanup

* rename the file so it can be imported

* forgot to lint

* use float() instead of cast()

* add contentvec256l9 and cleanup

* Implement SoVITS fully and run it

- Fully run sovits with .wav file
- Drake weights need to be manually downloaded for now
- Fix bugs
- Add examples/sovits_helpers
- Big TODO: INVALID Kernel for recordings > 4.5 secs

* temp fix for longer audio recordings

* Upsample no more torch

* cleanup & detailed inference time measuring

* Completely remove torch(audio)

- Implement sinc resample in tinygrad
- Load audio via Soundfile
- Some cleanups

* move stuff to helper files

* Cleanup

* fix invalid kernel

* Cleanup & add more models

* Metal sounds good after master merge

- But Synthesizer pass became much slower

* drake weights now marked safe

* do load/store in numpy

* no commas needed here

* remove extra newline

* call Tensor::where on object

* use Tensor::cat instead of numpy

* pull out first iteration

* remove Sequential, Dropout, GELU, TransposeLast

* cast during loading

* clean up attention

* remove SamePad

* Major cleanup / line reduction

- Finish implementation of GroupNormMasked
- Simplify parts of TransformerEncoder
- Simplify parts of Generator
- Move all helpers to common section
- Only use repeat_expand_left for interp after SpeechEncoder
- Moved SVC-specific ContentVec impls up (canonically)
- Proper annotations for get_encoder
- Finished all TODOs
- Squashed some whitespaces

* clean up preprocess as well

* more straightforward bool expr

* add demo mode
2023-08-13 19:43:23 -07:00
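
Of the torch(audio) removals above, the interesting piece is the hand-rolled resampler ("Implement sinc resample in tinygrad"): low-pass at the smaller Nyquist, then evaluate the band-limited reconstruction at the new sample times. A minimal windowed-sinc sketch in numpy (the `width` parameter and windowing choice are illustrative, not the PR's):

```python
import numpy as np

def sinc_resample(x: np.ndarray, orig_sr: int, new_sr: int, width: int = 16) -> np.ndarray:
    """Naive windowed-sinc resampling of a 1-D signal."""
    ratio = new_sr / orig_sr
    n_out = int(len(x) * ratio)
    cutoff = 0.5 * min(1.0, ratio)  # cutoff in cycles per *input* sample
    y = np.zeros(n_out, dtype=np.float32)
    for i in range(n_out):
        center = i / ratio                        # fractional position in input
        lo = max(0, int(center) - width)
        hi = min(len(x), int(center) + width + 1)
        d = np.arange(lo, hi) - center
        h = 2 * cutoff * np.sinc(2 * cutoff * d)  # ideal low-pass kernel
        w = 0.5 * (1 + np.cos(np.pi * d / (width + 1)))  # Hann taper
        y[i] = np.dot(x[lo:hi], h * w)
    return y
```
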
David Heidelberg
13659ac6fa examples: numpy() array returns only one value, not an array (#1534)
Fixes issue:
```
    loss_cpu = loss.detach().numpy()[0]
               ~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
```

Signed-off-by: David Heidelberg <david@ixit.cz>
2023-08-13 14:33:05 -07:00
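
The failure mode is easy to reproduce in plain numpy: a 0-dimensional array holds a scalar and rejects any indexing, so the fix is to drop the `[0]` (or read it with `.item()`):

```python
import numpy as np

loss = np.array(0.25, dtype=np.float32)  # 0-dimensional, like a scalar .numpy()
assert loss.ndim == 0 and loss.shape == ()
# loss[0]           # IndexError: too many indices for array
value = loss.item()  # or float(loss): the correct scalar read
```
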
George Hotz
47f18f4d60 [New] SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1516) (#1518)
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code

* Reshape and transpose in loop

* Bugfix on attention mask

Co-authored-by: Jacky Lee <39754370+jla524@users.noreply.github.com>
2023-08-10 15:04:18 -07:00
George Hotz
e3c6c0c6db add GPT2 example (#1511) (#1514)
* add gpt2 to examples

* some cleanup

* fixes

* argparse + scaled_dot_product_attention

* add timing

* add to benchmark

Co-authored-by: YassineYousfi <yassine.y10@gmail.com>
2023-08-10 09:09:47 -07:00
George Hotz
c82bd59b85 Revert "SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513)" (#1515)
This reverts commit 85e02311a2.
2023-08-10 09:08:51 -07:00
Jacky Lee
85e02311a2 SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513)
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code

* Reshape and transpose in loop
2023-08-10 08:52:33 -07:00
Jacky Lee
ef5f648e2f Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502)
* Implement scaled_dot_product_attention and test

* Support attn_mask

* Support is_causal too

* Use in llama

* Don't forget to reshape

* Set requires_grad=False for causal

* Remove staticmethod

* Remove extra spaces
2023-08-08 23:27:13 -07:00
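
For reference, the torch semantics this Tensor method mirrors, written out in numpy — a sketch of the math, not tinygrad's implementation, with the mask handling simplified to the additive-float case:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, attn_mask=None, is_causal=False):
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if is_causal:  # mask out future positions with -inf before the softmax
        L, S = scores.shape[-2], scores.shape[-1]
        scores = np.where(np.tril(np.ones((L, S), dtype=bool)), scores, -np.inf)
    if attn_mask is not None:  # additive mask, as in torch's float-mask form
        scores = scores + attn_mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    return (w / w.sum(axis=-1, keepdims=True)) @ v
```
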
chenyu
827d13e64e correct patch JIT llama chat (#1500) 2023-08-08 19:52:09 -04:00
chenyu
0415a48cfc patch JIT llama chat mode (#1496) 2023-08-08 15:15:56 -07:00
Yixiang Gao
6480a1a180 CIFAR 94.03% (#1340)
* add disk_tensor

* fix jit

* new baseline before whitening

* whitening through torch

* whitening done, currently at 91.65%

* 91.99%

* clean up mixup and 92.3%

* clean up 92.30%

* 92.49% before searching for new hyper-parameters

* fix CI

* fix white space

* add whitening init in test

* refactor, update hyperpara, 92.72%

* converting whitening to tinygrad operation

* update CI kernels count for CIFAR

* add pad reflect

* add random crop 92.53%

* update hyperpara 93%

* 93.15% on docker container, need to refactor the assignment for hyper param

* print out weights and bias to be separated

* bias/non-bias params separated

* fix whitespace

* clean up

* refactor hyper-param with dict

* refactor lr scheduler params

* fix whitespace

* fix cross entropy loss

* fix whitespace

* move opt hyp to hyp dict

* minor fixup

* adjust model, loss scaling

* 92.74% while using half the compute as before

* update hyp for cutmix

* random shuffle during batches

* clean up

* updating the model

* update ConvGroup

* disable gradients for batchnorm layer weights

* whitespace

* 93.92%

* clean up

* finally 94%

* rewrite whitening to remove dependency on torch

* whitespace

* remove dependency on torch, 93.91%

* back to 94.03%

* clean up

* update test_real_world
2023-08-08 15:13:24 -07:00
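
The "whitening" referred to throughout is the patch-whitening conv init popularized by hlb-CIFAR10: build fixed first-layer filters from the eigendecomposition of the training patches' covariance (the "remove dependency on torch" bullets are about doing this eigendecomposition without torch). A numpy sketch of the init, with `eps` illustrative; the example's actual code lives in examples/hlb_cifar10.py:

```python
import numpy as np

def whitening_filters(patches: np.ndarray, eps: float = 1e-2) -> np.ndarray:
    """patches: (N, C, H, W) crops sampled from the training set.
    Returns (C*H*W, C, H, W) conv filters that decorrelate input patches."""
    n, c, h, w = patches.shape
    X = patches.reshape(n, -1).astype(np.float64)
    X -= X.mean(axis=0, keepdims=True)
    cov = (X.T @ X) / (n - 1)
    eigval, eigvec = np.linalg.eigh(cov)        # columns are eigenvectors
    eigval = np.maximum(eigval, 0.0)            # clip tiny negative noise
    W = ((eigval + eps) ** -0.5 * eigvec).T     # rows whiten: cov(W @ x) ~ I
    return W.reshape(-1, c, h, w).astype(np.float32)
```
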
George Hotz
d78fb8f4ed add stable diffusion and llama (#1471)
* add stable diffusion and llama

* pretty in CI

* was CI not true

* that

* CI=true, wtf

* pythonpath

* debug=1

* oops, wrong place

* uops test broken for wgpu

* wgpu tests flaky
2023-08-06 21:31:51 -07:00
George Hotz
67781fcf5d fix fail fast in CI 2023-08-05 10:24:24 -07:00
Felix
97a6029cf7 Corrected a few misspelled words (#1435) 2023-08-04 16:51:08 -07:00
ian
c08ed1949f Fix plt output comment (#1428) 2023-08-03 23:35:52 -07:00
Paolo Gavazzi
9ffa1eb7e2 Removed dep of torch, torchaudio, kept librosa only (#1264) 2023-08-02 13:52:04 -04:00
Diogo
4dc8595069 simple exporting models (#1344)
* unified exporting

* json exporting

* ignore more

* simplified buffer export

* added dtypes

* added assert

* swift example

* fix tests

* linter

* remove whitespace

* fixed tests

* remove swift example

* remove unintended changes

* allow callable models to be used

* whitespace

* more readable json export

* name change

* whitespace

* whitespace
2023-08-01 09:35:48 -07:00
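
A plausible shape for the "json exporting" + "simplified buffer export" pair: weights concatenated into one binary blob plus a JSON manifest of offsets/shapes/dtypes that a runtime can read against. A hypothetical sketch assuming the weights are already numpy arrays (names and file layout invented for illustration, not the PR's exporter):

```python
import json
import numpy as np

def export_model(weights: dict, path_prefix: str) -> None:
    manifest, offset = {}, 0
    with open(path_prefix + ".bin", "wb") as f:
        for name, w in weights.items():
            buf = np.ascontiguousarray(w).tobytes()
            manifest[name] = {"dtype": str(w.dtype), "shape": list(w.shape),
                              "offset": offset, "nbytes": len(buf)}
            f.write(buf)
            offset += len(buf)
    with open(path_prefix + ".json", "w") as f:
        json.dump(manifest, f, indent=2)  # readable manifest, per "more readable json export"
```
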
George Hotz
f27df835a6 delete dead stuff (#1382)
* delete bpe from repo

* remove yolo examples

* Revert "remove yolo examples"

This reverts commit cd1f49d466.

* no windows
2023-07-31 11:17:49 -07:00
George Hotz
37fa7e96fb Revert "update editorconfig, enforce via CI (#1343)" (#1380)
This reverts commit da2efecbe2.
2023-07-31 10:35:50 -07:00
Pavol Rusnak
da2efecbe2 update editorconfig, enforce via CI (#1343)
* update editorconfig to set unix-style newlines and trim whitespace

* add editorconfig github action to the CI

* fix whitespace
2023-07-30 18:44:30 -07:00
Francis Lam
9d142430cb Add option in llama.py to quantize weights to int8 at runtime (#1289)
* Add option in llama.py to quantize weights to int8 at runtime

Also added lm-eval to external

* Add support for llama-2 evaluation
2023-07-24 17:22:38 -07:00
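
Runtime int8 weight quantization typically means per-row absmax scaling: store int8 weights plus one float scale per row, and dequantize (or fold the scale into the matmul) at use time. A numpy sketch of that scheme — the PR's exact grouping may differ:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Per-row absmax quantization: w ~ q * scale, q in int8."""
    scale = np.abs(w).max(axis=-1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale  # ~4x smaller weights at rest
```
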
Pavol Rusnak
cd60b8561c Add LLaMA-2 support (#1284)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-07-24 17:12:02 -04:00
Giles Bathgate
c4238b4ea0 Fix discriminator balancing in mnist_gan example (#1332) 2023-07-23 12:43:05 -07:00
Cole Sutyak
2d4e182294 change fetch to allow for local file selection (#1309) 2023-07-23 15:00:16 -04:00
Maxim Zakharov
48c4df1263 fix: prevent infinite "loading..." state (#1319)
* the demo somehow doesn't work on my device and throws the error "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside the setupNet func
* because of that, JS halts execution of the rest of the code and the screen shows "loading..." forever
* added a try/catch here to report the error properly
2023-07-21 14:01:53 -07:00
Jacob Pradels
b112edd2c3 Add pylint trailing whitespace rule (#1314) 2023-07-21 13:37:55 -04:00
George Hotz
bfbb8d3d0f fix ones, BS=2 stable diffusion, caching optimizer (#1312)
* fix ones, BS=2 stable diffusion

* caching optimizer

* print search time

* minor bug fix
2023-07-21 09:55:49 -07:00
madt2709
d2c1e8409a Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00
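
I.e. `Tensor.arange` now takes its arguments in numpy/torch order. Assuming the then-current `tinygrad.tensor` import path, usage looks like:

```python
from tinygrad.tensor import Tensor

print(Tensor.arange(2, 10, 2).numpy())  # [2. 4. 6. 8.] -- (start, stop, step)
```
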
George Hotz
f45013f0a3 stable diffusion: remove realizes we don't need 2023-07-20 19:53:07 -07:00
George Hotz
b58dd015e3 stable diffusion: remove import numpy as np 2023-07-20 19:35:44 -07:00
George Hotz
35bc46289c stable diffusion: use new tinygrad primitives 2023-07-20 19:25:49 -07:00
Stan
0a3d4f8103 Implementation of VITS TTS model (#1188)
* [WIP]: implementation of VITS TTS model

* Implemented VITS model, moved all code to examples/vits.py

* Added support for vctk model, auto download, and cleanups

* Invoke tensor.realize() before measuring inference time

* Added support for mmts-tts model, extracted TextMapper class, cleanups

* Removed IPY dep, added argument parser, cleanups

* Tiny fixes to wav writing

* Simplified the code in a few places, set diff log level for some prints

* Some refactoring, added support for uma_trilingual model (anime girls)

* Fixed bug where embeddings are loaded with same backing tensor, oops

* Added emotional embed support, added cjks + voistock models

- voistock is a multilingual model with over 2k anime characters
- cjks is a multilingual model with 24 speakers

both are kinda bad for English though :c

* Removed `Tensor.Training=False` (not needed and wrong, oops)

* Changed default model and speaker to vctk with speaker 6

* Ported the rational_quadratic_spline function to fully use tinygrad ops, no numpy

* Removed accidentally pushed test/spline.py

* Some slight refactors

* Replaced masked_fill with tensor.where

* Added y_length estimating, plus installation instructions, plus some cleanups

* Fix overestimation log message.

* Changed default value of `--estimate_max_y_length` to False

This is only useful for larger inputs.

* Removed printing of the phonemes

* Changed default value of `--text_to_synthesize`
2023-07-20 17:37:14 -07:00
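
The masked_fill-to-where rewrite mentioned above is mechanical; with numpy's `where` standing in for `Tensor.where` (same cond/true/false argument order assumed):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
mask = np.array([True, False, True])
# torch:  x.masked_fill(mask, -1e9)
filled = np.where(mask, np.float32(-1e9), x)  # -> [-1e9, 2.0, -1e9]
```
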
George Hotz
f7b0320d8b add cifar training regression test (#1287)
* add cifar training regression test

* clean up print
2023-07-19 14:17:09 -07:00
Francis Lam
3db57d3118 Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275) 2023-07-19 13:22:33 -04:00
Yixiang Gao
a8f2c16f8e add contiguous (#1246) 2023-07-15 08:36:34 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* FIx efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and paralel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Hey
4f72eb823c Outdated repository URL (#1218)
* Update outdated repo url

* Update more outdated repo url's
2023-07-11 23:14:19 -07:00
AN Long
f75de602df fix typo in stable diffusion example (#1219) 2023-07-11 15:26:40 -07:00
Stan
f40f8cd055 Initialise numpy arrays as float32 in DDPG (#1171)
float64 is not supported by tinygrad
2023-07-07 12:05:31 -07:00
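
The pitfall: numpy defaults to float64, so buffers handed to tinygrad must be created explicitly as float32:

```python
import numpy as np

bad = np.zeros((64, 4))                      # dtype=float64 by default
good = np.zeros((64, 4), dtype=np.float32)   # what tinygrad expects
```
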
terafo
aa60feda48 Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Stan
9b6e57eccd helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`; without it, a thrown exception would hang the thread
- made `_early_exec_process` catch any Exception; before, if an exception was thrown before the process was started, it would hang the thread

* Made `_early_exec_process` catch any Exception

Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example, a type error for an argument passed to `subprocess.check_output`

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, my bad

* Extracted a class for tests of early exec

* Normalize line endings, windows uses \r\n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
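
The hang described here is the classic worker-pipe failure: if the child raises before replying, the parent blocks forever on `recv()`. A minimal sketch of the catch-and-forward fix (hypothetical worker, not the repo's `_early_exec_process` verbatim):

```python
from multiprocessing import Pipe, Process
import subprocess

def _worker(conn) -> None:
    while True:
        cmd = conn.recv()
        try:
            conn.send(subprocess.check_output(cmd))
        except Exception as e:  # without this, an early raise leaves the parent
            conn.send(e)        # blocked on recv() forever

if __name__ == "__main__":      # guard needed on spawn-based platforms
    parent, child = Pipe()
    Process(target=_worker, args=(child,), daemon=True).start()
    parent.send(["echo", "hi"])
    print(parent.recv())        # b'hi\n' -- or the forwarded exception object
```
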
Kunwar Raj Singh
8391648822 Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducible

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* partial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where i calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
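
Among the tricks tried above, mixup is the most self-contained: convex-combine random pairs of inputs and their one-hot labels with a Beta-sampled weight. A numpy sketch (the `alpha` value is illustrative):

```python
import numpy as np

def mixup(x: np.ndarray, y: np.ndarray, alpha: float = 0.2, rng=None):
    """x: batch of inputs, y: one-hot float labels; returns mixed pairs."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)    # mixing weight in (0, 1)
    perm = rng.permutation(len(x))  # random partner for each example
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]
```
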
nimlgen
d363d25ee2 fix imports for examples/transformer.py (#1136) 2023-07-05 08:15:13 -07:00
George Hotz
87d21ea979 examples: simple conv bn 2023-07-04 13:50:26 -07:00
Reza Rezvan
8ae9a054ae Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
Eli Frigo
10f1aeb144 fixed broken link (#1097) 2023-07-02 15:06:59 -07:00
nmarwell26
12ce68c1ee Renamed examples/yolo to examples/vgg7_helpers: the directory contains no yolo-related code, only helper code for vgg7, which confused new users trying to understand the examples (#1086) 2023-07-01 12:04:28 -07:00