Commit Graph

10633 Commits

Author SHA1 Message Date
wozeparrot
074c467020 hotfix for broken ci (#1559) 2023-08-16 13:52:03 -04:00
madt2709
962972ee68 Fix uops int32 for llvm (#1554)
* fix-uops-int32-llvm

* fix tests

* Ignore mypy error
2023-08-15 23:22:32 -07:00
Sam Barani
2cde667d40 Change Any to List[Optional[RawBuffer]] in JIT (#1553)
* Change Any to List[Optional[RawBuffer]] in JIT

* remove ignore[no-redef]

* remove ignore

* pick different names
2023-08-15 23:21:33 -07:00
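The change above is a pure typing fix: the JIT's cached input buffers were annotated as Any, which hides their real shape from mypy. A minimal sketch of the pattern, with a stub standing in for tinygrad's RawBuffer:
```
from typing import List, Optional

class RawBuffer: ...  # stub; the real class lives in tinygrad's runtime

# Before: input_rawbuffers: Any -- mypy cannot check call sites.
# After: the list is ordered by argument position, and slots that are not
# realized yet hold None, which the stricter annotation makes explicit.
input_rawbuffers: List[Optional[RawBuffer]] = [RawBuffer(), None, RawBuffer()]
```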
nimlgen
fa81e282c2 fix missing dtypes in is_int,is_float,is_unsigned (#1550) 2023-08-15 21:22:29 -04:00
Diogo
d17ecccd78 Torch/LLVM/arm F64 support (#1551) 2023-08-15 21:21:08 -04:00
YiMing Han
913263c155 add double: c_type.double for CLANG (#1549) 2023-08-15 13:19:33 -07:00
George Hotz
0b5930d406 more uops testing, who isn't passing right now... (#1522)
* more uops

* llvm refactor

* update test uops

* rest of the nodes

* ors and ands
2023-08-15 09:07:26 -07:00
George Hotz
f8109b830c promote assembly to the main codebase (#1544)
* promote assembly to the main codebase

* not namedtuple
2023-08-14 22:47:45 -07:00
wozeparrot
666ac61070 support for p2p buffer transfers (#1523)
* feat: RawBufferTransfer

* feat: gate behind P2P

* feat: gate properly

* feat: raise error when not implemented
2023-08-14 22:39:57 -07:00
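Going by the bullets above, the transfer path is opt-in and fails loudly when a backend cannot do it. A sketch under those assumptions (the P2P env-var gate and the _transfer hook are illustrative, not the shipped API):
```
import os

P2P = int(os.getenv("P2P", "0"))  # opt-in gate, per "gate behind P2P"

class RawBufferTransfer:  # name from the commit; body is hypothetical
  @staticmethod
  def transfer(src, dst):
    if not P2P:
      raise NotImplementedError("P2P transfer not enabled for this device")
    # a real backend copies device-to-device here, skipping host memory
    dst._transfer(src)  # _transfer is an assumed per-backend hook
```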
Steven Anderson
93a36c3659 Arm (#1421)
* testing new memops

* better debugging

* testing padded conv

* branching with load

* refactoring a bit

* first try

* fixing bugs

* fixing some

* eq

* eq2

* do not use x's

* working

* fixing imm

* getting things working

* refactor

* pow not working

* working except one

* refactor: one store mem

* refactor: global load

* refactor: imm

* refactor: cleaning

* fixing big offsets

* refactor with ci

* try ci

* typo

* another typo

* ubuntu default

* forgot git

* do i need git?

* missing packages

* adding python-dev

* with cache?

* buildx action

* buildx name issue?

* maybe now?

* python3

* newline warning

* maybe now

* i actually need this

* ci should work now

* improved caching

* fixing cache

* maybe now it will cache

* this

* testing cache

* trying again

* load

* missing platform

* caching gha

* testing cache

* full testing

* typo

* now?

* why

* adding checkout back

* bad formatting

* fixing convention issues

* supporting python

* adding CI flag

* testing all

* better comments

* adding debugging

* takes 12x longer

* does it output progress now?

* ignore models for speed

* fixing merge

* excluding conv_transpose2d

* only 2 tests because it's too slow

* another approach

* let's see

* faster duh

* my bad

* T_T

* typo

* sup

* with output?

* comment test

* comment test

* comment test

* :?

* no comment

* with cache

* back to normal

* testing that ci works

* back to passing

* trying again

* does it create another entry

* does it create another entry?

* build local

* hey

* Revert "excluding conv_transpose2d"

This reverts commit cc7348de03.

* does it cache if done before?

* does it cache?

* done

* adding test ops

* bad formatting

* no need for this

* working static mem

* sum 1d

* add ndim

* better reg import

* fix stack

* back to np

* working except for softmax

* 5 failing

* no progress

* remove keystone

* remove keystone

* testops passing

* cleanups

* more cleanup

* typo

* ci

* ci2

* cond import

* ci3

* ci4

* ci4

* ci5

* ci5

* ci6

* alignment

* test all

* correct test

* err read_unmapped

* passing test

* ignore for speed

* ignore for speed

* ci7

* cleanup

* remove docker

* fixing merge

* fixing bugs

* add skipload for const ops

* comments

* First merge to master: Renderer

* fix emulation

* passing all tests arm64

* cleaning

* fix handcoded binary

* cleaning

* fix errs

* fix runtime arg binary

* clean git diff

* fix and clean

* fixing metal test

* cleaning

* fix metal test

* ci ~8 min

* fix pylint and clang

* cache the files in ops_clang

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-14 19:29:30 -07:00
chenyu
a89142e46f ShapeTracker.var_vals (#1540) 2023-08-14 18:53:37 -07:00
Pavol Rusnak
a453d718a1 fix file race condition in ops_clang via pid in the filename (#1541)
* fix file race condition in ops_clang via pid in the filename

as suggested in https://github.com/tinygrad/tinygrad/pull/1458/files#r1292819054

* add explanation why a temp file is required on ops_clang
2023-08-14 18:50:10 -07:00
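The race in #1458's scheme was two processes compiling the same kernel and writing the same cache file; putting the pid in the name makes each writer's path unique. A hedged sketch of the idea (flags and helper name are illustrative, not ops_clang verbatim):
```
import ctypes, os, subprocess, tempfile

def compile_clang(prg: str, name: str) -> ctypes.CDLL:
  # os.getpid() in the filename: concurrent processes compiling the same
  # kernel no longer clobber each other's half-written .so
  fn = os.path.join(tempfile.gettempdir(), f"clang_{name}_{os.getpid()}.so")
  subprocess.check_output(["clang", "-shared", "-O2", "-fPIC", "-x", "c", "-", "-o", fn],
                          input=prg.encode())
  return ctypes.CDLL(fn)
```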
wozeparrot
9cb2bda34f Revert "Better reshape (#1423)" (#1538) 2023-08-14 13:04:54 -04:00
Sieds Lykles
cf2bf1518d Better reshape (#1423)
* do reshaping without merge_views and reshape masks

* added tests

* properly do reshaping of zero or negative masks

* replace while loop with single expression

* remove old condition

* add more tests and comments

* remove empty file
2023-08-14 09:09:04 -07:00
YiMing Han
e00acb1eaf fix deepwalk ctx check (#1536) 2023-08-13 23:03:17 -07:00
JaSpa99
2fd7004980 Implementation of SoftVC VITS SVC model (#1371)
* [WIP]: implementation of SoftVC VITS SVC model

* fix typo

* fix whitespace

* Fully implement Generator & Synthesizer

- implement SineGen & SourceHnNSF to reconstruct source signal from F0
- source signal is added during Generator
- fix various typos
- start loading state dict for synthesizer

* Load Synthesizer weights

- Fix typos in Synthesizer
- Slightly modify vits::load_checkpoint to skip a specified layer
- Test with Saul Goodman model because Drake weights are on mega

* start work on ContentVec

- implement ConvFeatureExtractionModel for ContentVec
- start work on TransformerEncoder for ContentVec:
- this transformer probably needs its own MultiheadAttention implementation
- fix various typos in synthesizer
- add helpers to mimic the behavior of torch's ~ and % operators

* use normal and kaiming_normal

* Implement ContentVec

- load ContentVec weights and config from fairseq hyperparams
- use MultiHeadAttention from whisper.py
- TransformerSentenceEncoderLayer might still need some tweaking, will see during inference testing
- redid tilde()
- some cleanup

* rename the file so it can be imported

* forgot to lint

* use float() instead of cast()

* add contentvec256l9 and cleanup

* Implement SoVITS fully and run it

- Fully run sovits with .wav file
- Drake weights need to be manually downloaded for now
- Fix bugs
- Add examples/sovits_helpers
- Big TODO: INVALID Kernel for recordings > 4.5 secs

* temp fix for longer audio recordings

* Upsample no more torch

* cleanup & detailed inference time measuring

* Completely remove torch(audio)

- Implement sinc resample in tinygrad
- Load audio via Soundfile
- Some cleanups

* move stuff to helper files

* Cleanup

* fix invalid kernel

* Cleanup & add more models

* Metal sounds good after master merge

- But Synthesizer pass became much slower

* drake weights now marked save

* do load/store in numpy

* no commas needed here

* remove extra newline

* call Tensor::where on object

* use Tensor::cat instead of numpy

* pull out first iteration

* remove Sequential, Dropout, GELU, TransposeLast

* cast during loading

* clean up attention

* remove SamePad

* Major cleanup / line reduction

- Finish implementation of GroupNormMasked
- Simplify parts of TransformerEncoder
- Simplify parts of Generator
- Move all helpers to common section
- Only use repeat_expand_left for interp after SpeechEncoder
- Moved SVC-specific ContentVec impls up (canonically)
- Proper annotations for get_encoder
- Finished all TODOs
- Squashed some whitespaces

* clean up preprocess as well

* more straightforward bool expr

* add demo mode
2023-08-13 19:43:23 -07:00
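One bullet above mentions helpers mimicking torch's ~ operator, since tinygrad had no bitwise-not at the time. A plausible sketch of that tilde() helper, assuming masks are 0/1 float tensors (not the port's exact code):
```
from tinygrad.tensor import Tensor

def tilde(mask: Tensor) -> Tensor:
  # arithmetic stand-in for torch's ~ on a boolean mask: 1 -> 0, 0 -> 1
  return 1.0 - mask

print(tilde(Tensor([1.0, 0.0, 1.0])).numpy())  # [0. 1. 0.]
```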
nimlgen
b6937acb7e fix casting behavior for interpreted buffers (#1525) 2023-08-13 19:21:37 -07:00
David Heidelberg
13659ac6fa examples: numpy() array returns only one value, not an array (#1534)
Fixes issue:
```
    loss_cpu = loss.detach().numpy()[0]
               ~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
```

Signed-off-by: David Heidelberg <david@ixit.cz>
2023-08-13 14:33:05 -07:00
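The traceback quoted in the commit is the whole story: loss is now a 0-dimensional array, so indexing it fails and .item() is the right way to pull out the scalar. A minimal numpy reproduction:
```
import numpy as np

loss = np.array(2.5)      # 0-dimensional, like loss.detach().numpy() here
# loss[0]                 # IndexError: too many indices for array
loss_cpu = loss.item()    # extract the scalar instead
```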
chenyu
3e0c2d256f symbolic shapetracker (#1506)
* symbolic shapetracker

* no need

* keep only symbolic and clean up

* explicit // and % Node support

* NumNode * Node
2023-08-12 12:22:58 -07:00
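To see what "explicit // and % Node support" buys, here is a small probe of the symbolic engine; the module path and exact simplifications are assumed from that era's tinygrad/shape/symbolic.py:
```
from tinygrad.shape.symbolic import Variable  # path assumed

i = Variable("i", 1, 10)     # a symbolic dim bounded between 1 and 10
print((i * 4) // 2)          # folds to i*2 rather than an opaque div node
print((i * 4) % 2)           # folds to 0, since 4*i is always even
```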
Pavol Rusnak
875da762a8 fix file race condition in ops_clang (#1458) 2023-08-12 09:31:46 -07:00
JaSpa99
d3d58a37e5 Bert: use Tensor.scaled_dot_product_attention (#1528)
* use scaled attn from Tensor

* add a test for bert

* linter

* no more tokenizer

* without loading weights

* remove prints

* tribute to linter lords

* smaller input and less runs

* small bert
2023-08-12 08:46:04 -07:00
Szymon Ożóg
330fb7b1a3 Print more meaningful HIP error messages (#1530) 2023-08-12 07:16:20 -07:00
wozeparrot
29d5801387 distributed collectives (#1519)
* feat: world

* feat: tests

* feat: no more backwards

* feat: recv into

* feat: whoops

* feat: test in ci

* feat: some debug logging

* feat: workflow naming

* feat: need to set pythonpath

* feat: just send to same device

* feat: allreduce

* feat: test

* feat: need contiguous

* feat: test in ci

* feat: exit with correct code

* feat: don't need that

* feat: opencl wait_for just doesn't work

* feat: synchronize on out

* feat: try?

* feat: try again?

* feat: add extra realizes

* feat: print

* feat: seed

* feat: tol

* feat: test ones and zeros

* feat: remove print

* feat: are you just flaky

* feat: separate scatter and gather?

* feat: just try synchronizing

* feat: remove print again

* feat: bring back difference

* feat: no sync

* feat: revert that

* feat: back to wait_for

* fix: typo
2023-08-11 10:22:07 -07:00
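Stripped of devices and synchronization, the allreduce these bullets converge on is reduce-then-broadcast. A toy sketch with plain Python values standing in for per-GPU tensors:
```
def allreduce(ranks):
  total = sum(ranks)               # reduce: every rank contributes its value
  return [total for _ in ranks]    # broadcast: every rank gets the sum back

print(allreduce([1.0, 2.0, 3.0]))  # [6.0, 6.0, 6.0] -- identical on all ranks
```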
Jacky Lee
2e85fce068 Transformer: use Tensor.scaled_dot_product_attention (#1520) 2023-08-11 09:00:37 -07:00
George Hotz
38fe84d92b cleanup mlops (#1521)
* cleanup mlops

* that line belongs there
2023-08-10 19:53:28 -07:00
George Hotz
47f18f4d60 [New] SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1516) (#1518)
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code

* Reshape and transpose in loop

* Bugfix on attention mask

Co-authored-by: Jacky Lee <39754370+jla524@users.noreply.github.com>
2023-08-10 15:04:18 -07:00
wozeparrot
7e7c9001e9 distributed world (#1481)
* feat: world

* feat: tests

* feat: no more backwards

* feat: recv into

* feat: whoops

* feat: test in ci

* feat: some debug logging

* feat: workflow naming

* feat: need to set pythonpath

* feat: just send to same device
2023-08-10 10:00:51 -07:00
George Hotz
e3c6c0c6db add GPT2 example (#1511) (#1514)
* add gpt2 to examples

* some cleanup

* fixes

* argparse + scaled_dot_product_attention

* add timing

* add to benchmark

Co-authored-by: YassineYousfi <yassine.y10@gmail.com>
2023-08-10 09:09:47 -07:00
George Hotz
c82bd59b85 Revert "SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513)" (#1515)
This reverts commit 85e02311a2.
2023-08-10 09:08:51 -07:00
Jacky Lee
85e02311a2 SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513)
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code

* Reshape and transpose in loop
2023-08-10 08:52:33 -07:00
geohotstan
07b79f210f llvmir support for bool <-> float casting (#1492) 2023-08-09 13:12:52 -04:00
wozeparrot
351684395c dont run on fork (#1510) 2023-08-09 13:06:45 -04:00
wozeparrot
88e2e0c8a3 Revert "don't try to run benchmark on forks" (#1508) 2023-08-09 12:59:49 -04:00
wozeparrot
65b65b760b don't try to run benchmark on forks (#1507) 2023-08-09 12:59:19 -04:00
George Hotz
c417cd3c97 fast HIP gemm -> 100 TFLOPS (#1476)
* fast HIP gemm

* wmma

* correct b

* fix spilling

* 60 TFLOPS

* 64 TFLOPS

* 65 TFLOPS
2023-08-09 06:54:15 -07:00
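For the headline number, an NxNxN matmul costs 2*N^3 FLOPs (one multiply and one add per inner-product term), so the figure is 2*N^3 / seconds / 1e12. A generic measurement sketch (numpy here; the 100 TFLOPS kernel itself is the hand-tuned HIP wmma code):
```
import time
import numpy as np

N = 4096
a = np.random.rand(N, N).astype(np.float32)
b = np.random.rand(N, N).astype(np.float32)
st = time.perf_counter()
c = a @ b                                     # the kernel being timed
dt = time.perf_counter() - st
print(f"{2 * N**3 / dt * 1e-12:.2f} TFLOPS")  # 2*N^3 FLOPs per matmul
```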
David Hou
1766f0c0cf use ConstOp for valid.max == 0 (#1501)
* use ConstOp for valid.max == 0

* don't render valid for invalid load cache key

* Update linearizer.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-09 00:01:59 -07:00
Jacky Lee
ef5f648e2f Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502)
* Implement scaled_dot_product_attention and test

* Support attn_mask

* Support is_causal too

* Use in llama

* Don't forget to reshape

* Set requires_grad=False for causal

* Remove staticmethod

* Remove extra spaces
2023-08-08 23:27:13 -07:00
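The semantics being matched are softmax(QK^T / sqrt(d) + mask)V, with is_causal synthesizing a lower-triangular mask. A numpy sketch of those semantics (argument names follow torch; this is not the tinygrad implementation itself):
```
import numpy as np

def sdpa(q, k, v, attn_mask=None, is_causal=False):
  scores = q @ np.swapaxes(k, -2, -1) / np.sqrt(q.shape[-1])
  if is_causal:  # hide future positions with -inf above the diagonal
    attn_mask = np.triu(np.full(scores.shape[-2:], -np.inf), k=1)
  if attn_mask is not None:
    scores = scores + attn_mask
  w = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
  return (w / w.sum(-1, keepdims=True)) @ v

q = k = v = np.random.rand(2, 4, 8, 16)  # (batch, heads, seq, head_dim)
out = sdpa(q, k, v, is_causal=True)
```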
nimlgen
dabfd7569a use allclose instead of equals in test_jit (#1504)
Closes #1503
2023-08-08 22:22:17 -07:00
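Exact-equality asserts on float kernels are flaky because JIT'd code may reorder or fuse operations, shifting results by a few ULPs. The classic demonstration of why a tolerance-based check is the right tool:
```
import numpy as np

a = 0.1 + 0.2                       # 0.30000000000000004
assert a != 0.3                     # bitwise equality fails on a valid result
np.testing.assert_allclose(a, 0.3)  # tolerance-based check passes
```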
chenyu
827d13e64e correct patch JIT llama chat (#1500) 2023-08-08 19:52:09 -04:00
Yixiang Gao
7c2ea85bb0 Raise memory limit for CIFAR test (#1499) 2023-08-08 19:40:56 -04:00
Thiago Franco de Moraes
293a10204b Add tinygrad.renderer to packages in setup.py (#1497) 2023-08-08 15:51:49 -07:00
chenyu
0415a48cfc patch JIT llama chat mode (#1496) 2023-08-08 15:15:56 -07:00
Yixiang Gao
6480a1a180 CIFAR 94.03% (#1340)
* add disk_tensor

* fix jit

* new baseline before whitening

* whitening through torch

* whitening done, currently at 91.65%

* 91.99%

* clean up mixup and 92.3%

* clean up 92.30%

* 92.49% before searching for new hyper-parameters

* fix CI

* fix white space

* add whitening init in test

* refactor, update hyperpara, 92.72%

* converting whitening to tinygrad operation

* update CI kernels count for CIFAR

* add pad reflect

* add random crop 92.53%

* update hyperpara 93%

* 93.15% on docker container, need to refactor the assignment for hyper param

* print out weights and bias to be separated

* bias/non-bias params separated

* fix whitespace

* clean up

* refactor hyper-param with dict

* refactor lr scheduler params

* fix whitespace

* fix cross entropy loss

* fix whitespace

* move opt hyp to hyp dict

* minor fixup

* adjust model, loss scaling

* 92.74% while using half of compute as before

* update hyp for cutmix

* random shuffle during batches

* clean up

* updating the model

* update ConvGroup

* disable gradients for batchnorm layer weights

* whitespace

* 93.92%

* clean up

* finally 94%!

* rewrite whitening to remove dependency on torch

* whitespace

* remove dependency on torch, 93.91%

* back to 94.03%

* clean up

* update test_real_world
2023-08-08 15:13:24 -07:00
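The last bullets above replace a torch-based whitening init with a torch-free one. A hedged sketch of the usual patch-whitening construction (hlb-CIFAR10 style; shapes and eps are illustrative, not the commit's exact values):
```
import numpy as np

def whitening_filters(patches, eps=1e-2):
  # patches: (N, C*K*K) flattened image patches from the training set
  cov = np.cov(patches, rowvar=False)
  eigval, eigvec = np.linalg.eigh(cov)
  # scale each eigen-direction by 1/sqrt(lambda + eps): unit-variance outputs
  return eigvec @ np.diag((eigval + eps) ** -0.5) @ eigvec.T

W = whitening_filters(np.random.rand(1024, 3 * 2 * 2))  # reshape into a conv weight
```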
Roelof van Dijk
aa83a9e910 ci: fix gpuocelot build cache (#1474)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 14:00:04 -07:00
George Hotz
d24f936501 just cmplt (#1493)
* just cmplt

* fix maximum

* don't save, there's no backward

* ugh, no slot either

* eq is a scam
2023-08-08 13:58:10 -07:00
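"just cmplt" works because less-than plus arithmetic can express the other comparisons and maximum; here is the scalar version of the trick (the real code applies the same identities elementwise on tensors):
```
def cmplt(a, b): return 1.0 if a < b else 0.0

def eq(a, b): return (1 - cmplt(a, b)) * (1 - cmplt(b, a))         # neither a<b nor b<a
def maximum(a, b): return a * cmplt(b, a) + b * (1 - cmplt(b, a))  # pick the larger

assert eq(2.0, 2.0) == 1.0 and eq(1.0, 2.0) == 0.0
assert maximum(1.0, 3.0) == 3.0 and maximum(3.0, 1.0) == 3.0
```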
Roelof van Dijk
e2cf0f322e [READY] ci: missing n=auto (#1486)
* ci: missing n=auto

* fix: add to commented test

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 07:37:24 -07:00
Roelof van Dijk
0ce7511110 fix: don't use "is" with a literal (#1487)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-08 07:35:30 -07:00
nimlgen
932dad1a2b fix cast bool->float in llvmir (#1480)
Closes #1479
2023-08-07 21:30:51 -07:00
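The subtlety in #1479 is that an LLVM i1 must be widened with uitofp rather than bitcast, so True maps to 1.0. A sketch in llvmlite, which tinygrad's LLVM backend builds on (function shape is illustrative):
```
from llvmlite import ir

module = ir.Module()
fnty = ir.FunctionType(ir.FloatType(), [ir.IntType(1)])
fn = ir.Function(module, fnty, name="bool_to_float")
builder = ir.IRBuilder(fn.append_basic_block())
builder.ret(builder.uitofp(fn.args[0], ir.FloatType()))  # i1 -> float, True -> 1.0
print(module)  # emitted IR contains a "uitofp i1 ... to float" instruction
```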
nimlgen
046fd7437a use fake buffer for external_test_speed_llama.py (#1478) 2023-08-07 22:05:44 -04:00
George Hotz
5fdd248617 don't download cifar (#1472) 2023-08-06 21:38:59 -07:00