Commit Graph

2167 Commits

Author SHA1 Message Date
waifairer
d89fb729e5 flake8 (#1323)
* flake8: Ignore frequent violations, correct infrequent ones

* Ignore some rules in test

* Reorder test ignores

* Lint test + main

* EOF indent

* Include all E71,E72 errors

* Test the failing case in CI

* Revert "Test the failing case in CI"

This reverts commit 110add0a70.

* Push to test!
This reverts commit f317532779.

* ok back to passing
This reverts commit ba5052685f.

* Prove that CI fails when formatting is incorrect.

* Fix formatting

* Remove duplicitous E117 rule

* Use flake8 config for precommit

---------

Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-24 11:19:58 -04:00
wozeparrot
51173f0a48 HIP backend fixes (#1336)
* feat: hip trains cifar

* feat: test_dtype fixes
2023-07-24 08:16:57 -07:00
George Hotz
086382b64e Revert "Fix max nan (#1298)" (#1334)
This reverts commit 50774470b2.
2023-07-23 20:41:28 -07:00
uncommonSensor
50774470b2 Fix max nan (#1298)
* Fix max nan

* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests

* Fix max nan

* Adds nan check option to max function
* Calls to max can pass in "ignore_nan=True" argument
* Added max nan CI tests
* Turned off due to the need for granularity
2023-07-23 19:39:44 -07:00
cheeetoo
a0965ee198 CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache too slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and duration on all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
George Hotz
47f9d82722 test_conv: relax to 0.93 2023-07-23 12:57:29 -07:00
Giles Bathgate
c4238b4ea0 Fix discriminator balancing in mnist_gan example (#1332) 2023-07-23 12:43:05 -07:00
chenyu
aa05495620 symbolic stride (#1326) 2023-07-23 12:41:22 -07:00
Cole Sutyak
2d4e182294 change fetch to allow for local file selection (#1309) 2023-07-23 15:00:16 -04:00
waifairer
7cac5ea16c [GH-1305] Refactor test_dtypes.py to be cleaner (#1306)
Co-authored-by: waifairer <waifairer@gmail.com>
2023-07-21 18:18:02 -04:00
Maxim Zakharov
48c4df1263 fix: prevent infinite "loading..." state (#1319)
* the demo doesn't work on my device for some reason and throws the error "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside the setupNet function
* because of that, JS halts execution of the rest of the code below, and the screen shows "loading..." forever
* added a try/catch here to report the error properly
2023-07-21 14:01:53 -07:00
Jacob Pradels
b112edd2c3 Add pylint trailing whitespace rule (#1314) 2023-07-21 13:37:55 -04:00
George Hotz
bfbb8d3d0f fix ones, BS=2 stable diffusion, caching optimizer (#1312)
* fix ones, BS=2 stable diffusion

* caching optimizer

* print search time

* minor bug fix
2023-07-21 09:55:49 -07:00
George Hotz
9746f6d094 move hand coded optimizer (#1310)
* move hand coded optimizer

* llvm can optimize

* fix llvm

* save linearizer
2023-07-21 07:53:12 -07:00
madt2709
d2c1e8409a Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00
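A rough sketch of what a (start, stop, step) signature for arange usually means, written in plain Python rather than tinygrad. The single-argument form (treating the lone argument as stop) follows the NumPy/PyTorch convention and is an assumption here, not something the commit message states; the function body is illustrative only.

```python
import math

def arange(start, stop=None, step=1):
    # Assumption: a single argument is treated as `stop`, as in NumPy/PyTorch.
    if stop is None:
        start, stop = 0, start
    if step == 0:
        raise ValueError("step must be non-zero")
    # Number of elements in the half-open interval [start, stop).
    count = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(count)]

forward = arange(2, 10, 3)    # three-argument form
short = arange(5)             # single-argument form
backward = arange(5, 0, -2)   # negative step counts down
```

The element count is computed up front so that a negative step and an empty range both fall out of the same formula.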
George Hotz
f45013f0a3 stable diffusion: remove realizes we don't need 2023-07-20 19:53:07 -07:00
George Hotz
b58dd015e3 stable diffusion: remove import numpy as np 2023-07-20 19:35:44 -07:00
George Hotz
35bc46289c stable diffusion: use new tinygrad primitives 2023-07-20 19:25:49 -07:00
Francis Lam
78a7a15753 Fix WGSL to render NaN and prevent shader compile error (#1268) 2023-07-20 18:00:33 -07:00
Stan
0a3d4f8103 Implementation of VITS TTS model (#1188)
* [WIP]: implementation of VITS TTS model

* Implemented VITS model, moved all code to examples/vits.py

* Added support for vctk model, auto download, and cleanups

* Invoke tensor.realize() before measuring inference time

* Added support for mmts-tts model, extracted TextMapper class, cleanups

* Removed IPY dep, added argument parser, cleanups

* Tiny fixes to wav writing

* Simplified the code in a few places, set diff log level for some prints

* Some refactoring, added support for uma_trilingual model (anime girls)

* Fixed bug where embeddings are loaded with same backing tensor, oops

* Added emotional embed support, added cjks + voistock models

- voistock is multilingual model with over 2k anime characters
- cjks is multilingual model with 24 speakers

both are kinda bad for english though :c

* Removed `Tensor.Training=False` (not needed and wrong oop)

* Changed default model and speaker to vctk with speaker 6

* Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy

* Removed accidentally pushed test/spline.py

* Some slight refactors

* Replaced masked_fill with tensor.where

* Added y_length estimating, plus installation instructions, plus some cleanups

* Fix overestimation log message.

* Changed default value of `--estimate_max_y_length` to False

This is only useful for larger inputs.

* Removed printing of the phonemes

* Changed default value of `--text_to_synthesize`
2023-07-20 17:37:14 -07:00
George Hotz
d963024a13 optimizer small fix: return if there's nothing to optimize 2023-07-20 16:57:30 -07:00
George Hotz
9dffc9ba23 Use nevergrad to optimize kernels (try 2) (#1301)
* nevergrad try 2

* touchups

* no ones

* opt fixup

* cleanups

* touchup

* make new optimizer file
2023-07-20 16:46:45 -07:00
Diogo
8562b5a04f fixes error when trying to convert float4 -> half4 (#1300) 2023-07-20 14:20:05 -07:00
George Hotz
50a399ffa3 real world test: relax memory 2023-07-20 14:06:22 -07:00
George Hotz
17830e25da real world tests (#1297)
* real world test

* touchup

* sync device
2023-07-20 10:50:22 -07:00
George Hotz
ca77d6cd72 bfloat16 in LLVM (enough for llama 2) (#1293)
* add bf16 support to LLVM

* bf16 read works
2023-07-19 20:18:32 -07:00
Umut Zengin
74e63fe4ee Added test_chunk and fixed (#1283) 2023-07-19 22:21:26 -04:00
George Hotz
3f2497160c strip whitespace 2023-07-19 19:01:53 -07:00
George Hotz
65fe72f10b Cleanup loadop (#1291)
* cleanup loadop

* llvm fix

* fix llvm dtype

* fix clang
2023-07-19 18:59:47 -07:00
Alexander Schlögl
e3f717f614 fix CUDAProgram __init__ with DEBUG>=6 on Linux (#1288)
* fix CUDAProgram __init__ with DEBUG>=6 on Linux

Replace path generated in f-string by os.path.join

* import os instead of os.path.join

* move import up
2023-07-19 14:36:58 -07:00
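To illustrate the change described in e3f717f614 (building a path with os.path.join instead of an f-string), here is a small sketch. The helper name debug_dump_path and the file name are hypothetical; the actual CUDAProgram code is not shown in this log.

```python
import os
import tempfile

def debug_dump_path(filename: str) -> str:
    # Hypothetical helper. os.path.join inserts the platform's separator,
    # so the result does not depend on whether the directory string ends
    # with a slash, which is easy to get wrong with an f-string such as
    # f"{tempfile.gettempdir()}{filename}".
    return os.path.join(tempfile.gettempdir(), filename)

path = debug_dump_path("kernel.ptx")
```

The same call works unchanged on Linux, macOS, and Windows, since the separator is chosen by os.path.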
George Hotz
f7b0320d8b add cifar training regression test (#1287)
* add cifar training regression test

* clean up print
2023-07-19 14:17:09 -07:00
George Hotz
45ecae1ab3 Revert "Match Torch speed for sum reduction on M1 (#1187)" (#1286)
This reverts commit 59af9b81c5.
2023-07-19 13:39:16 -07:00
chenyu
120ae74008 Enable JIT test for size 1 tensor (#1285) 2023-07-19 11:06:40 -07:00
chenyu
940b6fd21a Revert "Fix constant folding for Tensor([3]) (#1227)" (#1274)
This reverts commit ab645317c9.
2023-07-19 10:51:06 -07:00
chenyu
0aed3f73da More JIT test cases (#1280)
* More JIT test cases

* test against jit_cache directly

* remove unused
2023-07-19 10:45:43 -07:00
Francis Lam
3db57d3118 Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275) 2023-07-19 13:22:33 -04:00
George Hotz
d6637623e3 torch test touchup 2023-07-19 09:37:23 -07:00
Alexander Edwards
59af9b81c5 Match Torch speed for sum reduction on M1 (#1187)
* Add additional kernel when reducing multiple dimensions at once.

* Faster for smaller inputs

* Whitespace and naming

* Cleaner, guard for Metal only, and max 1 split rather than N

* Draft of different approach

* One additional kernel call for this test (as expected)
2023-07-19 09:18:58 -07:00
Umut Zengin
fde9f0e60d Slice migrated in Eye op (#1281)
* Migrated from slice to pad and shrink, made cleaner

* Changed repeat with reshape and expand
2023-07-19 09:08:38 -07:00
chenyu
a5f5330d91 Add Fuzz Test symbolic / shapetracker to CI. (#1278)
* Fuzz test symbolic and shapetracker

This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.

* mess again

* no tail

* test shapetracker too

* Revert mess and enable all tests

* removed leftover
2023-07-19 09:05:45 -07:00
David Hou
56ee97b37f dedup kernel args v2 (#1272)
* new version

* fix abstractions

* try remove test

* Revert "try remove test"

This reverts commit 2fc18a9f8e.

* assert_allclose

* minimize the test

* minimize the test

* minimize the test

* minimize the test

* Revert "minimize the test"

This reverts commit e0c0929596.

* Revert "minimize the test"

This reverts commit 88240551b1.

* Revert "minimize the test"

This reverts commit 78328a7ce2.

* Revert "minimize the test"

This reverts commit 989523fded.

* skip test inside body

* oops

* oops
2023-07-18 20:03:42 -07:00
wozeparrot
37cc33269a cl fixes for multigpu (#1276)
* feat: opencl fixes for multigpu usage

* clean: who needs this import anyways
2023-07-18 19:59:30 -07:00
Umut Zengin
fa0265b173 Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266) 2023-07-18 16:09:19 -04:00
chenyu
c96bf395df Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265)
* Enable JIT test

* really test metal

* Skip some device
2023-07-18 11:40:37 -07:00
Umut Zengin
f8c539989e Re-open create cumsum speed test (#1255)
* Reduced tensor size in testing

* Update formatting test_speed_v_torch.py
2023-07-17 18:59:36 -07:00
George Hotz
ab3d281a6e Refactor MemOps (#1256)
* metal tests pass locally

* define global

* refactor DEFINE_GLOBAL

* move assembly out. it isn't tested

* fix llvm
2023-07-17 16:36:33 -07:00
Stan
ed472bffea Fix: negative axis in tensor.cumsum (#1261) 2023-07-17 16:16:38 -07:00
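A sketch of the usual way a negative axis is handled in a cumulative sum: map it to its non-negative equivalent before doing any work. This uses nested Python lists rather than tinygrad tensors, and both normalize_axis and cumsum_2d are hypothetical names, not the code from the commit.

```python
from itertools import accumulate

def normalize_axis(axis: int, ndim: int) -> int:
    # Map a negative axis (counted from the end) to its non-negative index.
    if not -ndim <= axis < ndim:
        raise IndexError(f"axis {axis} out of range for {ndim} dimensions")
    return axis + ndim if axis < 0 else axis

def cumsum_2d(rows, axis=0):
    axis = normalize_axis(axis, 2)
    if axis == 1:
        # Running sum along each row.
        return [list(accumulate(row)) for row in rows]
    # axis 0: transpose, take running sums, transpose back.
    cols = [list(accumulate(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

data = [[1, 2, 3], [4, 5, 6]]
last_axis = cumsum_2d(data, axis=-1)
first_axis = cumsum_2d(data, axis=-2)
```

Normalizing once at the top keeps the rest of the function free of sign checks.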
Oddity
64d39188ad Assembly ptx target current arch (#1250)
* updated .target to use the current arch version

* undid docstring
2023-07-17 08:45:43 -07:00
Adrian Kretz
5a8ad57163 Add WHERE ternary (or trinary?) op (#1196)
* Rename FusedOps to TernaryOps

* Support ternary broadcast

* Add where llop and mlop

* Make where op work in cstyle codegen

* Don't skip test_inf_where

* Add backward path to where op

* Use bool in cstyle codegen

* Add LLVM where op

* Add numpy where op

* Add torch where op

* Simplify where mlop

* Update documentation

* Forgot a rename

* Merged relevant changes from PR #1195 onto PR #1196

* Add test to cover changes to linearizer.ast_parse for WHERE op

Without this METAL will try to use ternary op on float4 and fail

* Make where op work in wgsl backend

* Allow ternary ops to be merged

* Make mypy happy

---------

Co-authored-by: Francis Lam <flam@alum.mit.edu>
2023-07-16 00:31:55 -07:00
Stan
91f797cd52 Moved mkdir in utils.download_file to diff line (#1249)
* Moved mkdir to diff line

.mkdir does not return the actual directory being created.

* use walrus operator to simplify
2023-07-16 00:30:46 -07:00
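The note in 91f797cd52 that .mkdir does not return the directory can be shown with a short sketch. pathlib.Path.mkdir returns None, so it cannot be chained while building a path; the walrus operator binds the directory first. The helper name cached_file_path and the "cache" subdirectory are hypothetical and not taken from utils.download_file.

```python
import pathlib
import tempfile

def cached_file_path(name: str, base: pathlib.Path) -> pathlib.Path:
    # Path.mkdir returns None, so `(base / "cache").mkdir(...) / name` fails.
    # Bind the directory with := first, then call mkdir on the bound value.
    (cache_dir := base / "cache").mkdir(parents=True, exist_ok=True)
    return cache_dir / name

base = pathlib.Path(tempfile.mkdtemp())
mkdir_result = (base / "scratch").mkdir()   # returns None, not a Path
target = cached_file_path("weights.bin", base)
```

Calling the helper a second time is safe because exist_ok=True suppresses the error for an existing directory.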