Commit Graph

4433 Commits

Author SHA1 Message Date
madt2709
d2c1e8409a Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00
George Hotz
f45013f0a3 stable diffusion: remove realizes we don't need 2023-07-20 19:53:07 -07:00
George Hotz
9dffc9ba23 Use nevergrad to optimize kernels (try 2) (#1301)
* nevergrad try 2

* touchups

* no ones

* opt fixup

* cleanups

* touchup

* make new optimizer file
2023-07-20 16:46:45 -07:00
George Hotz
50a399ffa3 real world test: relax memory 2023-07-20 14:06:22 -07:00
George Hotz
17830e25da real world tests (#1297)
* real world test

* touchup

* sync device
2023-07-20 10:50:22 -07:00
George Hotz
ca77d6cd72 bfloat16 in LLVM (enough for llama 2) (#1293)
* add bf16 support to LLVM

* bf16 read works
2023-07-19 20:18:32 -07:00
Umut Zengin
74e63fe4ee Added test_chunk and fixed (#1283) 2023-07-19 22:21:26 -04:00
George Hotz
f7b0320d8b add cifar training regression test (#1287)
* add cifar training regression test

* clean up print
2023-07-19 14:17:09 -07:00
George Hotz
45ecae1ab3 Revert "Match Torch speed for sum reduction on M1 (#1187)" (#1286)
This reverts commit 59af9b81c5.
2023-07-19 13:39:16 -07:00
chenyu
120ae74008 Enable JIT test for size 1 tensor (#1285) 2023-07-19 11:06:40 -07:00
chenyu
940b6fd21a Revert "Fix constant folding for Tensor([3]) (#1227)" (#1274)
This reverts commit ab645317c9.
2023-07-19 10:51:06 -07:00
chenyu
0aed3f73da More JIT test cases (#1280)
* More JIT test cases

* test against jit_cache directly

* remove unused
2023-07-19 10:45:43 -07:00
George Hotz
d6637623e3 torch test touchup 2023-07-19 09:37:23 -07:00
Alexander Edwards
59af9b81c5 Match Torch speed for sum reduction on M1 (#1187)
* Add additional kernel when reducing multiple dimensions at once.

* Faster for smaller inputs

* Whitespace and naming

* Cleaner, guard for Metal only, and max 1 split rather than N

* Draft of different approach

* One additional kernel call for this test (as expected)
2023-07-19 09:18:58 -07:00
Umut Zengin
fde9f0e60d Slice migrated in Eye op (#1281)
* Migrated from slice to pad and shrink, made cleaner

* Changed repeat with reshape and expand
2023-07-19 09:08:38 -07:00
chenyu
a5f5330d91 Add Fuzz Test symbolic / shapetracker to CI. (#1278)
* Fuzz test symbolic and shapetracker

This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.

* mess again

* no tail

* test shapetracker too

* Revert mess and enable all tests

* removed leftover
2023-07-19 09:05:45 -07:00
David Hou
56ee97b37f dedup kernel args v2 (#1272)
* new version

* fix abstractions

* try remove test

* Revert "try remove test"

This reverts commit 2fc18a9f8e.

* assert_allclose

* minimize the test

* minimize the test

* minimize the test

* minimize the test

* Revert "minimize the test"

This reverts commit e0c0929596.

* Revert "minimize the test"

This reverts commit 88240551b1.

* Revert "minimize the test"

This reverts commit 78328a7ce2.

* Revert "minimize the test"

This reverts commit 989523fded.

* skip test inside body

* oops

* oops
2023-07-18 20:03:42 -07:00
Umut Zengin
fa0265b173 Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266) 2023-07-18 16:09:19 -04:00
chenyu
c96bf395df Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265)
* Enable JIT test

* really test metal

* Skip some device
2023-07-18 11:40:37 -07:00
Umut Zengin
f8c539989e Re-open create cumsum speed test (#1255)
* Reduced tensor size in testing

* Update formatting test_speed_v_torch.py
2023-07-17 18:59:36 -07:00
Stan
ed472bffea Fix: negative axis in tensor.cumsum (#1261) 2023-07-17 16:16:38 -07:00
Adrian Kretz
5a8ad57163 Add WHERE ternary (or trinary?) op (#1196)
* Rename FusedOps to TernaryOps

* Support ternary broadcast

* Add where llop and mlop

* Make where op work in cstyle codegen

* Don't skip test_inf_where

* Add backward path to where op

* Use bool in cstyle codegen

* Add LLVM where op

* Add numpy where op

* Add torch where op

* Simplify where mlop

* Update documentation

* Forgot a rename

* Merged relevant changes from PR #1195 onto PR #1196

* Add test to cover changes to linearizer.ast_parse for WHERE op

Without this METAL will try to use ternary op on float4 and fail

* Make where op work in wgsl backend

* Allow ternary ops to be merged

* Make mypy happy

---------

Co-authored-by: Francis Lam <flam@alum.mit.edu>
2023-07-16 00:31:55 -07:00
Stan
872e2198fe Added nn.ConvTranspose1d (#1243)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-15 00:42:42 -07:00
Stan
264d467f2b Added tensor.squeeze and support for testing exceptions (#1241)
* WIP: `tensor.squeeze` function

* Added `test_except` param to `helper_test_op` to avoid false positives

* Extracted new method `helper_test_exception` for testing exceptions

* Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch
2023-07-15 00:33:24 -07:00
Stan
a8f3b3f4ed Added test for nn.Conv1d (#1242) 2023-07-15 00:30:50 -07:00
chenyu
32be39554c Simplify symbolic.SumNode.__floordiv__ logic (#1220) 2023-07-12 12:54:12 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* Fix efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Yosef Frost
613bcd945d Added Test Coverage to Int32 and Make Sure Tests Succeed (#1174)
* Added test coverage for int32 in `test/test_dtype.py`

Tests for int32 include:
- testing that int32 can be converted into a numpy array
- testing that float and int64 can be cast into int32
- testing that int32 can be cast into float and int64
- testing addition, multiplication, and matrix multiplication with int32
- testing that addition, multiplication, and matrix multiplication with int32 and either float or int64 gets successfully cast into float and int64, respectively

Additional changes include testing that int8 casts into int32 and testing that float16 casts into int32

* Added type casting to the add, subtract, and divide binary operations

* Added automatic type casting when types differ to FusedOps.MULACC

I moved the match_types function back so that I could call it in einsum_mulacc where it would cast the types of the MULACC to be the same

* Added unit test for match_types and added type hints to the parameters

* Added tests for ops_cpu.match_types

* Changed ops_cpu.einsum logic to play nicely with PyTorch

Changed `tinygrad.runtime.ops_cpu.einsum_mulacc` logic to not perform type matching. Type matching was instead moved to the numpy_fxn_for_op dictionary in the ops_cpu file. Since ops_torch uses the same einsum_mulacc function, this should fix all the broken pytorch tests.

* empty commit to rerun ci

* reverting PR#1213 in attempt to fix broken test

* Removed all tests I added to see if they are causing CI issues

* Added back type matching tests

* removed type matching tests and added back int tests

* added back part of the type matching tests

* removed breaking type matching tests

* empty commit for testing

* added test back but inside comment

* removed a test from the comment to see if it breaks CI

* removed another function

* more testing

* emptied test comment

* cleaned up comments

* Added optimize=True flag to einsum_mulacc in ops_cpu.py

* Removed unnecessary imports from tests

* optimized match_types by removing unnecessary array copying
2023-07-12 10:29:15 -07:00
Francis Lam
df86672bd4 Fix LazyBuffer SHUFFLE_PAD_OPS to prevent invalid pad movement (#1223)
In addition to div, any ops that will generate non-zero outputs from
zero inputs need to be guarded.
2023-07-11 15:30:35 -07:00
chenyu
ab645317c9 Fix constant folding for Tensor([3]) (#1227)
* Fix constant folding for Tensor([3])

* Remove duplicated prod import

* load in the same device

* better numpy

* add constant fold shape test cases

* improve tests
2023-07-11 14:01:32 -07:00
madt2709
bb316a42af Fix pow to work with negative tensors (#1191) 2023-07-09 17:33:04 -07:00
George Hotz
43385c7dbf remove contiguous on full (#1212) 2023-07-09 17:31:15 -07:00
George Hotz
67e34b356a good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz
7151382364 Refactor load/store before tensor cores (#1193)
* minor cleanups

* render_const

* now that's a nice refactor

* clean up vload/vstore

* clean up render_load

* debugs there

* dumb

* err, this?

* const float4

* what's failing

* bugfix

* statement includes semicolon

* bugfix
2023-07-08 15:54:58 -07:00
fluffy χατγιρλ
628ee46627 Fix bug where Tensor.randn returns inf (#1192)
* fix randn inf bug

* add test

* more compact test

* clarify test purpose
2023-07-08 12:03:46 -07:00
George Hotz
0ad99038ef Revert "Revert "Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156)" (#1181)" + add test
This reverts commit a374b62bfe.
2023-07-07 18:37:04 -07:00
George Hotz
a374b62bfe Revert "Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156)" (#1181)
This reverts commit 8ff7184b1b.
2023-07-07 18:29:05 -07:00
fluffy χατγιρλ
8ff7184b1b Fix ShapeTracker mismatch in LazyBuffer.fromCPU (#1156)
* init shape tracker with strides to fix mismatch

Author:    sekstini <sekstinilol@gmail.com>

* fix whitespace

* add tests
2023-07-07 18:28:21 -07:00
Stan
69d33cab0d Fix: auto create parent dir when downloading file (#1173)
* Fix: auto create parent dir when downloading file

also removed duplicate import `os`

* Added test for auto parent dir creation when downloading file
2023-07-07 13:40:29 -07:00
terafo
aa60feda48 Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Stan
9b6e57eccd helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`; previously, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception; previously, if an exception was thrown before the process was started, it would hang the thread

* Made `_early_exec_process` catch any Exception

Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example, a type error for an argument passed to `subprocess.check_output`.

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, my bad

* Extracted a class for tests of early exec

* Normalize line endings, Windows uses \r\n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
Kunwar Raj Singh
8391648822 Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducible

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* partial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where i calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
Rayan Hatout
9975f24452 Fold expand preceding reduce if the reduction is on the same axis as the expansion (#1134)
* fold expands that precede a reduce if the reduction is on the same axis as the expansion

* add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization

* add a test case to make sure we don't fold reduce-expand-reduce on different axes
2023-07-06 13:41:05 -07:00
Eli Frigo
801564f31b Remove POW llop and add SQRT llop (#1104)
* fixed division by zero for fast operations

* made et closer to 0

* replace POW llop with SQRT

* updated mlops to swap SQRT and POW llops

* updated hlops to swap POW and SQRT

* added sqrt llop to cpu runtime

* added sqrt llop to cstyle codegen

* added POW llop to llvm ir codegen

* added SQRT llop to torch runtime

* moved pow from mlops to hlops

* found a better way to do reverse pow

* fixed indentation

* added SQRT llop to triton

* update docs to match new llops

* removed POW operator from assembly codegen

* added sqrt and rsqrt to pow hlop

* rewrote pow function in tensor.py

* Adjust tolerance

* Adjust for adamw

* Reduce for Adam too

* removed accidental leftover code

* removed all of accidental code

* added rsqrt test

* removed pow from mlops again

it was added back when resolving merge conflicts

---------

Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
Reza Rezvan
d1356cac27 Fix: Jacobian tests [WIP] (#1126)
* Fix: Jacobian tests; num_jacobian either bugged or not accurate enough;

* Fix: Jacobian tests;

* Fix: Gradcheck;
2023-07-05 15:36:22 -07:00
George Hotz
793a670187 from tensor cores + lb touchup (#1127) 2023-07-04 15:45:20 -07:00
Reza Rezvan
535224ac20 Remove float64 (#1101)
* Refactor: Remove float64

* Refactor: Remove unused imports

* Refactor: Remove float64

* Refactor: Remove float64

* Refactor: Exclude float64 onnx backend

* Add: Skip jacobian and gradcheck tests;
2023-07-04 08:40:51 -07:00
Daniel Hipke
b4ce23e4b8 Make cross_process use cloudpickle (#1118)
* fix syntax issues in imagenet_download.py

* use cloudpickle in cross_process to make it work in Python 3.9+

* add cross_process test

* prevent unpickling on every function call

* add cloudpickle to setup.py

* add support for args/kwargs
2023-07-04 00:47:34 -07:00
George Hotz
c709dec8b5 gelu: weird test was broken for metal 2023-07-04 00:43:54 -07:00
George Hotz
daf8e1942f sigmoid: test large positive also and add note 2023-07-04 00:18:31 -07:00