Commit Graph

2136 Commits

George Hotz
45ecae1ab3 Revert "Match Torch speed for sum reduction on M1 (#1187)" (#1286)
This reverts commit 59af9b81c5.
2023-07-19 13:39:16 -07:00
chenyu
120ae74008 Enable JIT test for size 1 tensor (#1285) 2023-07-19 11:06:40 -07:00
chenyu
940b6fd21a Revert "Fix constant folding for Tensor([3]) (#1227)" (#1274)
This reverts commit ab645317c9.
2023-07-19 10:51:06 -07:00
chenyu
0aed3f73da More JIT test cases (#1280)
* More JIT test cases

* test against jit_cache directly

* remove unused
2023-07-19 10:45:43 -07:00
Francis Lam
3db57d3118 Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275) 2023-07-19 13:22:33 -04:00
George Hotz
d6637623e3 torch test touchup 2023-07-19 09:37:23 -07:00
Alexander Edwards
59af9b81c5 Match Torch speed for sum reduction on M1 (#1187)
* Add additional kernel when reducing multiple dimensions at once.

* Faster for smaller inputs

* Whitespace and naming

* Cleaner, guard for Metal only, and max 1 split rather than N

* Draft of different approach

* One additional kernel call for this test (as expected)
2023-07-19 09:18:58 -07:00
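A minimal numpy sketch of the split-reduction idea behind this (later reverted, see 45ecae1ab3 above) change; sizes and names are illustrative, not tinygrad's kernel code. Staging a big sum as two smaller reductions maps to two GPU kernel launches, and the first stage exposes many independent partial sums:

```python
import numpy as np

# "first kernel": 1024 independent partial sums over chunks of the input
x = np.random.rand(1 << 20).astype(np.float32)
partials = x.reshape(1 << 10, -1).sum(axis=1)
# "second kernel": combine the partials into the final result
total = partials.sum()
assert np.isclose(total, x.sum(), rtol=1e-4)
```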
Umut Zengin
fde9f0e60d Slice migrated in Eye op (#1281)
* Migrated from slice to pad and shrink, made cleaner

* Changed repeat with reshape and expand
2023-07-19 09:08:38 -07:00
chenyu
a5f5330d91 Add Fuzz Test symbolic / shapetracker to CI. (#1278)
* Fuzz test symbolic and shapetracker

This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.

* mess again

* no tail

* test shapetracker too

* Revert mess and enable all tests

* removed leftover
2023-07-19 09:05:45 -07:00
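In the spirit of this commit, a toy fuzz loop (not the actual CI test) that hammers a floordiv identity of the kind the symbolic engine's rewrites depend on, over random inputs:

```python
import random

# Property: (a*k) // (b*k) == a // b for integer a and positive b, k.
# A symbolic/shapetracker fuzzer runs checks of this style in bulk.
for _ in range(10_000):
    a = random.randint(-1000, 1000)
    b = random.randint(1, 100)
    k = random.randint(1, 10)
    assert (a * k) // (b * k) == a // b, (a, b, k)
```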
David Hou
56ee97b37f dedup kernel args v2 (#1272)
* new version

* fix abstractions

* try remove test

* Revert "try remove test"

This reverts commit 2fc18a9f8e.

* assert_allclose

* minimize the test

* minimize the test

* minimize the test

* minimize the test

* Revert "minimize the test"

This reverts commit e0c0929596.

* Revert "minimize the test"

This reverts commit 88240551b1.

* Revert "minimize the test"

This reverts commit 78328a7ce2.

* Revert "minimize the test"

This reverts commit 989523fded.

* skip test inside body

* oops

* oops
2023-07-18 20:03:42 -07:00
wozeparrot
37cc33269a cl fixes for multigpu (#1276)
* feat: opencl fixes for multigpu usage

* clean: who needs this import anyways
2023-07-18 19:59:30 -07:00
Umut Zengin
fa0265b173 Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266) 2023-07-18 16:09:19 -04:00
chenyu
c96bf395df Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265)
* Enable JIT test

* really test metal

* Skip some device
2023-07-18 11:40:37 -07:00
Umut Zengin
f8c539989e Re-open create cumsum speed test (#1255)
* Reduced tensor size in testing

* Update formatting test_speed_v_torch.py
2023-07-17 18:59:36 -07:00
George Hotz
ab3d281a6e Refactor MemOps (#1256)
* metal tests pass locally

* define global

* refactor DEFINE_GLOBAL

* move assembly out. it isn't tested

* fix llvm
2023-07-17 16:36:33 -07:00
Stan
ed472bffea Fix: negative axis in tensor.cumsum (#1261) 2023-07-17 16:16:38 -07:00
Oddity
64d39188ad Assembly ptx target current arch (#1250)
* updated .target to use the current arch version

* undid docstring
2023-07-17 08:45:43 -07:00
Adrian Kretz
5a8ad57163 Add WHERE ternary (or trinary?) op (#1196)
* Rename FusedOps to TernaryOps

* Support ternary broadcast

* Add where llop and mlop

* Make where op work in cstyle codegen

* Don't skip test_inf_where

* Add backward path to where op

* Use bool in cstyle codegen

* Add LLVM where op

* Add numpy where op

* Add torch where op

* Simplify where mlop

* Update documentation

* Forgot a rename

* Merged relevant changes from PR #1195 onto PR #1196

* Add test to cover changes to linearizer.ast_parse for WHERE op

Without this, METAL will try to use the ternary op on float4 and fail

* Make where op work in wgsl backend

* Allow ternary ops to be merged

* Make mypy happy

---------

Co-authored-by: Francis Lam <flam@alum.mit.edu>
2023-07-16 00:31:55 -07:00
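A hedged numpy sketch of what a WHERE ternary llop computes, including the backward rule this commit adds (the condition routes the upstream gradient); this is illustrative, not the tinygrad mlop itself:

```python
import numpy as np

def where_forward(cond: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # elementwise select: x where cond is true, else y
    return np.where(cond, x, y)

def where_backward(cond: np.ndarray, grad_out: np.ndarray):
    # x receives the gradient where cond held, y receives it elsewhere;
    # the condition itself is non-differentiable
    return np.where(cond, grad_out, 0.0), np.where(cond, 0.0, grad_out)

cond = np.array([True, False, True])
print(where_forward(cond, np.array([1., 2., 3.]), np.array([9., 9., 9.])))  # [1. 9. 3.]
```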
Stan
91f797cd52 Moved mkdir in utils.download_file to diff line (#1249)
* Moved mkdir to diff line

`.mkdir` returns None rather than the directory being created, so the call can't be chained.

* use walrus operator to simplify
2023-07-16 00:30:46 -07:00
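A sketch of the shape of the fix, assuming a `download_file(url, fp)` helper like the one in tinygrad's utils (the body here is illustrative): since `Path.mkdir` returns None, the directory creation has to sit in its own expression, and the walrus operator keeps that compact:

```python
import pathlib
import urllib.request

def download_file(url: str, fp: str) -> pathlib.Path:
    # (p := Path(fp)) binds the path while .parent.mkdir runs purely for
    # its side effect -- mkdir returns None, so it can't be chained
    (p := pathlib.Path(fp)).parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, p)
    return p
```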
Yixiang Gao
a8f2c16f8e add contiguous (#1246) 2023-07-15 08:36:34 -07:00
Stan
872e2198fe Added nn.ConvTranspose1d (#1243)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-15 00:42:42 -07:00
Oddity
7399f6dad7 display sass for both cuda code and ptx (#1240)
* skip nvcc compile target cubin when using PTX

* actually we should generate sass for both ptx and cuda code

* Fixed formatting, should print the error anyway

* ensure subprocess.run throws exception

* fixed linting errors and checked before commit this time
2023-07-15 00:36:04 -07:00
Stan
264d467f2b Added tensor.squeeze and support for testing exceptions (#1241)
* WIP: `tensor.squeeze` function

* Added `test_except` param to `helper_test_op` to avoid false positives

* Extracted new method `helper_test_exception` for testing exceptions

* Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch
2023-07-15 00:33:24 -07:00
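A standalone numpy sketch of the PyTorch-matching semantics described above (a hypothetical helper, not the tinygrad method): on a 0-d array, PyTorch accepts dim -1 or 0 as a no-op instead of raising IndexError:

```python
import numpy as np

def squeeze(x: np.ndarray, dim=None) -> np.ndarray:
    if dim is None:
        # drop every size-1 axis
        return x.reshape([s for s in x.shape if s != 1])
    if x.ndim == 0:
        # match PyTorch: a 0-d tensor accepts dim -1 or 0 and is returned as-is
        if dim not in (-1, 0):
            raise IndexError(f"dim {dim} out of range for 0-d array")
        return x
    if not -x.ndim <= dim < x.ndim:
        raise IndexError(f"dim {dim} out of range for {x.ndim}-d array")
    dim %= x.ndim
    # drop the axis only if it has size 1 (squeezing a non-1 axis is a no-op)
    return x.reshape([s for i, s in enumerate(x.shape) if i != dim or s != 1])
```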
Stan
a8f3b3f4ed Added test for nn.Conv1d (#1242) 2023-07-15 00:30:50 -07:00
David Hou
9c135c9450 add sqrt to ptx (#1236) 2023-07-13 07:26:11 -07:00
chenyu
32be39554c Simplify symbolic.SumNode.__floordiv__ logic (#1220) 2023-07-12 12:54:12 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* FIx efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Yosef Frost
613bcd945d Added Test Coverage to Int32 and Make Sure Tests Succeed (#1174)
* Added test coverage for int32 in `test/test_dtype.py`

Tests for int32 include:
- testing that int32 can be converted into a numpy array
- testing that float and int64 can be cast into int32
- testing that int32 can be cast into float and int64
- testing addition, multiplication, and matrix multiplication with int32
- testing that addition, multiplication, and matrix multiplication with int32 and either float or int64 get successfully cast into float and int64, respectively

Additional changes include testing that int8 casts into int32 and testing that float16 casts into int32

* Added type casting to the add, subtract, and divide binary operations

* Added automatic type casting when types differ to FusedOps.MULACC

I moved the match_types function back so that I could call it in einsum_mulacc where it would cast the types of the MULACC to be the same

* Added unit test for match_types and added type hints to the parameters

* Added tests for ops_cpu.match_types

* Changed ops_cpu.einsum logic to play nicely with PyTorch

Changed `tinygrad.runtime.ops_cpu.einsum_mulacc` logic to not perform type matching. Type matching was instead moved to the numpy_fxn_for_op dictionary in the ops_cpu file. Since ops_torch uses the same einsum_mulacc function, this should fix all the broken pytorch tests.

* empty commit to rerun ci

* reverting PR#1213 in attempt to fix broken test

* Removed all tests I added to see if they are causing CI issues

* Added back type matching tests

* removed type matching tests and added back int tests

* added back part of the type matching tests

* removed breaking type matching tests

* empty commit for testing

* added test back but inside comment

* removed a test from the comment to see if it breaks CI

* removed another function

* more testing

* emptied test comment

* cleaned up comments

* Added optimize=True flag to einsum_mulacc in ops_cpu.py

* Removed unnecessary imports from tests

* optimized match_types by removing unnecessary array copying
2023-07-12 10:29:15 -07:00
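A minimal sketch of what a `match_types` helper can do, using numpy's own promotion rules as a stand-in (the real ops_cpu implementation may differ): both operands are upcast to a common dtype before the binary op or MULACC runs:

```python
import numpy as np

def match_types(x: np.ndarray, y: np.ndarray):
    # promote both operands to their common dtype, e.g. int8 with int32 -> int32
    up = np.promote_types(x.dtype, y.dtype)
    return x.astype(up, copy=False), y.astype(up, copy=False)

a, b = match_types(np.arange(3, dtype=np.int8), np.ones(3, dtype=np.int32))
assert a.dtype == b.dtype == np.int32
```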
Roelof van Dijk
8f2e2f5ee2 style: else-after-return (#1216)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-12 10:26:38 -07:00
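For reference, the pattern this lint cleanup targets: once a branch returns, the `else` adds a level of nesting without changing behavior. A small before/after:

```python
def sign(x: float) -> int:
    if x >= 0:
        return 1
    else:          # flagged: else after return is redundant
        return -1

def sign_flat(x: float) -> int:
    if x >= 0:
        return 1
    return -1      # same behavior, one less indent level
```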
George Hotz
ab663c46e8 tensor cores: don't upcast if we can't. fix stable diffusion 2023-07-12 10:21:02 -07:00
Hey
4f72eb823c Outdated repository URL (#1218)
* Update outdated repo url

* Update more outdated repo url's
2023-07-11 23:14:19 -07:00
Roelof van Dijk
d0e21a7398 ci: don't install recommended packages for GPU (#1215)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-11 15:38:49 -07:00
Francis Lam
df86672bd4 Fix LazyBuffer SHUFFLE_PAD_OPS to prevent invalid pad movement (#1223)
In addition to div, any ops that will generate non-zero outputs from
zero inputs need to be guarded.
2023-07-11 15:30:35 -07:00
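A small numpy illustration of why the guard matters (illustrative only, not tinygrad code): with div, the zero padding maps to inf, so pushing a pad across the op changes the answer:

```python
import numpy as np

x = np.array([1.0, 2.0])
with np.errstate(divide="ignore"):
    pad_then_div = 1.0 / np.pad(x, (0, 2))   # [1. , 0.5, inf, inf] -- wrong
div_then_pad = np.pad(1.0 / x, (0, 2))       # [1. , 0.5, 0. , 0. ] -- intended
print(pad_then_div, div_then_pad)
```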
AN Long
f75de602df fix typo in stable diffusion example (#1219) 2023-07-11 15:26:40 -07:00
chenyu
ab645317c9 Fix constant folding for Tensor([3]) (#1227)
* Fix constant folding for Tensor([3])

* Remove duplicated prod import

* load in the same device

* better numpy

* add constant fold shape test cases

* improve tests
2023-07-11 14:01:32 -07:00
Carson Radtke
e2f6b09ffd [perf] optimize=True kwarg for np.einsum (#1213) 2023-07-09 18:31:04 -07:00
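The flag in question is numpy's own: `optimize=True` lets `np.einsum` choose a contraction order (and dispatch to BLAS where possible) instead of evaluating the naive nested loop, which matters for multi-operand contractions:

```python
import numpy as np

a, b, c = np.random.rand(8, 16), np.random.rand(16, 32), np.random.rand(32, 4)
# with optimize=True, numpy picks the cheaper pairwise order, e.g. (ab)c
out = np.einsum("ij,jk,kl->il", a, b, c, optimize=True)
assert out.shape == (8, 4)
```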
madt2709
bb316a42af Fix pow to work with negative tensors (#1191) 2023-07-09 17:33:04 -07:00
George Hotz
43385c7dbf remove contiguous on full (#1212) 2023-07-09 17:31:15 -07:00
Carson Radtke
13a1abf9e7 remove tuple from type annotation in Tensor.__init__ (#1211) 2023-07-09 16:27:07 -07:00
Roelof van Dijk
e27f098946 View as namedtuple, cached methods (#1075)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-09 14:26:02 -07:00
Carson Radtke
1eb0e0cb3f implement common subexpression elimination (#1204)
* implement common subexpr elimination

* Revert "implement common subexpr elimination"

This reverts commit 40c5487d20.

* move cse to ast_parse + add type annotations

* oneline if

* improve saved_exprs lookup
2023-07-09 14:22:53 -07:00
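A toy rendition of the technique (the name `saved_exprs` follows the commit's bullet, but this is a sketch, not the linearizer): cache each rendered expression the first time it is emitted and reuse its temporary afterwards, assuming expressions are side-effect free:

```python
# common subexpression elimination over a stream of rendered expressions:
# identical expressions compile to a single assignment
saved_exprs: dict[str, str] = {}
lines: list[str] = []

def render(expr: str) -> str:
    if (var := saved_exprs.get(expr)) is not None:
        return var  # seen before: reuse the existing temporary
    var = f"t{len(saved_exprs)}"
    lines.append(f"float {var} = {expr};")
    saved_exprs[expr] = var
    return var

render("a*b+c"); render("a*b+c")
assert lines == ["float t0 = a*b+c;"]  # emitted once, reused the second time
```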
George Hotz
beb4d3ab01 Tensor Cores 2: Local Buffers Edition (#1057)
* local buffers

* work

* works

* invert_strides

* work

* non tc

* fix shapetracker bug

* stride priority

* touchups

* gate tensor cores

* tensor core conv

* cleanups

* bug fixes

* fix metal_matmul

* fast tensor cores

* more speed

* buffer selection bug fix

* fix CI maybe

* ugh, CI is set to true, not 1

* tc allowed

* add_gl_dimension

* split out padding conv tests

* does padding add fail

* test_padded_conv2d_1x1

* skip metal ci stuff

* more strict on yellow

* float2

* strip parens

* fix float2

* touch up

* dtype

* strip parens

* no alias

* bugfix

* cast float2 and test tensor core ops

* oops, don't hardcode 4
2023-07-09 09:06:00 -07:00
George Hotz
67e34b356a good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz
7151382364 Refactor load/store before tensor cores (#1193)
* minor cleanups

* render_const

* now that's a nice refactor

* clean up vload/vstore

* clean up render_load

* debugs there

* dumb

* err, this?

* const float4

* what's failing

* bugfix

* statement includes semicolon

* bugfix
2023-07-08 15:54:58 -07:00
fluffy χατγιρλ
ef1909500e remove superfluous parentheses (#1197) 2023-07-08 15:11:02 -07:00
fluffy χατγιρλ
628ee46627 Fix bug where Tensor.randn returns inf (#1192)
* fix randn inf bug

* add test

* more compact test

* clarify test purpose
2023-07-08 12:03:46 -07:00
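The failure mode, sketched with a Box-Muller-style randn (assuming that formulation purely for illustration): if the uniform sample is exactly 0, log(0) is -inf and the output blows up, so the sample must be kept in (0, 1]:

```python
import numpy as np

def randn_box_muller(shape) -> np.ndarray:
    u1, u2 = np.random.random(shape), np.random.random(shape)
    u1 = np.where(u1 == 0.0, 1.0, u1)  # guard: log(0) would give -inf -> inf output
    return np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

assert np.isfinite(randn_box_muller((1000,))).all()
```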
George Hotz
d9c1d81e99 Revert "feat: cancel previous workflow runs on new commits (#1184)" (#1194)
This reverts commit d66a0c285d.
2023-07-08 11:26:13 -07:00
George Hotz
52600d532e add 20 minute timeout 2023-07-07 23:02:28 -07:00
wozeparrot
d66a0c285d feat: cancel previous workflow runs on new commits (#1184) 2023-07-07 22:55:35 -07:00
Jacky Lee
e0c2ae8984 Update file paths (#1179) 2023-07-07 18:41:58 -07:00