Commit Graph

4618 Commits

Author SHA1 Message Date
chenyu
f798b60338 add METAL_FAST_MATH env var to disable metal fast math (#3369)
* env var METAL_FAST_MATH to disable fastmath for metal

use this to test impact of fast math. might need to disable compiler cache with DISABLE_COMPILER_CACHE

* failed onnx test with fast math

METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu
2024-02-11 04:26:09 -05:00
chenyu
1156a27619 cleanup atol in test_ops (#3368)
removed the explicit set value if it's the same as default 1e-6, or higher but can be set to default.
2024-02-10 19:44:44 -05:00
Francis Lam
ddb22a60c8 linearizer: fix up edge case bugs in UNROLL opt (#3362)
Fully UNROLLing the first_reduce should not change the number of
local_dims.

Fully UNROLLing a GROUP dim should reduce the number of
group_for_reduces by one.

Also changed group_for_reduces to be a count as the axis number
isn't used anywhere (they are always the first reduce dims).
2024-02-10 11:49:25 +01:00
andresgit
28ba1c5406 fix Tensor.randint ignoring kwargs (#3350)
* fix Tensor.randint ignoring kwargs

* randint kwargs fix
2024-02-09 17:12:16 +01:00
Francis Lam
ce21fdfb67 ops_python: add HIP tensor core mock and refactor METAL (#3354)
* ops_python: add HIP tensor core mock and refactor METAL

* Add tests to CI

* add DEBUG=2 to full tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-09 12:46:06 +01:00
chenyu
c151131d1b update onnx tests that no longer fail on CI (#3353)
was debugging fast math and turned out it passed on CI now. more like a bug in CI
2024-02-08 21:19:00 -05:00
chenyu
7c1c6efee5 exclude half with PYTHON in test_dtype.is_dtype_supported (#3351)
half memoryview only in 3.12+. rest of the test_dtype (bounty) seems to be legit issue in ops_python.
2024-02-08 20:10:25 -05:00
George Hotz
c32ea95d7d Python uop emulator (#3327)
* start uop emu

* tiny_add passes

* more ops

* emulate the whole warp

* test_gemm passes

* metal gemm test pass

* works on big gemm

* works on big gemm

* more tests pass

* touch ups

* fix mypy

* cleanups

* exp2 mypy

* arch is where it belongs

* actually emulate tensor cores

* fix test

* new style
2024-02-08 19:24:55 +01:00
Francis Lam
2266152b28 linearizer: added FUZZ_BEAM to fuzz_linearizer and additional tests (#3340)
Fixed test_tensor_core_opts to test all the TCs.

Added commented out failing tests in test_color_shapes_with_local.
2024-02-08 16:12:58 +01:00
chenyu
b110c4a7b8 explicitly set input low and high in test_ops (#3347)
easier to set `(low, high)` than figuring out a,b for `(x+a)*b`. this pr kept the same input ranges
2024-02-08 04:11:45 -05:00
chenyu
0d2dacb549 test intermediate tensors created by function have same device as input (#3338)
run on TORCH since it's the fastest one on CI.
caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.
2024-02-07 09:24:36 -05:00
chenyu
02636ff62d re-enable test_reduce_0d_default int test case in test_dtype (#3336) 2024-02-07 05:30:14 -05:00
chenyu
ca66be6a70 add failed Tensor.pow test cases (#3334)
tried refactoring pow and found some bugs
2024-02-07 04:28:24 -05:00
chenyu
d9ef8e25b3 fix Tensor.var with 0 in reduce dim. (#3324)
fix when correction is too big. it seems to only work when input size is 0 though.
torch can output -inf in var when correction is too big, which does not make sense.
2024-02-05 20:59:13 -05:00
Obada Khalili
ee25f73283 Fix Tensor.mean to compute the mean correctly when 0-length axes are selected (#3318)
* fix Tensor.mean to compute the mean correctly with 0-length axes are selected

* add a regression test

* rename sum variable to sum_t to avoid conflict with built it function

* refactor Tensor.mean to has less lines
2024-02-05 01:40:37 -05:00
chenyu
97275101e9 fix safetensor load uint32 and uint64 (#3315)
the correct keys are U32 and U64.
2024-02-04 10:46:27 -05:00
Yoshinori Sano
edb74897b2 support safe load bf16 (#3310)
* support safe load bf16

* fix lint error E501

* add test for loading safetensors

* key should be BOOL

* fix lint
2024-02-04 10:08:39 -05:00
chenyu
d459956966 move TestGetContraction to test_helpers (#3313)
also cleaned long lines in test_shapetracker and enabled the line length check
2024-02-04 06:05:01 -05:00
Obada Khalili
b4ea0e18e3 Fix dot product on buffers with zero strides (#3303)
* skip matacc opt if the all src buffers of mul op are const buffers

* add noqa directive for long test

* unskip MALACC opt

* ensure that a_axes at least includes summation axes in order to perform np.einsum correctly

* add regression test for mulacc op

* compute a_slices using a_axes

* refactor  helper of  function to retrieve axes and slices for nonzero strides as well as summation axes

* include a regression test that uses  and  to test the behaviour indirectly
2024-02-04 05:15:06 -05:00
chenyu
30a3288c4a touchup canonicalize empty mask (#3308)
empty list -> None. also added env SEED for fuzz_shapetracker_math
2024-02-03 21:05:10 -05:00
David Hou
aebaab011f faster wino compile by catting consts across data expand dim (#3293)
* PoC faster wino compile by catting consts across data expand dim

* fix fusions

* faster + golf it

* noqa 501

* implicit broadcast

* Revert "implicit broadcast"

This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666.

* shorter

* shorter

* oops

* 216 upcasts is probably fine

* wino kernel count test

* test winograd number of sts

* specify device for apply_matrix mat elements
2024-02-02 03:47:45 -05:00
Felix Wu
021eea3a52 fix UnboundLocalError when running Compiler with DISABLE_COMPILER_CACHE (#3296) 2024-02-01 21:12:33 -05:00
chenyu
9196b11dfb test_ops sinh/cosh/asinh/acosh/atanh (#3294)
some have numerical issues at large input similar to sigmoid
2024-02-01 03:10:11 -05:00
Francis Lam
927f2dd24d wmma: add HIP FP16 to FP16 tensor core (#3287)
* wmma: add HIP FP16 to FP16 tensor core

* test: fix test_tensor_core to use separate tolerances for half
2024-01-31 23:00:51 -05:00
chenyu
18e854cdbf shrink MLB on sharded axis (#3255)
* shrink MLB on sharded axis

use onehot structure to store the real partition. goal is unsynced batchnorm2d that can be run on multigpu for training.

draft version in https://github.com/chenyuxyz/tinygrad/pull/109

* SYNCBN flag

* test unclean shrinks

* UnsyncedBatchNorm reuses BatchNorm

* more robust pad arg check

* better types

* more tests!

* 6 gpus in benchmark

* disable slow GPUS=6 benchmark
2024-01-31 21:48:25 -05:00
chenyu
a3652e6ddc minor cleanups to test_ops (#3290)
- removed noop a=0
- fixed integer div test
- added test for both python expression and Tensor method call
- reordered for consistency and added some spaces
2024-01-31 19:01:25 -05:00
chenyu
7816c3b692 onnx update for trilu and argmax (#3283)
* support 0 in shape for tril and triu

* select_last_index for ArgMax and ArgMin

* pass **kwargs
2024-01-30 18:39:16 -05:00
qazal
5b46b0ff3d Simple RDNA3 emulator (#2974)
* mockhip->hipcpu

* allocate buffers

* launch a kernel

read_asm api

* run remu in CI

* remu 0.0.2, real test ops

* simple driver

* 0.0.3, all test_ops

* run the latest emulator

* 9 minutes is way too long, drop backprop in CI

* bring back the backward pass

* Revert "bring back the backward pass"

This reverts commit 3781e1bc56.

* Print slowest tests

* emulated device directly in ops_hip

* fix ruff, override mypy for specific rules

* test in the same code path

- hip backend env variables

- install packages and verify autogen

- run certain tests

- remove the other hip tests path

- verify Device.DEFAULT

* remove the emulated hip in extra

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-30 10:39:28 -08:00
George Hotz
247a8a2a6c add canonicalization to View.create (#3280)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views

* update second any

* isinstance -> not

* 25% less same but unequal
2024-01-30 10:26:48 -08:00
George Hotz
d8f6280ffb hotfix: add CHECK_NEQ to fuzz_shapetracker_math 2024-01-30 10:07:54 -08:00
George Hotz
09f2952dc3 reintroduce merge views in update benchmark (#3279)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views
2024-01-30 09:47:20 -08:00
George Hotz
d298916232 Revert "take merge views from corsix branch" (#3278) 2024-01-30 09:34:28 -08:00
George Hotz
b57a16aa89 take merge views from corsix branch (#3273)
* take merge views from corsix branch

* better DEBUG

* max views

* remove view.py change

* Revert "remove view.py change"

This reverts commit f3025f4f39.

* only allow filter on non symbolic

* oops, correct fix

* comment to explain
2024-01-30 09:25:16 -08:00
George Hotz
6a4a5dc79d fix pad 0 size (#3277)
* fix pad 0 size

* put in view, not pad

* test was wrong
2024-01-30 08:58:10 -08:00
Francis Lam
861d5ac224 wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250)
will correctly handle and permutation of optops after the TC one
2024-01-29 11:51:57 -08:00
George Hotz
085dc87bed winograd should be 4 kernels (#3268) 2024-01-28 09:21:26 -08:00
George Hotz
9e17378b60 Fix metal tests (#3266)
* small fixes for tests on mac

* remove device from TensorCore
2024-01-27 18:09:42 -08:00
Hristo Georgiev
3ae811af21 tests for Tensor init data dtype and resulting dtype (#3247)
Co-authored-by: Hristo Georgiev <6043312+hristog@users.noreply.github.com>
2024-01-27 00:13:42 -08:00
George Hotz
3c728d1082 compiler support (#3260)
* compiler support

* revert that

* fix tests
2024-01-26 23:36:40 -08:00
Francis Lam
4273aabe31 extra/gemm: add a simple_conv.py along with correctness check (#3236)
* extra/gemm: add a simple_conv.py along with correctness check

The goal is to easily test tensor core triggering situations

* test: add tests for acc_dtype handling and fixed typing
2024-01-26 19:06:57 -08:00
George Hotz
473935125a use comgr to compile (#3248)
* use comgr to compile

* fast

* bfloat16

* move comgr to it's own file

* cleaner style

* comgr in new place

* comgr free + dtype cleanup
2024-01-26 18:27:49 -08:00
George Hotz
c4d870db0d fix jit realize issue (#3258) 2024-01-26 18:27:35 -08:00
chenyu
4197ef17c4 const cleanup with dtype.Scalar (#3257)
moved Scalar to dtype.py. assert in _broadcasted when y is a Scalar and
fix some tests
2024-01-26 21:16:22 -05:00
chenyu
bc92c4cc32 onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252)
* onnx Einsum, CumSum, DepthToSpace, SpaceToDepth

Einsum inner product and `...` are not supported

* --durations=20
2024-01-26 10:47:53 -05:00
chenyu
e45ffdb6cf cleanup onnx (#3249)
* add onnx test_reduce_log_sum_exp

* more reuse

* more

* stuff

* good CenterCropPad

* imports

* good ArrayFeatureExtractor

* pretty good Pad

* stuff

* stuff

* onnx.py

* Atan

* pass int8 test

* dtype related

* fastmath stuff

* Resize linear

* fix CI

* move back
2024-01-25 20:39:59 -05:00
George Hotz
7feeb118e6 hip launch speed (#3246)
* faster HIP kernel launch

* args

* expand compile_hip
2024-01-25 15:13:55 -08:00
George Hotz
cb372b053f add device speed test (#3244) 2024-01-25 12:01:22 -08:00
geohotstan
d0e116c6d6 fix maximum/where Scalar casting (#3194)
* init

* test: added dtype tests for maximum

* fix: seperate maximum const and maximum tensors

* fix: del useless line

* fix: some dtypes

* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys

* fix: add lil helper function

* fix: some test refactoring

* done

* sike: not done yet lol

* wtf I missed an assert, am I drunk

* yeah idk

* fix: line save from redundant check

* revert: line save

* fix: simplify test_broadcast cuz I'm stumped

* change some test name

* fix: bool max bool  works

* test: add a maximum bool test

* test: make sure minimum also works with bool

* fix: something like this? :s

* fix: maybe this?

* fix: how about this? tighter check

* fix: this.

* revert: nvm mul(0.5) and div(2) has the same kernel for backward

* fix: .is_floating_point() xD

* revert: maximum and minimum and add cast

* fix: cover negative const case in test

* fix: use eq because I don't understand clang :D

* WHOOOOPS
2024-01-25 12:26:04 -05:00
geohotstan
3628bea910 fix: big round even rounder round (#3242)
* fix: big round even rounder round

* fix: variable name lol

* feat: 1 less potential cast

* consistant naming (im just spaming commits now)

* LOL MISSED ONNX ANOTHER COMMIT

* test: fix test_ops and remove _round

* test: tensor methods oops
2024-01-25 12:24:15 -05:00
chenyu
da5e27968c failed test cases for Tensor.round (#3240)
it should round to even
2024-01-25 02:12:50 -05:00