3592 Commits

Author SHA1 Message Date
zku
2d702ca073 If feasible, do not truncate float64 down to float32 in cstyle renderer (#3420)
* do not truncate float64 precision

* use l suffix to try to avoid overload confusion

* long line, ruff bloats the function otherwise

* fmt

* remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambiguity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes

* use more reasonable test values, same as test_int_to_float_unary_func

* disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test

* disable test for HIP, renderer does not support f64 precision

* do not use noqa E501, break up condition
2024-02-16 10:08:59 +01:00
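
The precision issue this commit describes can be illustrated outside the renderer. The sketch below is not the cstyle renderer's code; it only round-trips a Python float (a float64) through a 4-byte float32 with the standard `struct` module and checks the result against the `rtol=1e-12` tolerance mentioned in the message. The helper name `round_to_float32` and the sample value are assumptions made for illustration.

```python
import math
import struct


def round_to_float32(x: float) -> float:
    """Round-trip a float64 through a 4-byte float32 (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 0.1  # illustrative constant, not taken from the commit's test
truncated = round_to_float32(value)

# A float32 keeps about 7 significant decimal digits, so the truncated
# constant passes a loose tolerance but fails the tight one.
loose_ok = math.isclose(truncated, value, rel_tol=1e-6)
tight_ok = math.isclose(truncated, value, rel_tol=1e-12)
print(loose_ok, tight_ok)
```

Values that are exactly representable in float32, such as 0.5, survive the round trip unchanged, which is why a precision test needs inputs that are not.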
chenyu
30f26279c5 add back "CPU" in test_onnx_backend supports_device (#3426)
the onnx tests were all skipped.
2024-02-16 00:49:30 -05:00
xarkes
28a8b72024 Remove Interpreted device & remaining CPU/TORCH ref (#3423)
* Remove Interpreted device & remaining CPU/TORCH ref

* Oops

* supports_device was useful

* Fix doc wording

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
chenyu
6efa68f97b remove use of TORCH in pre-commit (#3424)
it's silently using DEFAULT after removing TORCH
2024-02-15 19:38:37 -05:00
geohotstan
5eb4c902f6 correct division dtype casting (#3405)
* Happy New Year

* fix: exclude floordiv onnx tests

* fix: less weird if statements in div

* Good luck in the Year of the Dragon

* fix: tempfix onnx div

* fix: use reference impl for div
2024-02-15 19:34:40 -05:00
George Hotz
5de660ca0d disk runner (prereq for interpreted removal) (#3421)
* disk runner

* simpler diskrunner
2024-02-15 18:14:05 +01:00
qazal
e1a57fe58a test the behavior, not the implementation (#3419) 2024-02-15 17:23:42 +01:00
George Hotz
b1c0d8c99d remove cpu and torch backends (#3399)
* remove cpu and torch backends

* don't copy to cpu

* use clang instead of cpu

* multitensor gathers on the first device

* clang is cpu + use default

* fixup

* bugfix
2024-02-15 16:55:39 +01:00
Obada Khalili
75f7e21a80 Make tests in test/test_ops.py pass for Python emulator (#3384)
* fix OverflowError in UnaryOps.EXP2

* avoid accessing outputs for void uops

* skip execution for UOps.IF and UOps.ENDIF

* initialize bytearray to the correct size in UOps.DEFINE_LOCAL

* validate len of input that has .sz > 1

* remove comment in code

* reinitialize loop of already iterated

* validate first value in input to be a list for inputs with .sz > 1

* add python ops tests to CI

* skip long runtime tests for PYTHON backend

* respect dtype.sz arg in UOps.CONST, and remove incorrect validation in UOps.STORE

* use math.inf instead of float('inf')

* handle 0 args to UnaryOps.LOG2

* handle load op with default of .sz > 1

* initialize the loop correctly using UOps.LOOP arg

* remove unnecessary TODO comment

* remove newline

* select a subset of 22 ops tests to skip in CI when PYTHON=1

* handle gated UOps.LOAD referencing values that have .sz > 1

* Revert "select a subset of 22 ops tests to skip in CI when PYTHON=1"

This reverts commit 7674fee81d.

* skip tests in python backend CI command

* push fix lost in conflict resolve

* Revert "skip long runtime tests for PYTHON backend"

This reverts commit 5dd2a0376e.

* clear loop state after last iteration
2024-02-15 16:40:25 +01:00
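
Two of the bullets in the commit above (the OverflowError in EXP2 and the 0 argument to LOG2) come from how plain Python floats behave, which differs from IEEE hardware semantics. The following is a minimal sketch of that behavior, not the code from the commit; the names `safe_exp2` and `safe_log2` are hypothetical.

```python
import math


def safe_exp2(x: float) -> float:
    """2**x that saturates to inf instead of raising (hypothetical helper)."""
    try:
        return 2.0 ** x
    except OverflowError:
        # Python raises here; float hardware would return +inf.
        return math.inf


def safe_log2(x: float) -> float:
    """log2 that returns -inf at 0 and nan for negatives (hypothetical helper)."""
    if x > 0:
        return math.log2(x)
    # math.log2 raises ValueError for 0 and for negative inputs.
    return -math.inf if x == 0 else math.nan


print(safe_exp2(2000.0), safe_log2(0.0))
```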
Obada Khalili
18bb6a22e0 make tensors sizes smaller in maxpool2d tests (#3417) 2024-02-15 15:53:52 +01:00
Maciej Fijalkowski
736c74b010 Rename .sz to .count on DType (#3413)
* rename .sz for .count on dtype (and ANETensor for completeness)

* revert the changes to extra, as per review

* try to make linter happier

* remove the change to extra
2024-02-15 15:03:49 +01:00
qazal
7919a1e6ec dtypes: delete the float cast in realize.py (#3401)
* remove float cast

* cast scalars to the correct value in creation time

* cast scalar in the correct place

* wrong, use y_dtype

* make consts have a unique cache key

* add cast_scalar back

* test_load_cache_const_bufs

* add bool dtype

* test_const_dtype

* fix linters
2024-02-15 14:20:30 +01:00
nimlgen
002bf380b0 hsa runtime (#3382)
* hsa init

* handles transfer

* linter

* clean up hwqueue

* fix sync freezes

* print errors
2024-02-15 14:14:34 +01:00
George Hotz
93eceef727 remove cpu prereqs (#3410) 2024-02-15 13:45:06 +01:00
George Hotz
a40df14fef ops_ext to replace cpu import (#3409)
* ops_ext to replace cpu import

* don't allow zero copy with as buffer

* memoryview(bytearray

* reenable test

* fix jit issue
2024-02-15 13:03:42 +01:00
George Hotz
ede4fd4705 hotfix: test_jit_copyin 2024-02-15 12:37:53 +01:00
George Hotz
6356474d6d Revert "ops_ext to replace cpu import (#3406)" (#3408)
This reverts commit 91eb93f85a.
2024-02-15 12:16:10 +01:00
George Hotz
91eb93f85a ops_ext to replace cpu import (#3406)
* ops_ext to replace cpu import

* don't allow zero copy with as buffer

* memoryview(bytearray

* reenable test
2024-02-15 12:14:58 +01:00
qazal
49cb1fee54 run test_indexing on remu (#3404)
* emulated ops_hip infra

* add int4

* include test_indexing in remu

* Revert "Merge branch 'remu-dev-mac'"

This reverts commit 6870457e57, reversing
changes made to 3c4c8c9e16.
2024-02-15 11:52:40 +01:00
qazal
9d4d63fcfc dynamic tc function render (#3387)
hip can't be done right now
2024-02-15 11:19:46 +01:00
chenyu
3c4c8c9e16 bump db version to 11 (#3398)
follow-up after disabling fast math on metal.
2024-02-14 10:13:18 -05:00
qazal
27f4de2ce4 delete half_prekernel (#3388)
* generic rendering of half and bf16

hotfix

* fix uops + regression test

* fix the test for metal's half4

* uop.uop fixup

* mypy with --strict-equality, fix ops_gpu
2024-02-14 15:40:48 +01:00
chenyu
078a2603d5 set metal fast math default to 0 (disabled) (#3370)
* set metal fast math default to 0 (disabled)

It's a correctness fix because we use inf and nan. Let's see how slow it is

* skip failed onnx tests

* tmp DISABLE_COMPILER_CACHE=1 in metal benchmark

* Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark"

This reverts commit 22267df380.
2024-02-14 11:42:33 +01:00
Francis Lam
668324d92b wmma: protect TC locals from modification and use only LOCAL (#3379)
also remove unnecessary upcast_dim from tensor_core and calculate
it from the dimensions and thread sizes
2024-02-13 10:19:35 +01:00
Francis Lam
f1ad01fd91 test_linearizer_failures: add new linearizer compile failure on METAL (#3380) 2024-02-12 20:28:34 -05:00
George Hotz
ce1f9f5556 hotfix: new linearizer docs 2024-02-12 18:56:30 +01:00
George Hotz
2e60012bcf move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz
41efaa848c move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz
0f6cde243d import from wino_cleanup (#3374) 2024-02-12 16:26:50 +01:00
George Hotz
f47e297d4e refactor: END -> ENDLOOP 2024-02-12 15:46:18 +01:00
George Hotz
29d68ae637 uops endif (#3372)
* use is instead of ==

* add endif
2024-02-12 15:43:37 +01:00
George Hotz
1d45f3899d use is instead of == (#3371) 2024-02-12 15:35:55 +01:00
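
The message above does not say which values are being compared. Assuming they are enum members, the sketch below shows why identity comparison is a safe replacement for equality: each enum member is a singleton. `DemoUOps` is a hypothetical stand-in, not the project's own enum.

```python
from enum import Enum, auto


class DemoUOps(Enum):
    """Hypothetical stand-in enum, used only for illustration."""
    LOOP = auto()
    ENDLOOP = auto()
    IF = auto()
    ENDIF = auto()


def closes_block(op: DemoUOps) -> bool:
    # Enum members are singletons, so `is` compares them correctly
    # and cannot be redirected by a custom __eq__.
    return op is DemoUOps.ENDLOOP or op is DemoUOps.ENDIF


print(closes_block(DemoUOps.ENDIF), closes_block(DemoUOps.LOOP))
```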
David Hou
323393b650 verbose apply_matrix (#3333)
* verbose apply_matrix

* types

* not so verbose

* small comment change

* fix typo

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-12 12:06:12 +01:00
Jyotirmaya Mahanta
d55f99e881 patch merge_views (#3311) 2024-02-12 11:53:55 +01:00
Jyotirmaya Mahanta
b6a2600c86 fix merging condition in merge_dims (#3363)
* fix merging condition in merge_dims

* add tests

* set contiguous after mask is canonicalized

* minor fix
2024-02-12 11:50:26 +01:00
qazal
c8fd66a131 Run RDNA3 tensor core tests in CI (#3367)
* add test_linearizer

* skip test_padto_matmul
2024-02-11 19:54:06 -05:00
chenyu
f798b60338 add METAL_FAST_MATH env var to disable metal fast math (#3369)
* env var METAL_FAST_MATH to disable fastmath for metal

use this to test impact of fast math. might need to disable compiler cache with DISABLE_COMPILER_CACHE

* failed onnx test with fast math

METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu
2024-02-11 04:26:09 -05:00
chenyu
1156a27619 cleanup atol in test_ops (#3368)
removed explicitly set atol values that equal the default 1e-6, or that were higher but can be lowered to the default.
2024-02-10 19:44:44 -05:00
Yoshinori Sano
98c732cf9d fix metal compile error in extra/gemm (#3365) 2024-02-10 12:54:41 +01:00
George Hotz
d1fb1e0ba4 full sync to fix HIP memory leak (#3364) 2024-02-10 11:50:27 +01:00
Francis Lam
ddb22a60c8 linearizer: fix up edge case bugs in UNROLL opt (#3362)
Fully UNROLLing the first_reduce should not change the number of
local_dims.

Fully UNROLLing a GROUP dim should reduce the number of
group_for_reduces by one.

Also changed group_for_reduces to be a count as the axis number
isn't used anywhere (they are always the first reduce dims).
2024-02-10 11:49:25 +01:00
George Hotz
dc82ef6660 hotfix: swap HIP/CUDA bringup order to prevent delay on tinybox 2024-02-09 18:41:25 +01:00
andresgit
28ba1c5406 fix Tensor.randint ignoring kwargs (#3350)
* fix Tensor.randint ignoring kwargs

* randint kwargs fix
2024-02-09 17:12:16 +01:00
Francis Lam
ce21fdfb67 ops_python: add HIP tensor core mock and refactor METAL (#3354)
* ops_python: add HIP tensor core mock and refactor METAL

* Add tests to CI

* add DEBUG=2 to full tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-09 12:46:06 +01:00
George Hotz
b385234961 oops, change to 3.12 (#3357) 2024-02-09 12:21:06 +01:00
George Hotz
7726eef464 ops_python: add image support (#3356)
* ops_python: add image support

* uops tests in their own CI

* fix ci
2024-02-09 12:02:06 +01:00
George Hotz
5f93061f67 ops_python: gated load support (#3355)
* start uop emu

* tiny_add passes

* more ops

* emulate the whole warp

* test_gemm passes

* metal gemm test pass

* works on big gemm

* works on big gemm

* more tests pass

* touch ups

* fix mypy

* cleanups

* exp2 mypy

* arch is where it belongs

* actually emulate tensor cores

* fix test

* new style

* add gated load support to PYTHON

* out of bounds error message

* cleaner
2024-02-09 11:16:25 +01:00
chenyu
c151131d1b update onnx tests that no longer fail on CI (#3353)
was debugging fast math and it turned out these tests now pass on CI. more likely a bug in CI
2024-02-08 21:19:00 -05:00
chenyu
7c1c6efee5 exclude half with PYTHON in test_dtype.is_dtype_supported (#3351)
half memoryview only in 3.12+. rest of the test_dtype (bounty) seems to be legit issue in ops_python.
2024-02-08 20:10:25 -05:00
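
The "half memoryview only in 3.12+" remark above refers to the `"e"` (half-float) format code. A small probe for it, as a sketch: the function name is hypothetical, and this is not how test_dtype itself performs the check.

```python
import struct
import sys


def half_memoryview_supported() -> bool:
    """Return True if memoryview can cast to the half-float 'e' format."""
    buf = bytearray(struct.pack("e", 1.5))  # struct has supported 'e' since 3.6
    try:
        return memoryview(buf).cast("e")[0] == 1.5
    except (ValueError, TypeError, NotImplementedError):
        # Older interpreters reject 'e' as a cast destination format.
        return False


print(sys.version_info[:2], half_memoryview_supported())
```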
George Hotz
c32ea95d7d Python uop emulator (#3327)
* start uop emu

* tiny_add passes

* more ops

* emulate the whole warp

* test_gemm passes

* metal gemm test pass

* works on big gemm

* works on big gemm

* more tests pass

* touch ups

* fix mypy

* cleanups

* exp2 mypy

* arch is where it belongs

* actually emulate tensor cores

* fix test

* new style
2024-02-08 19:24:55 +01:00