Commit Graph

243 Commits

George Hotz
ac02e7347d ptx timing vs cuda timing (#3659) 2024-03-08 10:17:49 -08:00
chenyu
e25879d50e don't get new var_val for the same ast in fuzz_linearizer (#3657)
fixed result comparison for kernels with variables
2024-03-08 09:49:24 -05:00
chenyu
1130c73844 add FUZZ_NTH to fuzz_linearizer (#3656)
* add FUZZ_NTH to fuzz_linearizer

also update tests in test_linearizer_failures to not just run on METAL

* update failures for HIP/HSA

* test_failure_21 LLVM PADTO
2024-03-08 09:16:49 -05:00
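
A minimal Python sketch of how a FUZZ_NTH-style selector can work, assuming the fuzzer iterates a list of kernel ASTs; the names here are illustrative, not tinygrad's actual fuzz_linearizer code:

    import os

    # when FUZZ_NTH is set, run only the Nth kernel (useful to reproduce
    # a single failure out of a long fuzzing session)
    FUZZ_NTH = os.getenv("FUZZ_NTH")

    def fuzz_all(asts, fuzz_one):
      for i, ast in enumerate(asts):
        if FUZZ_NTH is not None and i != int(FUZZ_NTH):
          continue  # skip everything except the requested kernel
        fuzz_one(ast)
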
David Hou
9f66dcf718 PolynomialDecayWithWarmup + tests (#3649)
* working PolynomialDecayWithWarmup + tests.......

add lars_util.py, oops

* keep lars_util.py as intact as possible, simplify our interface

* whitespace

* clean up

* clean up

* asserts

* test polylr for full resnet training run

* add comment

* rename

* fix do_optim

* don't cast lr

* info

* calculate from train_files

* skip it
2024-03-07 18:53:36 -05:00
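
The schedule above combines a linear warmup with polynomial decay. A hedged sketch of the usual formula (as used in MLPerf-style ResNet training); parameter names and defaults are illustrative, not tinygrad's PolynomialDecayWithWarmup API:

    # learning rate at a given step: linear warmup to base_lr, then
    # polynomial decay from base_lr down to end_lr over decay_steps
    def poly_lr(step, base_lr, end_lr=0.0, warmup_steps=1000, decay_steps=10000, power=2.0):
      if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps           # linear warmup
      frac = min((step - warmup_steps) / decay_steps, 1.0)   # decay progress in [0, 1]
      return (base_lr - end_lr) * (1 - frac) ** power + end_lr
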
chenyu
57df8e8d82 update fuzz_linearizer (#3648)
included non-reduce kernels and kernels with variables. green msg when everything passes.
it's possible that creating rawbufs fails due to a memory error; included that in the failure cases.
2024-03-07 18:41:22 -05:00
chenyu
906cc3a69b cleanup tests Device[Device.DEFAULT] is always Compiled (#3645) 2024-03-07 11:15:42 -05:00
qazal
bdd62c7fd8 make the bf16 include dynamic (#3642)
* dynamic prefix

* add common ones above

these are common dtypes

aesthetics

* regression test

fuzz it

test

* run in CI

* use .append

* faster
2024-03-07 10:31:35 -05:00
David Hou
0afaf70d57 lars optimizer + tests (#3631)
* lars optimizer + tests

* fix skip list!

* use id to compare in skip list

* go back to using set

* Tensor(bool) * Tensor(bool) is and

* don't lint external/mlperf_resnet

* whitespace

* add external_test_optim to opencl tests

* give mlperf task a name

* mlperf under onnx

* remove track_gnorm

* contiguous instead of realize

* assert momentum and weight decay positive

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-06 18:11:01 -05:00
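
LARS scales each layer's update by a trust ratio derived from the norms of that layer's weights and gradients. An illustrative single-tensor step (after You et al., 2017), assuming numpy; this is a sketch of the algorithm, not tinygrad's Lars class:

    import numpy as np

    def lars_step(w, g, v, lr=1e-3, momentum=0.9, weight_decay=1e-4, trust_coeff=1e-3):
      g = g + weight_decay * w                    # fold weight decay into the gradient
      w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
      # layer-wise trust ratio: shrink the step for layers where ||g|| >> ||w||
      trust = trust_coeff * w_norm / g_norm if w_norm > 0 and g_norm > 0 else 1.0
      v = momentum * v + trust * lr * g           # momentum buffer with scaled step
      return w - v, v
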
George Hotz
8500265561 this mem fault still happening (#3620)
* this mem fault still happening

* smaller

* that print doesn't work

* overflows test

* hip doesn't uses_ptr_arithmetic

* only with locals

* test overflow new name

* it's not ptr arith

* simpler

* simple repro

* old compiler

* simpler

* put that back
2024-03-05 10:39:32 -08:00
George Hotz
f500be1313 out of bounds access caused by launch bounds (#3615)
* lin overflow

* remove launch bounds

* remove launch bounds infra

* oops, fix bufs type
2024-03-05 06:34:00 -08:00
Francis Lam
162dfb07d9 fuzz_linearizer: fix uops and add to test.yml (#3588) 2024-03-02 15:03:42 -08:00
George Hotz
83530a585f add quick external data select test 2024-03-02 05:38:32 -08:00
chenyu
d89e3c4e08 enable METAL tests now runner is M1 and no fast-math (#3523) 2024-02-28 14:14:23 -05:00
Francis Lam
11da65bccd test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option (#3455)
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option

this allows us to limit the size of the kernel and reduce running
times by avoiding ones that take a long time

* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
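
A hedged sketch of how a FUZZ_MAX_SIZE-style filter can be applied, assuming kernel size is measured as the product of the output shape; the real option's exact metric may differ:

    import os

    FUZZ_MAX_SIZE = int(os.getenv("FUZZ_MAX_SIZE", "0"))  # 0 means no limit

    def small_enough(shape):
      size = 1
      for dim in shape: size *= dim
      return FUZZ_MAX_SIZE == 0 or size <= FUZZ_MAX_SIZE
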
chenyu
30f26279c5 add back "CPU" in test_onnx_backend supports_device (#3426)
the onnx tests were all skipped.
2024-02-16 00:49:30 -05:00
xarkes
28a8b72024 Remove Interpreted device & remaining CPU/TORCH ref (#3423)
* Remove Interpreted device & remaining CPU/TORCH ref

* Oops

* supports_device was useful

* Fix doc wording

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
George Hotz
b1c0d8c99d remove cpu and torch backends (#3399)
* remove cpu and torch backends

* don't copy to cpu

* use clang instead of cpu

* multitensor gathers on the first device

* clang is cpu + use default

* fixup

* bugfix
2024-02-15 16:55:39 +01:00
George Hotz
a40df14fef ops_ext to replace cpu import (#3409)
* ops_ext to replace cpu import

* don't allow zero copy with as buffer

* memoryview(bytearray

* reenable test

* fix jit issue
2024-02-15 13:03:42 +01:00
George Hotz
6356474d6d Revert "ops_ext to replace cpu import (#3406)" (#3408)
This reverts commit 91eb93f85a.
2024-02-15 12:16:10 +01:00
George Hotz
91eb93f85a ops_ext to replace cpu import (#3406)
* ops_ext to replace cpu import

* don't allow zero copy with as buffer

* memoryview(bytearray

* reenable test
2024-02-15 12:14:58 +01:00
chenyu
078a2603d5 set metal fast math default to 0 (disabled) (#3370)
* set metal fast math default to 0 (disabled)

It's a correctness fix because we use inf and nan. Let's see how slow it is

* skip failed onnx tests

* tmp DISABLE_COMPILER_CACHE=1 in metal benchmark

* Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark"

This reverts commit 22267df380.
2024-02-14 11:42:33 +01:00
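
Why fast math is a correctness issue when inf and nan are used as real values: fast-math modes let the compiler assume algebraic identities that fail for these, e.g. x*0 == 0 or x == x. A small Python illustration of the identities in question:

    import math

    x = float("inf")
    print(x * 0)                          # nan, not 0
    print(float("nan") == float("nan"))   # False; fast math may assume x == x
    print(math.isinf(x + 1))              # True; inf must propagate, not be folded away
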
George Hotz
2e60012bcf move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz
41efaa848c move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
chenyu
f798b60338 add METAL_FAST_MATH env var to disable metal fast math (#3369)
* env var METAL_FAST_MATH to disable fastmath for metal

use this to test the impact of fast math. you might need to disable the compiler cache with DISABLE_COMPILER_CACHE

* failed onnx test with fast math

METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu
2024-02-11 04:26:09 -05:00
chenyu
c151131d1b update onnx tests that no longer fail on CI (#3353)
was debugging fast math and it turned out these pass on CI now. more likely a bug in CI
2024-02-08 21:19:00 -05:00
Francis Lam
2266152b28 linearizer: added FUZZ_BEAM to fuzz_linearizer and additional tests (#3340)
Fixed test_tensor_core_opts to test all the TCs.

Added commented out failing tests in test_color_shapes_with_local.
2024-02-08 16:12:58 +01:00
chenyu
30a3288c4a touchup canonicalize empty mask (#3308)
empty list -> None. also added env SEED for fuzz_shapetracker_math
2024-02-03 21:05:10 -05:00
chenyu
7816c3b692 onnx update for trilu and argmax (#3283)
* support 0 in shape for tril and triu

* select_last_index for ArgMax and ArgMin

* pass **kwargs
2024-01-30 18:39:16 -05:00
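
ONNX's select_last_index attribute changes tie-breaking for ArgMax/ArgMin: when the extreme value is tied, the last index along the axis is returned instead of the first. A numpy sketch of the semantics (illustrative, not the onnx_ops implementation):

    import numpy as np

    def argmax_last(x, axis=0):
      # argmax over the flipped axis finds the last occurrence of the max
      idx = np.argmax(np.flip(x, axis=axis), axis=axis)
      return x.shape[axis] - 1 - idx
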
George Hotz
247a8a2a6c add canonicalization to View.create (#3280)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views

* update second any

* isinstance -> not

* 25% less same but unequal
2024-01-30 10:26:48 -08:00
George Hotz
d8f6280ffb hotfix: add CHECK_NEQ to fuzz_shapetracker_math 2024-01-30 10:07:54 -08:00
George Hotz
c4d870db0d fix jit realize issue (#3258) 2024-01-26 18:27:35 -08:00
chenyu
bc92c4cc32 onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252)
* onnx Einsum, CumSum, DepthToSpace, SpaceToDepth

Einsum inner product and `...` are not supported

* --durations=20
2024-01-26 10:47:53 -05:00
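
DepthToSpace and SpaceToDepth are pure data-movement ops, so they are typically expressed as a reshape plus a transpose. An illustrative numpy version of DepthToSpace in DCR mode (a sketch of the semantics, not the onnx_ops code):

    import numpy as np

    def depth_to_space(x, bs):
      # x: (N, C, H, W) with C divisible by bs*bs
      n, c, h, w = x.shape
      x = x.reshape(n, bs, bs, c // (bs * bs), h, w)
      x = x.transpose(0, 3, 4, 1, 5, 2)          # interleave blocks into spatial dims
      return x.reshape(n, c // (bs * bs), h * bs, w * bs)
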
chenyu
e45ffdb6cf cleanup onnx (#3249)
* add onnx test_reduce_log_sum_exp

* more reuse

* more

* stuff

* good CenterCropPad

* imports

* good ArrayFeatureExtractor

* pretty good Pad

* stuff

* stuff

* onnx.py

* Atan

* pass int8 test

* dtype related

* fastmath stuff

* Resize linear

* fix CI

* move back
2024-01-25 20:39:59 -05:00
nimlgen
f87ecbb0f3 fuzzer validates outputs + (partially) oob accesses (#3178)
* fuzzer validates outputs + (partially) oob accesses

* +random

* oob check only for compiled

* type cmp fixes

* fix zeroing

* no prints

* add seed
2024-01-19 13:34:51 -05:00
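
One common way to catch (some) out-of-bounds writes, as a hedged sketch: surround each buffer with sentinel "redzones" and verify them after the kernel runs. The fuzzer's actual mechanism may differ:

    import numpy as np

    REDZONE, PAD = 0xDEADBEEF, 64

    def alloc_with_redzone(n_elems):
      buf = np.full(n_elems + 2 * PAD, REDZONE, dtype=np.uint32)
      return buf, buf[PAD:PAD + n_elems]   # full allocation, usable view

    def check_redzone(buf):
      # any write that strayed past the usable view corrupts a sentinel
      assert (buf[:PAD] == REDZONE).all() and (buf[-PAD:] == REDZONE).all(), "OOB write detected"
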
chenyu
1b508e0f71 fix fuzz_linearizer toCPU to as_buffer (#3158) 2024-01-17 13:18:46 -05:00
chenyu
537fb8b0b8 separate try except blocks in onnx2torch in model benchmark (#3126)
exceptions can be raised from either model conversion or an individual backend failure. openpilot on torch mps works, but does not work with torch cpu.
separate the exception blocks so that the benchmark can include torch mps for openpilot.
2024-01-15 00:39:33 -05:00
George Hotz
ac3f246c11 cached size (#3060)
* cached size

* simplify simplify

* 0 doesn't have base

* fix test

* cleaner cache

* hmm, metal is flaky on this...might be real(ish) but useless as test

* short circuit reshape/expand properly

* better reshape bypass
2024-01-09 16:37:37 -08:00
George Hotz
2c6f2e899d No extra vars call (#3054)
* remove unused reciprocal

* comment

* remove unneeded call to vars

* free speedup
2024-01-09 09:52:58 -08:00
chenyu
19298e7a3f Device._buffers -> Device._devices (#3052)
backend devices used to be called buffers
2024-01-08 21:30:38 -05:00
George Hotz
2a2d3233d2 add test that the compiler isn't used (#3025)
* add test that the compiler isn't used

* one print_tree

* improve speed with st size cache

* switch to gpt-2
2024-01-05 17:24:01 -08:00
geohotstan
57817028bb removed redundant dtype hacks in onnx_ops (#2939)
* updated most dtype hacks in onnx_ops

* temporarily revert dequantizelinear change

* I think this is right...

* MORE FIXES WOOOO NEW DTYPE IS AWESOME

* ok

* oops missed a print

* half -> float32 for CI

* is npdtype

* some more

* fix if ordering

* more clean ups

* final cleanups

* casting to half not allowed

* k nvm

* revert ArgMax change

* only GPU

* llvm begone

* teeny tiny change

* fix: attempt to add cast tests

* try this

* fix dequantizelinear

* revert some stuff

* tests pass pls

* less lines in onnx_tests

* oops missed string tensor tests

* clean up

* try: revert default behavior changes

* fix: disabled Cast and Castlike tests

* docs: small changes

* fix: fixed isNaN op and enabled associated tests

* fix: forgot about float16

* done

* update disabled test

* gah missed another float16

* disable rest of failing tests

* rm extra line

* try...

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-01-04 01:45:24 -05:00
chenyu
58d3d5030b vars_from_ast -> LazyOp.vars (#2965) 2024-01-01 18:12:38 -05:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz
56f44bd10e move the compiler cache to be global (#2957)
* move the compiler cache to be global

* remove non robust test

* remove dead code
2024-01-01 10:59:56 -08:00
chenyu
1fb815e77e hotfix fix coder. RMSNorm cannot have float16 input (#2932)
* hotfix fix coder. RMSNorm cannot have float16 input

* update real world test due to new kernels

* more type casts
2023-12-25 02:28:11 -05:00
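
The reason RMSNorm breaks on float16 input: squaring the activations overflows float16's ~65504 max, turning the mean into inf. A hedged numpy sketch of the usual fix, upcasting inside the norm (illustrative, not the exact hotfix):

    import numpy as np

    def rms_norm(x, weight, eps=1e-6):
      x32 = x.astype(np.float32)                 # avoid float16 overflow in x*x
      rms = np.sqrt((x32 * x32).mean(axis=-1, keepdims=True) + eps)
      return (x32 / rms * weight).astype(x.dtype)
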
chenyu
50927defad s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
chenyu
50cfb1fb3a update onnx model links (#2908)
updated in https://github.com/onnx/models/pull/644
2023-12-22 00:19:41 -05:00
chenyu
1bbeb3fe2f remove the different rtol / atol for openpilot CUDA in benchmark (#2907)
not sure what the issue was but seems to be fixed on master
2023-12-21 22:23:39 -05:00
chenyu
5bf43c9634 reenable one onnx test failed due to dtype (#2902) 2023-12-21 15:50:02 -05:00