Commit Graph

3661 Commits

chenyu
a3652e6ddc minor cleanups to test_ops (#3290)
- removed noop a=0
- fixed integer div test
- added tests for both the Python expression and the Tensor method call
- reordered for consistency and added some spaces
2024-01-31 19:01:25 -05:00
chenyu
77251336d5 fix handcode_resnet50_opt.py (#3289)
linearizer_opts has moved; also updated the logging to print after the total_tm update
2024-01-31 19:01:08 -05:00
chenyu
9b8c1a0408 Tensor.batchnorm works more than 2d and reuse in onnx (#3284) 2024-01-30 19:02:45 -05:00
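The generalization in this commit — batchnorm over inputs with more than 2 dimensions — can be sketched with numpy (an illustrative reference only, not tinygrad's implementation): normalize over every axis except the channel axis (axis 1), so the same code handles NC, NCHW, and NCDHW inputs.

```python
import numpy as np

def batchnorm_nd(x, eps=1e-5):
    # normalize over every axis except the channel axis (axis 1)
    axes = tuple(i for i in range(x.ndim) if i != 1)
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(2, 3, 4, 4, 4).astype(np.float32)  # 5D input, "more than 2d"
y = batchnorm_nd(x)
```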
chenyu
7816c3b692 onnx update for trilu and argmax (#3283)
* support 0 in shape for tril and triu

* select_last_index for ArgMax and ArgMin

* pass **kwargs
2024-01-30 18:39:16 -05:00
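The select_last_index semantics added for ArgMax/ArgMin can be sketched in plain Python (a hypothetical helper, not the onnx_ops code): take the argmax of the reversed sequence, then map the index back.

```python
def argmax_last(xs):
    # ONNX ArgMax with select_last_index=1 returns the index of the *last*
    # occurrence of the maximum; scan the reversed sequence, then map back
    n = len(xs)
    rev = max(range(n), key=lambda i: xs[n - 1 - i])
    return n - 1 - rev
```

For example, `argmax_last([1, 3, 3, 2])` gives 2, whereas a plain argmax would give 1.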
qazal
5b46b0ff3d Simple RDNA3 emulator (#2974)
* mockhip->hipcpu

* allocate buffers

* launch a kernel

read_asm api

* run remu in CI

* remu 0.0.2, real test ops

* simple driver

* 0.0.3, all test_ops

* run the latest emulator

* 9 minutes is way too long, drop backprop in CI

* bring back the backward pass

* Revert "bring back the backward pass"

This reverts commit 3781e1bc56.

* Print slowest tests

* emulated device directly in ops_hip

* fix ruff, override mypy for specific rules

* test in the same code path

- hip backend env variables

- install packages and verify autogen

- run certain tests

- remove the other hip tests path

- verify Device.DEFAULT

* remove the emulated hip in extra

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-30 10:39:28 -08:00
George Hotz
247a8a2a6c add canonicalization to View.create (#3280)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views

* update second any

* isinstance -> not

* 25% less same but unequal
2024-01-30 10:26:48 -08:00
George Hotz
d8f6280ffb hotfix: add CHECK_NEQ to fuzz_shapetracker_math 2024-01-30 10:07:54 -08:00
George Hotz
09f2952dc3 reintroduce merge views in update benchmark (#3279)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views
2024-01-30 09:47:20 -08:00
George Hotz
d298916232 Revert "take merge views from corsix branch" (#3278) 2024-01-30 09:34:28 -08:00
George Hotz
b57a16aa89 take merge views from corsix branch (#3273)
* take merge views from corsix branch

* better DEBUG

* max views

* remove view.py change

* Revert "remove view.py change"

This reverts commit f3025f4f39.

* only allow filter on non symbolic

* oops, correct fix

* comment to explain
2024-01-30 09:25:16 -08:00
George Hotz
6a4a5dc79d fix pad 0 size (#3277)
* fix pad 0 size

* put in view, not pad

* test was wrong
2024-01-30 08:58:10 -08:00
chenyu
b0a755288f cifar EVAL_BS set default value to BS (#3274)
less compile time for eval due to the cache. 500 was also a slow, uneven batch size for 6 GPUs. eval time 5.9s -> 3.4s
2024-01-29 17:37:12 -05:00
Francis Lam
861d5ac224 wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250)
will correctly handle any permutation of optops after the TC one
2024-01-29 11:51:57 -08:00
chenyu
af4ca85594 MultiLazyBuffer.reshape new_axis without real_strides (#3272)
similar to contraction, but this one is for finding the mapped single axis
2024-01-28 23:53:52 -05:00
chenyu
34c7621556 HIP=1 NOCLANG=1 for tinybox external_model_benchmark (#3270)
used HIP instead of GPU and disabled slow CLANG
2024-01-28 22:05:26 -05:00
George Hotz
085dc87bed winograd should be 4 kernels (#3268) 2024-01-28 09:21:26 -08:00
George Hotz
f48b6aca77 long running beam pool (#3267) 2024-01-28 08:06:03 -08:00
George Hotz
9e17378b60 Fix metal tests (#3266)
* small fixes for tests on mac

* remove device from TensorCore
2024-01-27 18:09:42 -08:00
Francis Lata
86748f4a8c fix bbox format to be a list (#3265) 2024-01-27 17:54:19 -08:00
George Hotz
67a78615e5 uoptimizer (#3262)
* uoptimizer

* uops

* self.uoptimize
2024-01-27 10:26:04 -08:00
Hristo Georgiev
3ae811af21 tests for Tensor init data dtype and resulting dtype (#3247)
Co-authored-by: Hristo Georgiev <6043312+hristog@users.noreply.github.com>
2024-01-27 00:13:42 -08:00
George Hotz
3c728d1082 compiler support (#3260)
* compiler support

* revert that

* fix tests
2024-01-26 23:36:40 -08:00
Francis Lam
4273aabe31 extra/gemm: add a simple_conv.py along with correctness check (#3236)
* extra/gemm: add a simple_conv.py along with correctness check

The goal is to easily test tensor core triggering situations

* test: add tests for acc_dtype handling and fixed typing
2024-01-26 19:06:57 -08:00
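The correctness-check pattern this commit adds compares kernel output against a naive reference. A minimal sketch of such a reference in numpy (a hypothetical helper under assumed shapes, not the actual simple_conv.py code):

```python
import numpy as np

def conv2d_ref(x, w):
    # naive direct "convolution" (cross-correlation, as in ML frameworks),
    # used only as slow ground truth for a correctness check
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (x[i:i + kh, j:j + kw] * w).sum()
    return out

x = np.arange(9, dtype=np.float64).reshape(3, 3)
w = np.ones((2, 2))
out = conv2d_ref(x, w)
```

A fast kernel's output would then be checked with `np.allclose` against this reference.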
George Hotz
0aad8d238b rebuild ocelot (#3259)
* rebuild

* strip trailing whitespace
2024-01-26 18:46:36 -08:00
George Hotz
473935125a use comgr to compile (#3248)
* use comgr to compile

* fast

* bfloat16

* move comgr to its own file

* cleaner style

* comgr in new place

* comgr free + dtype cleanup
2024-01-26 18:27:49 -08:00
George Hotz
c4d870db0d fix jit realize issue (#3258) 2024-01-26 18:27:35 -08:00
chenyu
4197ef17c4 const cleanup with dtype.Scalar (#3257)
moved Scalar to dtype.py; assert in _broadcasted when y is a Scalar, and fixed some tests
2024-01-26 21:16:22 -05:00
George Hotz
03a6bc59c1 move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46 move gpuctypes in tree (#3253)
* move gpuctypes in tree

* fix mypy

* regex exclude

* autogen sh

* mypy exclude

* does that fix it

* fix mypy

* add hip confirm

* verify all autogens

* build clang2py

* opencl headers

* gpu on 22.04
2024-01-26 12:25:03 -08:00
chenyu
bc92c4cc32 onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252)
* onnx Einsum, CumSum, DepthToSpace, SpaceToDepth

Einsum inner product and `...` are not supported

* --durations=20
2024-01-26 10:47:53 -05:00
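DepthToSpace (and its inverse SpaceToDepth) is just a reshape plus a transpose. A sketch of the standard DCR-mode formula in numpy (illustrative, not the onnx.py code):

```python
import numpy as np

def depth_to_space(x, bs):
    # DCR-mode DepthToSpace: move block-sized chunks of the channel
    # dimension into the spatial dimensions via reshape + transpose
    n, c, h, w = x.shape
    x = x.reshape(n, bs, bs, c // (bs * bs), h, w)
    x = x.transpose(0, 3, 4, 1, 5, 2)
    return x.reshape(n, c // (bs * bs), h * bs, w * bs)

x = np.arange(16).reshape(1, 4, 2, 2)
y = depth_to_space(x, 2)  # (1, 4, 2, 2) -> (1, 1, 4, 4)
```

SpaceToDepth is the same trick with the reshape/transpose order inverted.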
chenyu
e45ffdb6cf cleanup onnx (#3249)
* add onnx test_reduce_log_sum_exp

* more reuse

* more

* stuff

* good CenterCropPad

* imports

* good ArrayFeatureExtractor

* pretty good Pad

* stuff

* stuff

* onnx.py

* Atan

* pass int8 test

* dtype related

* fastmath stuff

* Resize linear

* fix CI

* move back
2024-01-25 20:39:59 -05:00
Ahmed Harmouche
168b1f879c Fix hip_matmul gemm in extra (#3241) 2024-01-25 16:03:04 -08:00
George Hotz
7feeb118e6 hip launch speed (#3246)
* faster HIP kernel launch

* args

* expand compile_hip
2024-01-25 15:13:55 -08:00
George Hotz
cb372b053f add device speed test (#3244) 2024-01-25 12:01:22 -08:00
geohotstan
d0e116c6d6 fix maximum/where Scalar casting (#3194)
* init

* test: added dtype tests for maximum

* fix: separate maximum const and maximum tensors

* fix: del useless line

* fix: some dtypes

* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys

* fix: add lil helper function

* fix: some test refactoring

* done

* sike: not done yet lol

* wtf I missed an assert, am I drunk

* yeah idk

* fix: line save from redundant check

* revert: line save

* fix: simplify test_broadcast cuz I'm stumped

* change some test name

* fix: bool max bool  works

* test: add a maximum bool test

* test: make sure minimum also works with bool

* fix: something like this? :s

* fix: maybe this?

* fix: how about this? tighter check

* fix: this.

* revert: nvm, mul(0.5) and div(2) have the same kernel for backward

* fix: .is_floating_point() xD

* revert: maximum and minimum and add cast

* fix: cover negative const case in test

* fix: use eq because I don't understand clang :D

* WHOOOOPS
2024-01-25 12:26:04 -05:00
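The dtype behavior this PR pins down — maximum of an integer tensor with a float scalar should upcast, while maximum of two bool tensors should stay bool — can be illustrated with numpy's promotion rules (illustrative only; tinygrad's casting logic is its own):

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)
r = np.maximum(a, 0.5)  # int tensor vs float scalar: result upcasts to float

# bool max bool stays bool, one of the cases the PR adds tests for
bt = np.maximum(np.array([True, False]), np.array([False, False]))
```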
geohotstan
3628bea910 fix: big round even rounder round (#3242)
* fix: big round even rounder round

* fix: variable name lol

* feat: 1 less potential cast

* consistent naming (im just spamming commits now)

* LOL MISSED ONNX ANOTHER COMMIT

* test: fix test_ops and remove _round

* test: tensor methods oops
2024-01-25 12:24:15 -05:00
chenyu
da5e27968c failed test cases for Tensor.round (#3240)
it should round to even
2024-01-25 02:12:50 -05:00
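"Round to even" is banker's rounding, the behavior these failing test cases pin down for Tensor.round. Python's built-in round already works this way, which makes the expected halfway-case results easy to see:

```python
# halfway cases round to the nearest even integer, not away from zero
vals = [0.5, 1.5, 2.5, -0.5, -1.5]
rounded = [round(v) for v in vals]
# 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, -0.5 -> 0, -1.5 -> -2
```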
geohotstan
b0b5eba535 fix _round in onnx_ops to look more like new Tensor.round (#3239)
* fix: _round in onnxops

* fix: minor things

* fix: no more n

* fix: smol

* fix: smoller
2024-01-25 01:18:58 -05:00
George Hotz
aa0d1b6330 hotfix: don't use noqa: E702 that's just dumb 2024-01-24 20:01:00 -08:00
George Hotz
b92945c98d hotfix: DEBUG >= 2 for kernels 2024-01-24 23:55:17 +00:00
George Hotz
a8fbb03438 minor hip cleanups (#3237) 2024-01-24 15:13:38 -08:00
nimlgen
3205fd8481 fix cuda device var rewrite (#3233) 2024-01-24 16:57:49 -05:00
George Hotz
ed8a32722a hip mutex signal (#3234)
* hip mutex

* hip mutex 2

* sync
2024-01-24 13:23:09 -08:00
George Hotz
47f9887ce4 hip events work (#3229)
* hip events work

* event
2024-01-24 11:49:53 -08:00
George Hotz
de7a3a56ff save lines in llvm (#3231)
* save lines in llvm

* no implied cast in load

* no cast in gate
2024-01-24 11:40:53 -08:00
George Hotz
83d614295e reduce lines (#3230) 2024-01-24 10:35:59 -08:00
chenyu
afeadbedc9 touch up Tensor.round and Tensor.neg (#3228) 2024-01-24 12:29:37 -05:00
Obada Khalili
0e103b4aa0 implement Tensor.round (#3225) 2024-01-24 11:49:17 -05:00
geohotstan
842053873d fix neg logical_not inconsistencies (#3222)
* try

* test: add logical_not tests

* gah im retarded, but this doesn't match types for const()

* fix: can't we just do this?

* big change: I don't actually know what I'm doing

* WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later

* BYE BYE noqa: E501

* fix: less lines and add test

* fix: rm 2 redundant tests

* fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e
2024-01-24 11:48:40 -05:00
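The final bullet's trick — eq with False instead of arithmetic negation — can be illustrated with numpy (illustrative only; tinygrad's actual logical_not differs): a comparison keeps the result boolean, so no implicit upcast sneaks in.

```python
import numpy as np

x = np.array([True, False, True])
# comparing against False gives elementwise NOT while keeping dtype bool,
# avoiding the implicit upcast that arithmetic negation would introduce
not_x = (x == False)  # noqa: E712 -- deliberate elementwise comparison
```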
George Hotz
e2e4632aea LoadOps SYNC (#3223)
* LoadOps SYNC and WAIT

* no wait, only sync

* DEBUG >= 1

* track cross device
2024-01-23 21:59:18 -08:00