Commit Graph

3519 Commits

Author SHA1 Message Date
David Hou
aebaab011f faster wino compile by catting consts across data expand dim (#3293)
* PoC faster wino compile by catting consts across data expand dim

* fix fusions

* faster + golf it

* noqa 501

* implicit broadcast

* Revert "implicit broadcast"

This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666.

* shorter

* shorter

* oops

* 216 upcasts is probably fine

* wino kernel count test

* test winograd number of sts

* specify device for apply_matrix mat elements
2024-02-02 03:47:45 -05:00
David Hou
cf6f478901 limit group_for_reduce bufs to 32kb (#3299)
hipcc crashes for buffers that are too large
2024-02-02 03:13:12 -05:00
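The 32 KB cap above matches common per-workgroup GPU local-memory (LDS) limits; a hypothetical sketch of such a size guard (names and the exact constant are illustrative, not tinygrad's implementation):

```python
MAX_LOCAL_BUF_BYTES = 32 * 1024  # illustrative cap, mirroring the commit's 32kb limit

def fits_local_memory(shape: tuple, itemsize: int) -> bool:
    # compute total bytes for a buffer of this shape/element size
    # and reject anything over the local-memory cap
    n = 1
    for d in shape:
        n *= d
    return n * itemsize <= MAX_LOCAL_BUF_BYTES
```

A guard like this would let the scheduler fall back to a plain reduce instead of handing hipcc a group buffer it cannot compile.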
chenyu
b564660637 type annotation for Compiler.cachekey and minor cleanup (#3298) 2024-02-01 21:31:21 -05:00
Felix Wu
021eea3a52 fix UnboundLocalError when running Compiler with DISABLE_COMPILER_CACHE (#3296) 2024-02-01 21:12:33 -05:00
chenyu
a5bf4afc1a update ruff.toml for v0.2.0 (#3297)
select -> lint.select.

also added rule names for fully specified ones
2024-02-01 20:50:20 -05:00
chenyu
9196b11dfb test_ops sinh/cosh/asinh/acosh/atanh (#3294)
some have numerical issues at large inputs, similar to sigmoid
2024-02-01 03:10:11 -05:00
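The "numerical issues at large inputs" are the usual exp() overflow problem; a minimal plain-Python sketch (not tinygrad's kernel code) of the standard stable-sigmoid trick the commit alludes to:

```python
import math

def sigmoid_naive(x: float) -> float:
    # overflows in exp() for large negative x
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_stable(x: float) -> float:
    # branch on sign so exp() only ever sees a non-positive argument,
    # which can underflow to 0.0 but never overflow
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

`sigmoid_naive(-1000.0)` raises OverflowError, while the stable form quietly underflows to 0.0; hyperbolic functions built from exp() need the same kind of care.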
Francis Lam
927f2dd24d wmma: add HIP FP16 to FP16 tensor core (#3287)
* wmma: add HIP FP16 to FP16 tensor core

* test: fix test_tensor_core to use separate tolerances for half
2024-01-31 23:00:51 -05:00
chenyu
18e854cdbf shrink MLB on sharded axis (#3255)
* shrink MLB on sharded axis

use a onehot structure to store the real partition. the goal is unsynced batchnorm2d that can be run on multi-GPU for training.

draft version in https://github.com/chenyuxyz/tinygrad/pull/109

* SYNCBN flag

* test unclean shrinks

* UnsyncedBatchNorm reuses BatchNorm

* more robust pad arg check

* better types

* more tests!

* 6 gpus in benchmark

* disable slow GPUS=6 benchmark
2024-01-31 21:48:25 -05:00
chenyu
a3652e6ddc minor cleanups to test_ops (#3290)
- removed noop a=0
- fixed integer div test
- added test for both python expression and Tensor method call
- reordered for consistency and added some spaces
2024-01-31 19:01:25 -05:00
chenyu
77251336d5 fix handcode_resnet50_opt.py (#3289)
linearizer_opts has moved. also update the logging to print after total_tm update
2024-01-31 19:01:08 -05:00
chenyu
9b8c1a0408 Tensor.batchnorm works for more than 2d and is reused in onnx (#3284) 2024-01-30 19:02:45 -05:00
chenyu
7816c3b692 onnx update for trilu and argmax (#3283)
* support 0 in shape for tril and triu

* select_last_index for ArgMax and ArgMin

* pass **kwargs
2024-01-30 18:39:16 -05:00
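ONNX's `select_last_index` attribute asks ArgMax/ArgMin to return the last index of a tied extreme rather than the first; a minimal plain-Python sketch of the usual reverse-and-remap trick (illustrative, not the onnx_ops implementation):

```python
def argmax_select_last(xs: list) -> int:
    # list.index() finds the FIRST occurrence, so search the
    # reversed list and map the position back to the original order
    best = max(xs)
    return len(xs) - 1 - xs[::-1].index(best)
```

For `[1, 3, 3, 2]` this picks index 2, where a plain first-match argmax would pick 1.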
qazal
5b46b0ff3d Simple RDNA3 emulator (#2974)
* mockhip->hipcpu

* allocate buffers

* launch a kernel

read_asm api

* run remu in CI

* remu 0.0.2, real test ops

* simple driver

* 0.0.3, all test_ops

* run the latest emulator

* 9 minutes is way too long, drop backprop in CI

* bring back the backward pass

* Revert "bring back the backward pass"

This reverts commit 3781e1bc56.

* Print slowest tests

* emulated device directly in ops_hip

* fix ruff, override mypy for specific rules

* test in the same code path

- hip backend env variables

- install packages and verify autogen

- run certain tests

- remove the other hip tests path

- verify Device.DEFAULT

* remove the emulated hip in extra

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-30 10:39:28 -08:00
George Hotz
247a8a2a6c add canonicalization to View.create (#3280)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views

* update second any

* isinstance -> not

* 25% less same but unequal
2024-01-30 10:26:48 -08:00
George Hotz
d8f6280ffb hotfix: add CHECK_NEQ to fuzz_shapetracker_math 2024-01-30 10:07:54 -08:00
George Hotz
09f2952dc3 reintroduce merge views in update benchmark (#3279)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views
2024-01-30 09:47:20 -08:00
George Hotz
d298916232 Revert "take merge views from corsix branch" (#3278) 2024-01-30 09:34:28 -08:00
George Hotz
b57a16aa89 take merge views from corsix branch (#3273)
* take merge views from corsix branch

* better DEBUG

* max views

* remove view.py change

* Revert "remove view.py change"

This reverts commit f3025f4f39.

* only allow filter on non symbolic

* oops, correct fix

* comment to explain
2024-01-30 09:25:16 -08:00
George Hotz
6a4a5dc79d fix pad 0 size (#3277)
* fix pad 0 size

* put in view, not pad

* test was wrong
2024-01-30 08:58:10 -08:00
chenyu
b0a755288f cifar EVAL_BS set default value to BS (#3274)
less compile time for eval due to cache. 500 was also a slow, uneven number for 6 GPUs. eval time 5.9s -> 3.4s
2024-01-29 17:37:12 -05:00
Francis Lam
861d5ac224 wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250)
will correctly handle any permutation of optops after the TC one
2024-01-29 11:51:57 -08:00
chenyu
af4ca85594 MultiLazyBuffer.reshape new_axis without real_strides (#3272)
similar to contraction, but this one is for finding the mapped single axis
2024-01-28 23:53:52 -05:00
chenyu
34c7621556 HIP=1 NOCLANG=1 for tinybox external_model_benchmark (#3270)
used HIP instead of GPU and disabled slow CLANG
2024-01-28 22:05:26 -05:00
George Hotz
085dc87bed winograd should be 4 kernels (#3268) 2024-01-28 09:21:26 -08:00
George Hotz
f48b6aca77 long running beam pool (#3267) 2024-01-28 08:06:03 -08:00
George Hotz
9e17378b60 Fix metal tests (#3266)
* small fixes for tests on mac

* remove device from TensorCore
2024-01-27 18:09:42 -08:00
Francis Lata
86748f4a8c fix bbox format to be a list (#3265) 2024-01-27 17:54:19 -08:00
George Hotz
67a78615e5 uoptimizer (#3262)
* uoptimizer

* uops

* self.uoptimize
2024-01-27 10:26:04 -08:00
Hristo Georgiev
3ae811af21 tests for Tensor init data dtype and resulting dtype (#3247)
Co-authored-by: Hristo Georgiev <6043312+hristog@users.noreply.github.com>
2024-01-27 00:13:42 -08:00
George Hotz
3c728d1082 compiler support (#3260)
* compiler support

* revert that

* fix tests
2024-01-26 23:36:40 -08:00
Francis Lam
4273aabe31 extra/gemm: add a simple_conv.py along with correctness check (#3236)
* extra/gemm: add a simple_conv.py along with correctness check

The goal is to easily test tensor core triggering situations

* test: add tests for acc_dtype handling and fixed typing
2024-01-26 19:06:57 -08:00
George Hotz
0aad8d238b rebuild ocelot (#3259)
* rebuild

* strip trailing whitespace
2024-01-26 18:46:36 -08:00
George Hotz
473935125a use comgr to compile (#3248)
* use comgr to compile

* fast

* bfloat16

* move comgr to its own file

* cleaner style

* comgr in new place

* comgr free + dtype cleanup
2024-01-26 18:27:49 -08:00
George Hotz
c4d870db0d fix jit realize issue (#3258) 2024-01-26 18:27:35 -08:00
chenyu
4197ef17c4 const cleanup with dtype.Scalar (#3257)
moved Scalar to dtype.py. assert in _broadcasted when y is a Scalar and
fix some tests
2024-01-26 21:16:22 -05:00
George Hotz
03a6bc59c1 move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46 move gpuctypes in tree (#3253)
* move gpuctypes in tree

* fix mypy

* regex exclude

* autogen sh

* mypy exclude

* does that fix it

* fix mypy

* add hip confirm

* verify all autogens

* build clang2py

* opencl headers

* gpu on 22.04
2024-01-26 12:25:03 -08:00
chenyu
bc92c4cc32 onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252)
* onnx Einsum, CumSum, DepthToSpace, SpaceToDepth

Einsum inner product and `...` are not supported

* --durations=20
2024-01-26 10:47:53 -05:00
chenyu
e45ffdb6cf cleanup onnx (#3249)
* add onnx test_reduce_log_sum_exp

* more reuse

* more

* stuff

* good CenterCropPad

* imports

* good ArrayFeatureExtractor

* pretty good Pad

* stuff

* stuff

* onnx.py

* Atan

* pass int8 test

* dtype related

* fastmath stuff

* Resize linear

* fix CI

* move back
2024-01-25 20:39:59 -05:00
Ahmed Harmouche
168b1f879c Fix hip_matmul gemm in extra (#3241) 2024-01-25 16:03:04 -08:00
George Hotz
7feeb118e6 hip launch speed (#3246)
* faster HIP kernel launch

* args

* expand compile_hip
2024-01-25 15:13:55 -08:00
George Hotz
cb372b053f add device speed test (#3244) 2024-01-25 12:01:22 -08:00
geohotstan
d0e116c6d6 fix maximum/where Scalar casting (#3194)
* init

* test: added dtype tests for maximum

* fix: separate maximum const and maximum tensors

* fix: del useless line

* fix: some dtypes

* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys

* fix: add lil helper function

* fix: some test refactoring

* done

* sike: not done yet lol

* wtf I missed an assert, am I drunk

* yeah idk

* fix: line save from redundant check

* revert: line save

* fix: simplify test_broadcast cuz I'm stumped

* change some test name

* fix: bool max bool works

* test: add a maximum bool test

* test: make sure minimum also works with bool

* fix: something like this? :s

* fix: maybe this?

* fix: how about this? tighter check

* fix: this.

* revert: nvm mul(0.5) and div(2) has the same kernel for backward

* fix: .is_floating_point() xD

* revert: maximum and minimum and add cast

* fix: cover negative const case in test

* fix: use eq because I don't understand clang :D

* WHOOOOPS
2024-01-25 12:26:04 -05:00
geohotstan
3628bea910 fix: big round even rounder round (#3242)
* fix: big round even rounder round

* fix: variable name lol

* feat: 1 less potential cast

* consistent naming (I'm just spamming commits now)

* LOL MISSED ONNX ANOTHER COMMIT

* test: fix test_ops and remove _round

* test: tensor methods oops
2024-01-25 12:24:15 -05:00
chenyu
da5e27968c failed test cases for Tensor.round (#3240)
it should round to even
2024-01-25 02:12:50 -05:00
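"Round to even" here means IEEE-style banker's rounding: ties at .5 go to the even neighbor, which is what numpy and ONNX expect. A minimal plain-Python sketch (illustrative, not Tensor.round's kernel code):

```python
import math

def round_half_to_even(x: float) -> float:
    # ties-to-even ("banker's rounding"): 0.5 -> 0, 1.5 -> 2, 2.5 -> 2
    f = math.floor(x)
    frac = x - f
    if frac > 0.5:
        return f + 1.0
    if frac < 0.5:
        return float(f)
    # exact tie: choose the even neighbor
    return float(f) if f % 2 == 0 else f + 1.0
```

Python's built-in round() already behaves this way, which is why naive floor(x + 0.5) implementations fail the test cases this commit adds.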
geohotstan
b0b5eba535 fix _round in onnx_ops to look more like new Tensor.round (#3239)
* fix: _round in onnxops

* fix: minor things

* fix: no more n

* fix: smol

* fix: smoller
2024-01-25 01:18:58 -05:00
George Hotz
aa0d1b6330 hotfix: don't use noqa: E702 that's just dumb 2024-01-24 20:01:00 -08:00
George Hotz
b92945c98d hotfix: DEBUG >= 2 for kernels 2024-01-24 23:55:17 +00:00
George Hotz
a8fbb03438 minor hip cleanups (#3237) 2024-01-24 15:13:38 -08:00
nimlgen
3205fd8481 fix cuda device var rewrite (#3233) 2024-01-24 16:57:49 -05:00