George Hotz
09f2952dc3
reintroduce merge views in update benchmark (#3279)
* Reapply "take merge views from corsix branch" (#3278)
This reverts commit d298916232.
* reintroduce merge views
2024-01-30 09:47:20 -08:00
George Hotz
d298916232
Revert "take merge views from corsix branch" ( #3278 )
2024-01-30 09:34:28 -08:00
George Hotz
b57a16aa89
take merge views from corsix branch (#3273)
* take merge views from corsix branch
* better DEBUG
* max views
* remove view.py change
* Revert "remove view.py change"
This reverts commit f3025f4f39.
* only allow filter on non-symbolic
* oops, correct fix
* comment to explain
2024-01-30 09:25:16 -08:00
George Hotz
6a4a5dc79d
fix pad 0 size (#3277)
* fix pad 0 size
* put in view, not pad
* test was wrong
2024-01-30 08:58:10 -08:00
chenyu
b0a755288f
cifar EVAL_BS set default value to BS (#3274)
less compile time for eval due to the cache. 500 was also a slow, uneven batch size for 6 GPUs. eval time 5.9s -> 3.4s
2024-01-29 17:37:12 -05:00
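A minimal sketch of the defaulting pattern this commit describes, assuming tinygrad's getenv helper; the variable names and the 512 default are illustrative, not the exact training-script code:

```python
# Hypothetical sketch: default EVAL_BS to BS so eval reuses cached kernels.
from tinygrad.helpers import getenv  # tinygrad's env-var helper

BS = getenv("BS", 512)           # assumed training batch size for illustration
EVAL_BS = getenv("EVAL_BS", BS)  # same batch size as training -> compile cache hits
```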
Francis Lam
861d5ac224
wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250)
will correctly handle any permutation of optops after the TC one
2024-01-29 11:51:57 -08:00
chenyu
af4ca85594
MultiLazyBuffer.reshape new_axis without real_strides (#3272)
similar to contraction, but this one is for finding the mapped single axis
2024-01-28 23:53:52 -05:00
chenyu
34c7621556
HIP=1 NOCLANG=1 for tinybox external_model_benchmark (#3270)
used HIP instead of GPU and disabled the slow CLANG backend
2024-01-28 22:05:26 -05:00
George Hotz
085dc87bed
winograd should be 4 kernels (#3268)
2024-01-28 09:21:26 -08:00
George Hotz
f48b6aca77
long running beam pool (#3267)
2024-01-28 08:06:03 -08:00
George Hotz
9e17378b60
Fix metal tests (#3266)
* small fixes for tests on mac
* remove device from TensorCore
2024-01-27 18:09:42 -08:00
Francis Lata
86748f4a8c
fix bbox format to be a list (#3265)
2024-01-27 17:54:19 -08:00
George Hotz
67a78615e5
uoptimizer (#3262)
* uoptimizer
* uops
* self.uoptimize
2024-01-27 10:26:04 -08:00
Hristo Georgiev
3ae811af21
tests for Tensor init data dtype and resulting dtype (#3247)
Co-authored-by: Hristo Georgiev <6043312+hristog@users.noreply.github.com>
2024-01-27 00:13:42 -08:00
George Hotz
3c728d1082
compiler support (#3260)
* compiler support
* revert that
* fix tests
2024-01-26 23:36:40 -08:00
Francis Lam
4273aabe31
extra/gemm: add a simple_conv.py along with correctness check (#3236)
* extra/gemm: add a simple_conv.py along with correctness check
The goal is to easily test tensor core triggering situations
* test: add tests for acc_dtype handling and fixed typing
2024-01-26 19:06:57 -08:00
George Hotz
0aad8d238b
rebuild ocelot (#3259)
* rebuild
* strip trailing whitespace
2024-01-26 18:46:36 -08:00
George Hotz
473935125a
use comgr to compile (#3248)
* use comgr to compile
* fast
* bfloat16
* move comgr to its own file
* cleaner style
* comgr in new place
* comgr free + dtype cleanup
2024-01-26 18:27:49 -08:00
George Hotz
c4d870db0d
fix jit realize issue (#3258)
2024-01-26 18:27:35 -08:00
chenyu
4197ef17c4
const cleanup with dtype.Scalar (#3257)
moved Scalar to dtype.py. assert in _broadcasted when y is a Scalar, and fix some tests
2024-01-26 21:16:22 -05:00
George Hotz
03a6bc59c1
move autogen to runtime/autogen (#3254)
2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46
move gpuctypes in tree (#3253)
* move gpuctypes in tree
* fix mypy
* regex exclude
* autogen sh
* mypy exclude
* does that fix it
* fix mypy
* add hip confirm
* verify all autogens
* build clang2py
* opencl headers
* gpu on 22.04
2024-01-26 12:25:03 -08:00
chenyu
bc92c4cc32
onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252)
* onnx Einsum, CumSum, DepthToSpace, SpaceToDepth
Einsum inner product and `...` are not supported
* --durations=20
2024-01-26 10:47:53 -05:00
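An illustrative use of Tensor.einsum, which the new ONNX Einsum op presumably lowers to; per the commit note above, inner products and `...` are the unsupported cases:

```python
from tinygrad import Tensor

# matmul spelled as an einsum; this form is within the supported subset
x, y = Tensor.rand(2, 3), Tensor.rand(3, 4)
out = Tensor.einsum("ij,jk->ik", x, y)
print(out.shape)  # (2, 4)
# not supported per the commit: inner product ("i,i->") and formulas using "..."
```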
chenyu
e45ffdb6cf
cleanup onnx (#3249)
* add onnx test_reduce_log_sum_exp
* more reuse
* more
* stuff
* good CenterCropPad
* imports
* good ArrayFeatureExtractor
* pretty good Pad
* stuff
* stuff
* onnx.py
* Atan
* pass int8 test
* dtype related
* fastmath stuff
* Resize linear
* fix CI
* move back
2024-01-25 20:39:59 -05:00
Ahmed Harmouche
168b1f879c
Fix hip_matmul gemm in extra (#3241)
2024-01-25 16:03:04 -08:00
George Hotz
7feeb118e6
hip launch speed (#3246)
* faster HIP kernel launch
* args
* expand compile_hip
2024-01-25 15:13:55 -08:00
George Hotz
cb372b053f
add device speed test (#3244)
2024-01-25 12:01:22 -08:00
geohotstan
d0e116c6d6
fix maximum/where Scalar casting (#3194)
* init
* test: added dtype tests for maximum
* fix: separate maximum const and maximum tensors
* fix: del useless line
* fix: some dtypes
* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys
* fix: add lil helper function
* fix: some test refactoring
* done
* sike: not done yet lol
* wtf I missed an assert, am I drunk
* yeah idk
* fix: line save from redundant check
* revert: line save
* fix: simplify test_broadcast cuz I'm stumped
* change some test name
* fix: bool max bool works
* test: add a maximum bool test
* test: make sure minimum also works with bool
* fix: something like this? :s
* fix: maybe this?
* fix: how about this? tighter check
* fix: this.
* revert: nvm mul(0.5) and div(2) has the same kernel for backward
* fix: .is_floating_point() xD
* revert: maximum and minimum and add cast
* fix: cover negative const case in test
* fix: use eq because I don't understand clang :D
* WHOOOOPS
2024-01-25 12:26:04 -05:00
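A hedged illustration of the behavior this PR's tests pin down: taking maximum against a Python scalar should not silently change the tensor's dtype. The expected values are an assumption from the PR title, not quoted from its tests:

```python
from tinygrad import Tensor, dtypes

t = Tensor([1, 2, 3], dtype=dtypes.int32)
out = t.maximum(2)  # scalar const path, the case being fixed
print(out.dtype, out.numpy())  # expected: int32 [2 2 3], no float upcast
```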
geohotstan
3628bea910
fix: big round even rounder round (#3242)
* fix: big round even rounder round
* fix: variable name lol
* feat: 1 less potential cast
* consistent naming (I'm just spamming commits now)
* LOL MISSED ONNX ANOTHER COMMIT
* test: fix test_ops and remove _round
* test: tensor methods oops
2024-01-25 12:24:15 -05:00
chenyu
da5e27968c
failed test cases for Tensor.round (#3240)
it should round to even
2024-01-25 02:12:50 -05:00
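For context, "round to even" (banker's rounding) means halves go to the nearest even integer, as in numpy and torch; a minimal sketch of the expected behavior:

```python
from tinygrad import Tensor

# halves round to the nearest even integer, not away from zero
print(Tensor([0.5, 1.5, 2.5, -0.5]).round().numpy())  # expected: [ 0.  2.  2. -0.]
```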
geohotstan
b0b5eba535
fix _round in onnx_ops to look more like new Tensor.round (#3239)
* fix: _round in onnxops
* fix: minor things
* fix: no more n
* fix: smol
* fix: smoller
2024-01-25 01:18:58 -05:00
George Hotz
aa0d1b6330
hotfix: don't use noqa: E702 that's just dumb
2024-01-24 20:01:00 -08:00
George Hotz
b92945c98d
hotfix: DEBUG >= 2 for kernels
2024-01-24 23:55:17 +00:00
George Hotz
a8fbb03438
minor hip cleanups (#3237)
2024-01-24 15:13:38 -08:00
nimlgen
3205fd8481
fix cuda device var rewrite (#3233)
2024-01-24 16:57:49 -05:00
George Hotz
ed8a32722a
hip mutex signal (#3234)
* hip mutex
* hip mutex 2
* sync
2024-01-24 13:23:09 -08:00
George Hotz
47f9887ce4
hip events work (#3229)
* hip events work
* event
2024-01-24 11:49:53 -08:00
George Hotz
de7a3a56ff
save lines in llvm (#3231)
* save lines in llvm
* no implied cast in load
* no cast in gate
2024-01-24 11:40:53 -08:00
George Hotz
83d614295e
reduce lines (#3230)
2024-01-24 10:35:59 -08:00
chenyu
afeadbedc9
touch up Tensor.round and Tensor.neg (#3228)
2024-01-24 12:29:37 -05:00
Obada Khalili
0e103b4aa0
implement Tensor.round (#3225)
2024-01-24 11:49:17 -05:00
geohotstan
842053873d
fix neg logical_not inconsistencies (#3222)
* try
* test: add logical_not tests
* gah im retarded, but this doesn't match types for const()
* fix: can't we just do this?
* big change: I don't actually know what I'm doing
* WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later
* BYE BYE noqa: E501
* fix: less lines and add test
* fix: rm 2 redundant tests
* fix: eq with False so we don't unintentionally implicitly upcast, but it's bool anyways so w/e
2024-01-24 11:48:40 -05:00
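A sketch of the consistency being fixed, with the intended behavior inferred from the PR title: logical_not stays bool, and neg preserves the input dtype:

```python
from tinygrad import Tensor

b = Tensor([True, False])
print(b.logical_not().numpy())        # expected: [False  True], still bool
print((-Tensor([1, -2, 0])).numpy())  # expected: [-1  2  0], dtype unchanged
```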
George Hotz
e2e4632aea
LoadOps SYNC (#3223)
* LoadOps SYNC and WAIT
* no wait, only sync
* DEBUG >= 1
* track cross device
2024-01-23 21:59:18 -08:00
chenyu
2f4b3ab1c0
shard and to should preserve requires_grad (#3224)
dtypes are inferred from the underlying lazydata; requires_grad needs to be passed explicitly
2024-01-24 00:15:10 -05:00
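A minimal check of the property this commit adds; the device names are placeholders, and the snippet only runs on a machine that exposes those devices:

```python
from tinygrad import Tensor

t = Tensor.rand(8, 8, requires_grad=True)
s = t.shard(("GPU:0", "GPU:1"), axis=0)    # hypothetical two-GPU setup
assert s.requires_grad == t.requires_grad  # preserved now, was dropped before
```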
George Hotz
23b084e70a
add device name to device, all are constructed (#3221)
2024-01-23 20:34:56 -08:00
George Hotz
91a1b2bd7a
the runner does the build (#3220)
2024-01-23 18:45:43 -08:00
chenyu
9e5409be6c
cifar move GlobalCounters.reset() before shard (#3217)
* cifar move GlobalCounters.reset() before shard
also shard the mini batch in place
* don't eval with DISABLE_BACKWARD
2024-01-23 16:07:43 -05:00
Francis Lam
595d05a250
test: fix test_linearizer to use the correct tc_dims (#3218)
also re-enable test_tensor_core_opts
2024-01-23 16:07:31 -05:00
chenyu
3c179cc27c
cifar only shuffle data at epoch start (#3216)
saves 1ms of CPU time per batch. also only shuffles the training set
2024-01-23 14:41:22 -05:00
George Hotz
4a07ea355d
buffer options should work (#3211)
* buffer options should work
* minor
* fix dtype
2024-01-22 19:23:55 -08:00