chenyu
da5e27968c
failed test cases for Tensor.round ( #3240 )
...
it should round to even
2024-01-25 02:12:50 -05:00
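Note on "round to even": the commit above only states the expected behavior, not how Tensor.round implements it. As a minimal sketch of the rule itself in plain Python (the helper name `round_half_to_even` is hypothetical and is not tinygrad code), ties such as 0.5 and 2.5 go to the nearest even integer rather than always rounding up:

```python
import math

def round_half_to_even(x: float) -> float:
    """Round a finite float to the nearest integer; exact ties go to the even neighbor.

    Hypothetical helper for illustration only, not the tinygrad implementation.
    """
    lower = math.floor(x)
    diff = x - lower
    if diff > 0.5:
        return float(lower + 1)
    if diff < 0.5:
        return float(lower)
    # Exact tie: choose whichever of lower / lower + 1 is even.
    return float(lower if lower % 2 == 0 else lower + 1)

for v in (0.5, 1.5, 2.5, -0.5, -1.5):
    print(v, round_half_to_even(v))
```

Python's built-in `round` follows the same tie rule for floats, so it can serve as a reference when writing test cases of this kind.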
geohotstan
b0b5eba535
fix _round in onnx_ops to look more like new Tensor.round ( #3239 )
...
* fix: _round in onnxops
* fix: minor things
* fix: no more n
* fix: smol
* fix: smoller
2024-01-25 01:18:58 -05:00
George Hotz
aa0d1b6330
hotfix: don't use noqa: E702 that's just dumb
2024-01-24 20:01:00 -08:00
George Hotz
b92945c98d
hotfix: DEBUG >= 2 for kernels
2024-01-24 23:55:17 +00:00
George Hotz
a8fbb03438
minor hip cleanups ( #3237 )
2024-01-24 15:13:38 -08:00
nimlgen
3205fd8481
fix cuda device var rewrite ( #3233 )
2024-01-24 16:57:49 -05:00
George Hotz
ed8a32722a
hip mutex signal ( #3234 )
...
* hip mutex
* hip mutex 2
* sync
2024-01-24 13:23:09 -08:00
George Hotz
47f9887ce4
hip events work ( #3229 )
...
* hip events work
* event
2024-01-24 11:49:53 -08:00
George Hotz
de7a3a56ff
save lines in llvm ( #3231 )
...
* save lines in llvm
* no implied cast in load
* no cast in gate
2024-01-24 11:40:53 -08:00
George Hotz
83d614295e
reduce lines ( #3230 )
2024-01-24 10:35:59 -08:00
chenyu
afeadbedc9
touch up Tensor.round and Tensor.neg ( #3228 )
2024-01-24 12:29:37 -05:00
Obada Khalili
0e103b4aa0
implement Tensor.round ( #3225 )
2024-01-24 11:49:17 -05:00
geohotstan
842053873d
fix neg logical_not inconsistencies ( #3222 )
...
* try
* test: add logical_not tests
* gah im retarded, but this doesn't match types for const()
* fix: can't we just do this?
* big change: I don't actually know what I'm doing
* WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later
* BYE BYE noqa: E501
* fix: less lines and add test
* fix: rm 2 redundant tests
* fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e
2024-01-24 11:48:40 -05:00
George Hotz
e2e4632aea
LoadOps SYNC ( #3223 )
...
* LoadOps SYNC and WAIT
* no wait, only sync
* DEBUG >= 1
* track cross device
2024-01-23 21:59:18 -08:00
chenyu
2f4b3ab1c0
shard and to should preserve requires_grad ( #3224 )
...
dtypes are inferred from underlying lazydata, requires_grad needs to be passed explicitly
2024-01-24 00:15:10 -05:00
George Hotz
23b084e70a
add device name to device, all are constructed ( #3221 )
2024-01-23 20:34:56 -08:00
George Hotz
91a1b2bd7a
the runner does the build ( #3220 )
2024-01-23 18:45:43 -08:00
chenyu
9e5409be6c
cifar move GlobalCounters.reset() before shard ( #3217 )
...
* cifar move GlobalCounters.reset() before shard
also shard mini batch inplace
* don't eval with DISABLE_BACKWARD
2024-01-23 16:07:43 -05:00
Francis Lam
595d05a250
test: fix test_linearizer to use the correct tc_dims ( #3218 )
...
also re-enable the test_tensor_core_opts
2024-01-23 16:07:31 -05:00
chenyu
3c179cc27c
cifar only shuffle data at epoch start ( #3216 )
...
save 1ms CPU time per batch. also only shuffle training set
2024-01-23 14:41:22 -05:00
George Hotz
4a07ea355d
buffer options should work ( #3211 )
...
* buffer options should work
* minor
* fix dtype
2024-01-22 19:23:55 -08:00
George Hotz
a06f34ae42
remove dead lines from cstyle ( #3212 )
...
* remove dead lines from cstyle
* external_local_bufs is dead
* more lines
* minor cleanup
2024-01-22 18:59:19 -08:00
chenyu
8465938d29
minor hlb_cifar cleanups ( #3208 )
...
mostly cosmetic. LATEBEAM=4 single 7900xtx 59.2 seconds
2024-01-22 12:38:39 -05:00
David Hou
3378625773
name upcast variables ( #3200 )
...
* name upcast variables
* typing
* unused
2024-01-22 11:37:28 -05:00
chenyu
827b7a3c64
cleanup pad_reflect and make_square_mask in hlb_cifar ( #3206 )
...
removed some complicated looking stuff. no wall time difference
2024-01-22 11:30:46 -05:00
chenyu
99884f4c98
cifar flags for RANDOM_CROP, RANDOM_FLIP, and CUTMIX ( #3204 )
...
experimenting with different setups, also would like to jit the data augmentation next
2024-01-22 01:12:51 -05:00
chenyu
53afec2841
add HALF to handcode_resnet50_opt.py ( #3202 )
...
use this to study tensor cores on HIP
2024-01-21 23:03:59 -05:00
chenyu
836883fedc
comment out cutmix in hlb_cifar ( #3201 )
...
it's no-op with multi gpu and less STEPS. also the patch was selected from the whole dataset, not from the same batch
2024-01-21 22:24:53 -05:00
chenyu
e6c71f1b26
fix device of Tensor.arange inside Tensor.one_hot ( #3199 )
...
it should have the same device as self
2024-01-21 21:03:50 -05:00
chenyu
f7d1c42239
cleanup noop prefixes in _pool ( #3198 )
...
* cleanup noop prefixes in _pool
make expand dim=None as noop (in addition to -1). then slice, reshape, expand in _pool can share the same noop prefix
* nit
* something then reshape style
* that's repeat
2024-01-21 20:03:32 -05:00
uuuvn
640e5c36ad
Fix metal tests broken by 3f56d1a ( #3196 )
...
* Remove from binary_operations before copying binary_operations into integer_binary_operations
* Also remove lt and eq if running on METAL
2024-01-21 11:53:25 -05:00
chenyu
b9d27636aa
cleanup test_ops.py ( #3192 )
...
- removed exact duplicated tests
- only kept one function if torch_fxn is the same as tinygrad_fxn
- used tensor method instead of class method style
- replaced unneeded `lambda f: f(x)` with just `f`
- re-enabled commented tests that work now
- removed some forward_only now that 0 shape tensors can backward
2024-01-20 20:08:56 -05:00
chenyu
3f56d1a5e8
add operator.lt and operator.eq to test_dtype_alu ( #3191 )
...
* add operator.lt and operator.eq to test_dtype_alu
those should pass now as we have broadcasted before passing to lt and eq.
also updated the test skipping criteria to reuse test_dtype.is_dtype_supported
* llvm lt nan is incorrect
* enable truediv too
* Revert "enable truediv too"
This reverts commit df703235fb .
* just that
2024-01-20 14:54:02 -05:00
chenyu
c4b5661146
fuzz length for multitensor reduce test case ( #3190 )
...
so that the uneven case is not just with 0 length and can have other positive values
2024-01-20 00:44:38 -05:00
chenyu
fdb1c2b1d9
move reduce over 0 len axis logic to lazy.py ( #3188 )
...
* move reduce over 0 len axis logic to lazy.py
this fixed uneven shard reduce case if the uneven one has length 0
* fix interpreted backends
* fix backwards for 0 shape tensors too
2024-01-20 00:13:03 -05:00
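Note on reducing over a 0-length axis: the commit above does not show the logic that moved to lazy.py. A common convention, assumed here, is that reducing nothing returns the operation's identity element, so that an empty shard contributes nothing when partial results are combined. The sketch below uses plain Python lists in place of tensors; `reduce_values`, `IDENTITY` and `OPS` are hypothetical names, not tinygrad identifiers:

```python
import math
from functools import reduce

# Assumed identities: 0 for sum, -inf for max.
IDENTITY = {"sum": 0.0, "max": -math.inf}
OPS = {"sum": lambda a, b: a + b, "max": max}

def reduce_values(values, op):
    """Reduce a possibly empty list; an empty input yields the identity."""
    return reduce(OPS[op], values, IDENTITY[op])

# Uneven sharding where the second shard has length 0.
shards = [[1.0, 2.0, 3.0], []]
for op in ("sum", "max"):
    partials = [reduce_values(s, op) for s in shards]
    print(op, partials, reduce_values(partials, op))
```

With this convention the combined result over the shards equals the reduction over the unsharded data, which is the property the uneven-shard case depends on.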
chenyu
485332935e
ring copy example ( #3185 )
...
* ring copy example
* use ones for init
2024-01-19 23:34:30 -05:00
George Hotz
254a7372fe
buffer copy refactor ( #3187 )
2024-01-19 20:21:24 -08:00
chenyu
fb4bd2a57d
reenable padto to search action ( #3183 )
2024-01-19 14:17:53 -05:00
chenyu
cb4cfc078a
parameterize multitensor tests for reduce ( #3181 )
...
uneven shards reduce is incorrect now
2024-01-19 14:03:01 -05:00
nimlgen
5097d5b808
fix padto when with late reduce ( #3180 )
...
* fix padto test
* no long comment
2024-01-19 14:01:44 -05:00
George Hotz
729a01bf3e
complex PRs will not be merged
2024-01-19 10:58:47 -08:00
nimlgen
f87ecbb0f3
fuzzer validates outputs + (partially) oob accesses ( #3178 )
...
* fuzzer validates outputs + (partially) oob accesses
* +random
* oob check only for compiled
* type cmp fixes
* fix zeroing
* no prints
* add seed
2024-01-19 13:34:51 -05:00
chenyu
b2571d586c
hypothesis.st -> hypothesis.strat ( #3179 )
...
leave `st` for shapetracker
2024-01-19 11:55:26 -05:00
chenyu
c4faedebf3
add test cases for negative entry max allreduce ( #3177 )
2024-01-18 22:26:51 -05:00
chenyu
ab1b7c4d09
fix allreduce for max ( #3175 )
...
* test cases to show allreduce for max is incorrect
* oh fixed
2024-01-18 20:25:35 -05:00
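Note on allreduce for max: the log does not say what the bug was, so the following is only an illustration of why negative entries make a useful test case. In this sketch (plain Python lists stand in for per-device buffers, and the function name `allreduce` is hypothetical), the accumulator is seeded with the first device's data. Seeding it with zeros instead would still give correct sums but would return 0 for an all-negative max:

```python
def allreduce(buffers, op):
    """Elementwise-reduce equally sized per-device buffers.

    Returns one copy of the reduced buffer per device.
    Illustrative sketch only; this is not the tinygrad implementation.
    """
    combine = {"sum": lambda a, b: a + b, "max": max}[op]
    result = list(buffers[0])  # seed with real data, not zeros
    for buf in buffers[1:]:
        result = [combine(a, b) for a, b in zip(result, buf)]
    return [list(result) for _ in buffers]

devices = [[-1.0, -5.0], [-3.0, -2.0]]
print(allreduce(devices, "max"))
print(allreduce(devices, "sum"))
```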
George Hotz
c51c90bcd4
more sync in transfer ( #3174 )
2024-01-18 17:17:03 -08:00
chenyu
28dcbf0e00
test case sharded batchnorm has different ast on devices ( #3172 )
2024-01-18 18:12:15 -05:00
chenyu
a60d50487d
disable padto, seems to have bug in gpt2 ( #3173 )
2024-01-18 18:09:30 -05:00
George Hotz
c80884884e
event driven hip ( #3160 )
...
* event driven hip
* simpler, src makes copy
* pass mypy
2024-01-18 14:35:18 -08:00
George Hotz
d2aab65958
remove unused expr node ( #3170 )
...
* remove unused expr node
* still works
* simple expr_idxs
* fixup typing
2024-01-18 14:18:43 -08:00