Commit Graph

3475 Commits

Author SHA1 Message Date
chenyu
da5e27968c failed test cases for Tensor.round (#3240)
it should round to even
2024-01-25 02:12:50 -05:00
geohotstan
b0b5eba535 fix _round in onnx_ops to look more like new Tensor.round (#3239)
* fix: _round in onnxops

* fix: minor things

* fix: no more n

* fix: smol

* fix: smoller
2024-01-25 01:18:58 -05:00
George Hotz
aa0d1b6330 hotfix: don't use noqa: E702 that's just dumb 2024-01-24 20:01:00 -08:00
George Hotz
b92945c98d hotfix: DEBUG >= 2 for kernels 2024-01-24 23:55:17 +00:00
George Hotz
a8fbb03438 minor hip cleanups (#3237) 2024-01-24 15:13:38 -08:00
nimlgen
3205fd8481 fix cuda device var rewrite (#3233) 2024-01-24 16:57:49 -05:00
George Hotz
ed8a32722a hip mutex signal (#3234)
* hip mutex

* hip mutex 2

* sync
2024-01-24 13:23:09 -08:00
George Hotz
47f9887ce4 hip events work (#3229)
* hip events work

* event
2024-01-24 11:49:53 -08:00
George Hotz
de7a3a56ff save lines in llvm (#3231)
* save lines in llvm

* no implied cast in load

* no cast in gate
2024-01-24 11:40:53 -08:00
George Hotz
83d614295e reduce lines (#3230) 2024-01-24 10:35:59 -08:00
chenyu
afeadbedc9 touch up Tensor.round and Tensor.neg (#3228) 2024-01-24 12:29:37 -05:00
Obada Khalili
0e103b4aa0 implement Tensor.round (#3225) 2024-01-24 11:49:17 -05:00
geohotstan
842053873d fix neg logical_not inconsistencies (#3222)
* try

* test: add logical_not tests

* gah im retarded, but this doesn't match types for const()

* fix: can't we just do this?

* big change: I don't actually know what I'm doing

* WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later

* BYE BYE noqa: E501

* fix: less lines and add test

* fix: rm 2 redundant tests

* fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e
2024-01-24 11:48:40 -05:00
George Hotz
e2e4632aea LoadOps SYNC (#3223)
* LoadOps SYNC and WAIT

* no wait, only sync

* DEBUG >= 1

* track cross device
2024-01-23 21:59:18 -08:00
chenyu
2f4b3ab1c0 shard and to should preserve requires_grad (#3224)
dtypes are inferred from underlying lazydata, requires_grad needs to be passed explicitly
2024-01-24 00:15:10 -05:00
George Hotz
23b084e70a add device name to device, all are constructed (#3221) 2024-01-23 20:34:56 -08:00
George Hotz
91a1b2bd7a the runner does the build (#3220) 2024-01-23 18:45:43 -08:00
chenyu
9e5409be6c cifar move GlobalCounters.reset() before shard (#3217)
* cifar move GlobalCounters.reset() before shard

also shard mini batch inplace

* don't eval with DISABLE_BACKWARD
2024-01-23 16:07:43 -05:00
Francis Lam
595d05a250 test: fix test_linearizer to use the correct tc_dims (#3218)
also re-enable the test_tensor_core_opts
2024-01-23 16:07:31 -05:00
chenyu
3c179cc27c cifar only shuffle data at epoch start (#3216)
save 1ms CPU time per batch. also only shuffle training set
2024-01-23 14:41:22 -05:00
George Hotz
4a07ea355d buffer options should work (#3211)
* buffer options should work

* minor

* fix dtype
2024-01-22 19:23:55 -08:00
George Hotz
a06f34ae42 remove dead lines from cstyle (#3212)
* remove dead lines from cstyle

* external_local_bufs is dead

* more lines

* minor cleanup
2024-01-22 18:59:19 -08:00
chenyu
8465938d29 minor hlb_cifar cleanups (#3208)
mostly cosmetic. LATEBEAM=4 single 7900xtx 59.2 seconds
2024-01-22 12:38:39 -05:00
David Hou
3378625773 name upcast variables (#3200)
* name upcast variables

* typing

* unused
2024-01-22 11:37:28 -05:00
chenyu
827b7a3c64 cleanup pad_reflect and make_square_mask in hlb_cifar (#3206)
removed some complicated looking stuff. no wall time difference
2024-01-22 11:30:46 -05:00
chenyu
99884f4c98 cifar flags for RANDOM_CROP, RANDOM_FLIP, and CUTMIX (#3204)
experimenting with different setups, also would like to jit the data augmentation next
2024-01-22 01:12:51 -05:00
chenyu
53afec2841 add HALF to handcode_resnet50_opt.py (#3202)
use this to study tensor cores on HIP
2024-01-21 23:03:59 -05:00
chenyu
836883fedc comment out cutmix in hlb_cifar (#3201)
it's no-op with multi gpu and less STEPS. also the patch was selected from the whole dataset, not from the same batch
2024-01-21 22:24:53 -05:00
chenyu
e6c71f1b26 fix device of Tensor.arange inside Tensor.one_hot (#3199)
it should have the same device as self
2024-01-21 21:03:50 -05:00
chenyu
f7d1c42239 cleanup noop prefixes in _pool (#3198)
* cleanup noop prefixes in _pool

make expand dim=None as noop (in addition to -1). then slice, reshape, expand in _pool can share the same noop prefix

* nit

* something then reshape style

* that's repeat
2024-01-21 20:03:32 -05:00
uuuvn
640e5c36ad Fix metal tests broken by 3f56d1a (#3196)
* Remove from binary_operations before copying binary_operations into integer_binary_operations

* Also remove lt and eq if running on METAL
2024-01-21 11:53:25 -05:00
chenyu
b9d27636aa cleanup test_ops.py (#3192)
- removed exact duplicated tests
- only kept one function if torch_fxn is the same as tinygrad_fxn
- used tensor method instead of class method style
- replaced unneeded `lambda f: f(x)` with just `f`
- re-enabled commented tests that work now
- removed some forward_only now 0 shape tensor can backward
2024-01-20 20:08:56 -05:00
chenyu
3f56d1a5e8 add operator.lt and operator.eq to test_dtype_alu (#3191)
* add operator.lt and operator.eq to test_dtype_alu

those should pass now as we have broadcasted before passing to lt and eq.
also updated the test skipping criteria to reuse test_dtype.is_dtype_supported

* llvm lt nan is incorrect

* enable truediv too

* Revert "enable truediv too"

This reverts commit df703235fb.

* just that
2024-01-20 14:54:02 -05:00
chenyu
c4b5661146 fuzz length for multitensor reduce test case (#3190)
so that the uneven case is not just with 0 length and can have other positive values
2024-01-20 00:44:38 -05:00
chenyu
fdb1c2b1d9 move reduce over 0 len axis logic to lazy.py (#3188)
* move reduce over 0 len axis logic to lazy.py

this fixed uneven shard reduce case if the uneven one has length 0

* fix interpreted backends

* fix backwards for 0 shape tensors too
2024-01-20 00:13:03 -05:00
chenyu
485332935e ring copy example (#3185)
* ring copy example

* use ones for init
2024-01-19 23:34:30 -05:00
George Hotz
254a7372fe buffer copy refactor (#3187) 2024-01-19 20:21:24 -08:00
chenyu
fb4bd2a57d reenable padto to search action (#3183) 2024-01-19 14:17:53 -05:00
chenyu
cb4cfc078a parameterize multitensor tests for reduce (#3181)
uneven shards reduce is incorrect now
2024-01-19 14:03:01 -05:00
nimlgen
5097d5b808 fix padto when with late reduce (#3180)
* fix padto test

* no long comment
2024-01-19 14:01:44 -05:00
George Hotz
729a01bf3e complex PRs will not be merged 2024-01-19 10:58:47 -08:00
nimlgen
f87ecbb0f3 fuzzer validates outputs + (partially) oob accesses (#3178)
* fuzzer validates outputs + (partially) oob accesses

* +random

* oob check only for compiled

* type cmp fixes

* fix zeroing

* no prints

* add seed
2024-01-19 13:34:51 -05:00
chenyu
b2571d586c hypothesis.st -> hypothesis.strat (#3179)
leave `st` for shapetracker
2024-01-19 11:55:26 -05:00
chenyu
c4faedebf3 add test cases for negative entry max allreduce (#3177) 2024-01-18 22:26:51 -05:00
chenyu
ab1b7c4d09 fix allreduce for max (#3175)
* test cases to show allreduce for max is incorrect

* oh fixed
2024-01-18 20:25:35 -05:00
George Hotz
c51c90bcd4 more sync in transfer (#3174) 2024-01-18 17:17:03 -08:00
chenyu
28dcbf0e00 test case sharded batchnorm has different ast on devices (#3172) 2024-01-18 18:12:15 -05:00
chenyu
a60d50487d disable padto, seems to have bug in gpt2 (#3173) 2024-01-18 18:09:30 -05:00
George Hotz
c80884884e event driven hip (#3160)
* event driven hip

* simpler, src makes copy

* pass mypy
2024-01-18 14:35:18 -08:00
George Hotz
d2aab65958 remove unused expr node (#3170)
* remove unused expr node

* still works

* simple expr_idxs

* fixup typing
2024-01-18 14:18:43 -08:00