* PoC faster wino compile by catting consts across data expand dim
* fix fusions
* faster + golf it
* noqa 501
* implicit broadcast
* Revert "implicit broadcast"
This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666.
* shorter
* shorter
* oops
* 216 upcasts is probably fine
* wino kernel count test
* test winograd number of sts
* specify device for apply_matrix mat elements
* shrink MLB on sharded axis
use onehot structure to store the real partition. goal is unsynced batchnorm2d that can be run on multigpu for training.
draft version in https://github.com/chenyuxyz/tinygrad/pull/109
* SYNCBN flag
* test unclean shrinks
* UnsyncedBatchNorm reuses BatchNorm
* more robust pad arg check
* better types
* more tests!
* 6 gpus in benchmark
* disable slow GPUS=6 benchmark
- removed noop a=0
- fixed integer div test
- added test for both python expression and Tensor method call
- reordered for consistency and added some spaces
* mockhip->hipcpu
* allocate buffers
* launch a kernel
read_asm api
* run remu in CI
* remu 0.0.2, real test ops
* simple driver
* 0.0.3, all test_ops
* run the latest emulator
* 9 minutes is way too long, drop backprop in CI
* bring back the backward pass
* Revert "bring back the backward pass"
This reverts commit 3781e1bc56.
* Print slowest tests
* emulated device directly in ops_hip
* fix ruff, override mypy for specific rules
* test in the same code path
- hip backend env variables
- install packages and verify autogen
- run certain tests
- remove the other hip tests path
- verify Device.DEFAULT
* remove the emulated hip in extra
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* Reapply "take merge views from corsix branch" (#3278)
This reverts commit d298916232.
* reintroduce merge views
* update second any
* isinstance -> not
* 25% less same but unequal
* extra/gemm: add a simple_conv.py along with correctness check
The goal is to easily test tensor core triggering situations
* test: add tests for acc_dtype handling and fixed typing
* add onnx test_reduce_log_sum_exp
* more reuse
* more
* stuff
* good CenterCropPad
* imports
* good ArrayFeatureExtractor
* pretty good Pad
* stuff
* stuff
* onnx.py
* Atan
* pass int8 test
* dtype related
* fastmath stuff
* Resize linear
* fix CI
* move back
* init
* test: added dtype tests for maximum
* fix: seperate maximum const and maximum tensors
* fix: del useless line
* fix: some dtypes
* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys
* fix: add lil helper function
* fix: some test refactoring
* done
* sike: not done yet lol
* wtf I missed an assert, am I drunk
* yeah idk
* fix: line save from redundant check
* revert: line save
* fix: simplify test_broadcast cuz I'm stumped
* change some test name
* fix: bool max bool works
* test: add a maximum bool test
* test: make sure minimum also works with bool
* fix: something like this? :s
* fix: maybe this?
* fix: how about this? tighter check
* fix: this.
* revert: nvm mul(0.5) and div(2) has the same kernel for backward
* fix: .is_floating_point() xD
* revert: maximum and minimum and add cast
* fix: cover negative const case in test
* fix: use eq because I don't understand clang :D
* WHOOOOPS
* try
* test: add logical_not tests
* gah im retarded, but this doesn't match types for const()
* fix: can't we jsut do this?
* big change: I don't actually know what I'm doing
* WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later
* BYE BYE noqa: E501
* fix: less lines and add test
* fix: rm 2 redundant tests
* fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e
- removed exact duplicated tests
- only kept one function if torch_fxn is the same as tinygrad_fxn
- used tensor method instead of class method style
- replaced unneeded `lamdba f: f(x)` with just `f`
- re-enabled commented tests that work now
- removed some forward_only now 0 shape tensor can backward
* add operator.lt and operator.eq to test_dtype_alu
those should pass now as we have broadcasted before passing to lt and eq.
also updated the test skipping criteria to reuse test_dtype.is_dtype_supported
* llvm lt nan is incorrect
* enable truediv too
* Revert "enable truediv too"
This reverts commit df703235fb.
* just that
* move reduce over 0 len axis logic to lazy.py
this fixed uneven shard reduce case if the uneven one has length 0
* fix interpreted backends
* fix backwards for 0 shape tensors too