Commit Graph

1890 Commits

Author SHA1 Message Date
chenyu
74586bc339 fix getitem with leading None (#4943)
I think all None handling can be unified and the calc_dim in advanced indexing can be removed
2024-06-13 11:23:40 -04:00
nimlgen
fd071ba27e amd mockgpu correct timer resolution (#4942)
* amd mockgpu correct timer resolution

* test it
2024-06-13 10:07:34 +03:00
chenyu
fae08c4d48 fix Tensor.triu / Tensor.tril with boolean input (#4941)
`where(self, 0)` incorrectly upcasted the output. `where(self, False)` is correct but looks unnatural, so added a cast at the end. Pattern matcher can fold the cast into where branches
2024-06-12 20:16:13 -04:00
chenyu
eb0f5b5660 failed test case for getitem with leading Nones (#4936)
* failed test case for getitem with leading Nones

torch matched numpy so tinygrad is incorrect.
another repro
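```
import numpy as np
import torch
from tinygrad import Tensor

# numpy (reference behaviour)
t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])

# torch (matches numpy)
t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())

# tinygrad (differs from the two above)
t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```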

* # noqa
2024-06-12 16:19:42 -04:00
chenyu
a21ea165bc skip linearizer test_failure_22 on llvm (#4937)
getting flaky recently
2024-06-12 16:03:38 -04:00
Timmy
720c700a8a Multireduce-Kernels: Linearizer Changes and Tests (#4259)
* basic tests

* cleanup

* pylint

* ruff

* use define acc as a proxy for rendered reductions

* use define acc as a proxy for rendered reductions

* recursive reduceop rendering via ast_parse

* linters + cleanup

* fixing late buf loading

* plus linters

* removing extra line

* linters

* does this break ci?

* added tests and if add end change

* typo in add_ends

* linters

* removing comments

* allow endifs to be inserted before the end of the graph

* find add ENDIF before next BARRIER

* removing tests with manual ENDIF + linters

* specifically the next barrier after the store of the local result

* Revert "specifically the next barrier after the store of the local result"

This reverts commit b288a5c3ce.

* keeping up to date

* linters + merge changes

* cleaning up old bad decisions

* linters and opts

* merged linearizer tests

* fixing merge issues

* removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions)

* small diff fixes

* updating linearizer to work without uops.add( ... cachable)

* linters

* comment in multireduce tests

* skipping tests without locals

* full tests

* linters

* load_cache[key] fix for multiple accs

* linters

* assert only one reduceop

* fix loop_scope test to actually cause an issue

* self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique

* updated tests

* fixing merge

* removing debug prints

* complete merge fix

* linters

* diff cleanup

* adding tests in

* give each reduce its own local buffer

* gpu=1 changes

* store and load locals with upcasting

* modifying test?

* make multireduce_netsted_local_upcast test match single reduce shapes

* removing todo

* cleaning up the diff

* unroll test

* unroll and upcast tests

* fix gpu

* seq and self.load_cache[key] cleaning

* linters

* padto works

* merge fixes

* fixes

* add skips for amd

* linters + seq

* cleaning & more tests

* softmax tests

* linters

* [run_process_replay]

* add new tests back

This reverts commit 19dec22e01.

* more hardcoded -1s

* fix ptx

* Fix name for loop in ptx

* cleaning up the diff

* cleaning up the uops diff

* nv ci is too slow

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-06-12 13:29:43 -04:00
Nicklas Boman
6e86472cd6 fix typing for test to run in py38 (#4930)
2024-06-12 13:22:30 -04:00
chenyu
1326f29e24 fix Tensor.gather shape checking criteria (#4932)
it's fine if `self.shape[d] >= index.shape[d]` for all `d != dim`, not for all `d`
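A minimal sketch of the criterion above in plain Python, not the tinygrad implementation. The helper name `gather_shapes_ok` and its signature are hypothetical, and the equal-rank requirement and non-negative `dim` are assumptions added for illustration.

```python
from typing import Tuple

def gather_shapes_ok(self_shape: Tuple[int, ...], index_shape: Tuple[int, ...], dim: int) -> bool:
    """Hypothetical helper: True if index_shape is acceptable for a gather along dim.

    Assumes dim is a non-negative axis number.
    """
    if len(self_shape) != len(index_shape):
        return False  # assumption: both shapes must have the same number of dimensions
    # Only axes other than dim are constrained; index may be larger than self along dim.
    return all(s >= i for d, (s, i) in enumerate(zip(self_shape, index_shape)) if d != dim)

print(gather_shapes_ok((3, 4), (2, 10), dim=1))  # True: index is larger only along dim
print(gather_shapes_ok((3, 4), (5, 2), dim=1))   # False: index is larger along axis 0
```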
2024-06-12 13:10:14 -04:00
George Hotz
9a3c1e4a17 fix mul div failure (#4928)
2024-06-12 13:58:46 +02:00
George Hotz
11a03cbbf5 don't use uops.add while constructing (#4913)
* don't use uops.add while constructing

* rebase

* bugfixes

* have to use BFS

* prove it's late

* simpler uop symbolic test (why we did this)

* use dict, not set
2024-06-12 13:31:34 +02:00
chenyu
fdbb4305cb skip unsupported dtype in fuzz_linearizer (#4917)
resolve issues in #4887. dataset generated from ubuntu but metal does not support double
2024-06-11 18:18:21 -04:00
chenyu
b886d250fb improve test_dropout_on_shard (#4912)
tested some basic properties; also minor formatting for a few Tensor.training setups
2024-06-11 11:36:02 -04:00
George Hotz
35e53c0809 add sharded arange test (#4908)
2024-06-11 10:58:33 +02:00
chenyu
798ea61377 widen test_ops [low, high] and more strict atol (#4906)
default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan).
dropped several explicit atol values that were unnecessarily larger than the default 1e-6.
tested on mac, tinybox red / green
2024-06-10 20:47:09 -04:00
chenyu
97b05f567e revert the .detach() in layernorm (#4904)
* revert the .detach() in layernorm

it's only correct in LayerNorm where input is the data, and not correct in GroupNorm and InstanceNorm that reused layernorm.
Added backward tests for weights, bias and input for these norms.

* bigger atol for llvm

* relax backward more
2024-06-10 18:02:05 -04:00
qazal
8b5bcf309a process replay in all of CI (#4884)
2024-06-10 14:49:29 -04:00
chenyu
c8cd637236 test case for Tensor.var reducing over size = 1 axis (#4902)
backward failed when correction >= reducing n
2024-06-10 12:11:39 -04:00
chenyu
b56ae5606c cosmetic changes to uop _match (#4897)
minor cleanup before fixing two level match
[run_process_replay]
2024-06-09 18:29:42 -04:00
SnakeOnex
b1db2d0094 tqdm replacement (#4846)
* tqdm replacement almost

* formatting

* formatting

* imports

* line len

* fix

* removed set description :(

* removed set description :(

* fix

* fix

* green check?

* rewrote as class, fixed several bugs

* types spacing

* removed imports

* fix

* iterable

* typing

* mypy disagreement

* imports

* more e2e tests vs tqdm

* removed seed setting

* robustness against time.sleep() flakiness

* flaky fix

* automatic bar closing when count==total

* cleanup

* clang error with tqdm

* tqdm back

* use os lib, print to stderr (fixes the clang bug, where the bar was leaking into the generated c program)

* back to shutil

* unit_scale + unit_scale test

* custom unit to tests

* pretty

* clean

* removed flaky test

* less test iters

* empty line

* remove disable
2024-06-09 23:46:03 +02:00
qazal
1dde829e34 UOps.IF* to graph spec (#4894)
2024-06-09 07:00:12 -04:00
George Hotz
b9afb0d577 test uop as symbolic (#4870)
* start work

* more tests passing

* more tests passing

* more

* 34 failures

* expect the failures

* remove broken rule

* render is fine in just the test

* simplify and put in test
2024-06-09 12:15:11 +02:00
nimlgen
654a8b9ef7 retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
chenyu
e33efd6a3d test cases for multitensor adds const (#4892)
Tested const remained const in ast. Removed the TODO in _to_const_val too
2024-06-08 22:57:48 -04:00
nimlgen
d24e57c615 amd support kernel with bf16 (#4863)
* amd support kernels with dispatch_ptr

* fixes

* line savings

* one line

* try

* Revert "try"

This reverts commit 5f340dfdd4.

* not used will be back when hsa is gone

* gone will be back

* add this as well
2024-06-08 22:52:32 +03:00
qazal
1e3325f369 raise assert [run_process_replay] (#4879)
2024-06-08 08:31:44 -04:00
qazal
66dfd5e7bf faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
nimlgen
47bfd7c2b7 fix sync of offset buffers in graphs (#4850)
* correctly sync offset buffers

* test

* style

* run less

* just use base
2024-06-06 16:09:45 +03:00
chenyu
99e7a1d5e9 support symbolic reshape with non-contiguous (#4844)
* support symbolic reshape with non-contiguous

pre-requisite for symbolic arange (make symbolic ones that can be folded).

* test cases

* typo

* shorter
2024-06-05 16:01:19 -04:00
chenyu
a352b6d9ce symbolic Tensor.var (#4843)
taken from #4446 and add more tests
2024-06-05 12:55:54 -04:00
Timmy
887643cf34 Multireduce atomic local load/store test (#4786)
* atomic load/store test

* tests for nested & unrolled

* check barriers

* linters

* cleaning up diff

* fix assert in _temp_create_multireduce_ast changes

* cleaning up the check for redundant barriers

* minor cleanups for the assert

* always seed randn, helps with debuggability

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-05 14:41:19 +03:00
Szymon Ożóg
273945df67 Regression tests for bitshift (#4829)
* Regression tests for bitshift

* Add test for bitshift not triggered

* Enable tests
2024-06-05 11:42:34 +02:00
Alec Chen
5ac30c29d8 Construct UOps patterns using UPat (#4821)
* Allow UPat pattern definitions

* Convert pattern matcher tests to UPat constructions

* Convert constant_folder patterns to upat constructions

* Convert assembly patterns to upat constructions

* [run_process_replay] Drop UPat.from_dict
2024-06-05 10:29:37 +02:00
Szymon Ożóg
e47277d18a Disable for PTX as well (#4838)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-06-05 10:37:59 +03:00
Francis Lam
890e7c12bb test/external/verify_kernel: add support for single pickled kernel (#4836)
2024-06-04 18:59:21 -04:00
Elias Wahl
04e237328b Refactor to class style (#4804)
2024-06-04 14:08:31 -07:00
David Hou
cddce0e168 don't cast before view on shape changing bitcast (#4833)
* don't cast before view on shape changing bitcast

* make sure cast before view triggers
2024-06-04 16:04:52 -04:00
Alec Chen
4909a0d16f Fix arg set in pattern matcher (#4830)
2024-06-04 15:10:09 -04:00
Alec Chen
c96026ac65 Add arg set regression test for pattern matcher (#4827)
* Add arg set regression test for pattern matcher

* real regression

---------

Co-authored-by: qazalin <qazal.software@gmail.com>
2024-06-04 13:35:09 -04:00
chenyu
a70e8a80d7 test_ops test cmp with special floats (#4826)
prepare to fix nan, it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
chenyu
3afc914617 CMPEQ -> CMPNE and make it safe to pad (#4818)
* CMPNE

* new dataset
2024-06-03 18:02:15 -04:00
Szymon Ożóg
bb7b031c5c Bitshift (#4728)
* WIP

* Cleanup

* Cleanup

* Fix variable, refactor to use set

* right shift should be signed/unsigned

* Test for bitshifts

* Allow a neg
2024-06-03 21:16:01 +02:00
nimlgen
e78a9bf3f2 support view in nv/amd (#4812)
* support view in nv/amd

* fix amd

* fix

* run test on nv/amd
2024-06-03 22:11:52 +03:00
chenyu
45083ccb43 canonicalize 0 in shape in View.create (#4815)
for a size-0 view, set strides to 0, offset to 0, mask to None, and contiguous to True.
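A rough illustration of why such a canonical form is useful, written as a standalone sketch rather than the actual `View.create` code. `ViewSketch` and `create_view` are hypothetical names, and the field layout is an assumption based only on the fields named above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ViewSketch:
    # Hypothetical stand-in for a view; not tinygrad's View class.
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int = 0
    mask: Optional[Tuple[Tuple[int, int], ...]] = None
    contiguous: bool = False

def create_view(shape, strides, offset=0, mask=None, contiguous=False) -> ViewSketch:
    """Build a view, collapsing every zero-size view to one canonical form."""
    if 0 in shape:
        # No element is ever addressed, so strides, offset and mask carry no information.
        return ViewSketch(tuple(shape), (0,) * len(shape), 0, None, True)
    return ViewSketch(tuple(shape), tuple(strides), offset, mask, contiguous)

a = create_view((0, 4), (4, 1), offset=8)
b = create_view((0, 4), (1, 0), mask=((0, 0), (1, 3)))
print(a == b)  # True: both empty views end up with identical fields
```

With this canonical form, two empty views of the same shape compare equal regardless of how they were produced.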
2024-06-03 13:37:37 -04:00
qazal
f64fa51a64 process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
Timmy
ca32921f84 Multireduce PADTO Test (#4785)
* padto test

* expanded multireduce padto tests

* cuda doesn't run on ci

* moving padto_where_multireduce test to SUM so that we can check the reduce axis

* cleaning up tests some more

* add wanna_outputs

* refactor test_padto_sum_multireduce

* fix max and refactor where

* fix axis

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
chenyu
1ffa5ec492 unit test ShapeTracker.consecutive (#4800)
2024-06-01 10:10:51 -04:00
chenyu
8942230b1f minor cleanups of test_tensor and extend some cases (#4794)
2024-05-31 10:43:22 -04:00
qazal
637f482588 configure derandomizing CI tests (#4793)
2024-05-31 17:06:58 +03:00
chenyu
7cc883ecee CMPLT is safe to pad (#4790)
0 < 0 evals to False
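A small plain-Python sketch of the reasoning, under the assumption that "safe to pad" means zero-padding both inputs must not change a later sum reduction. The `pad_to` helper is hypothetical and the lists stand in for tensors.

```python
def pad_to(values, size):
    """Hypothetical helper: zero-pad a list up to the given length."""
    return values + [0] * (size - len(values))

a = [1, 5, 2]
b = [3, 4, 9]
pa, pb = pad_to(a, 8), pad_to(b, 8)

# Less-than: padded positions compare 0 < 0, which is False and adds nothing to the sum.
lt_plain = sum(x < y for x, y in zip(a, b))
lt_padded = sum(x < y for x, y in zip(pa, pb))
print(lt_plain, lt_padded)  # 2 2

# Equality, for contrast: padded positions compare 0 == 0, which is True and inflates the sum.
eq_plain = sum(x == y for x, y in zip(a, b))
eq_padded = sum(x == y for x, y in zip(pa, pb))
print(eq_plain, eq_padded)  # 0 5
```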
2024-05-30 22:50:48 -04:00
chenyu
236390aafb fix lazy r const folding with variable shape (#4783)
const folding with a symbolic shape is currently not supported. I think it's possible with a refactor to Tensor.from_node.
also added some failed required tests for symbolic arange.
2024-05-30 15:19:28 -04:00