Commit Graph

4618 Commits

chenyu
e33efd6a3d test cases for multitensor adds const (#4892)
Tested that const remained const in the AST. Also removed the TODO in _to_const_val.
2024-06-08 22:57:48 -04:00
nimlgen
d24e57c615 amd support kernel with bf16 (#4863)
* amd support kernels with dispatch_ptr

* fixes

* line savings

* one line

* try

* Revert "try"

This reverts commit 5f340dfdd4.

* not used will be back when hsa is gone

* gone will be back

* add this as well
2024-06-08 22:52:32 +03:00
qazal
1e3325f369 raise assert [run_process_replay] (#4879) 2024-06-08 08:31:44 -04:00
qazal
66dfd5e7bf faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
nimlgen
47bfd7c2b7 fix sync of offset buffers in graphs (#4850)
* correctly sync offset buffers

* test

* style

* run less

* just use base
2024-06-06 16:09:45 +03:00
chenyu
99e7a1d5e9 support symbolic reshape with non-contiguous (#4844)
* support symbolic reshape with non-contiguous

Prerequisite for symbolic arange (makes symbolic ones that can be folded).

* test cases

* typo

* shorter
2024-06-05 16:01:19 -04:00
chenyu
a352b6d9ce symbolic Tensor.var (#4843)
Taken from #4446, with more tests added.
2024-06-05 12:55:54 -04:00
Timmy
887643cf34 Multireduce atomic local load/store test (#4786)
* atomic load/store test

* tests for nested & unrolled

* check barriers

* linters

* cleaning up diff

* fix assert in _temp_create_multireduce_ast changes

* cleaning up the check for redundant barriers

* minor cleanups for the assert

* always seed randn, helps with debuggability

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-05 14:41:19 +03:00
Szymon Ożóg
273945df67 Regression tests for bitshift (#4829)
* Regression tests for bitshift

* Add test for bitshift not triggered

* Enable tests
2024-06-05 11:42:34 +02:00
Alec Chen
5ac30c29d8 Construct UOps patterns using UPat (#4821)
* Allow UPat pattern definitions

* Convert pattern matcher tests to UPat constructions

* Convert constant_folder patterns to upat constructions

* Convert assembly patterns to upat constructions

* [run_process_replay] Drop UPat.from_dict
2024-06-05 10:29:37 +02:00
Szymon Ożóg
e47277d18a Disable for PTX as well (#4838)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-06-05 10:37:59 +03:00
Francis Lam
890e7c12bb test/external/verify_kernel: add support for single pickled kernel (#4836) 2024-06-04 18:59:21 -04:00
Elias Wahl
04e237328b Refactor to class style (#4804) 2024-06-04 14:08:31 -07:00
David Hou
cddce0e168 don't cast before view on shape changing bitcast (#4833)
* don't cast before view on shape changing bitcast

* make sure cast before view triggers
2024-06-04 16:04:52 -04:00
Alec Chen
4909a0d16f Fix arg set in pattern matcher (#4830) 2024-06-04 15:10:09 -04:00
Alec Chen
c96026ac65 Add arg set regression test for pattern matcher (#4827)
* Add arg set regression test for pattern matcher

* real regression

---------

Co-authored-by: qazalin <qazal.software@gmail.com>
2024-06-04 13:35:09 -04:00
chenyu
a70e8a80d7 test_ops test cmp with special floats (#4826)
Prepare to fix nan; it did not work with ge and le before either.
2024-06-04 12:10:21 -04:00
chenyu
3afc914617 CMPEQ -> CMPNE and make it safe to pad (#4818)
* CMPNE

* new dataset
2024-06-03 18:02:15 -04:00
Szymon Ożóg
bb7b031c5c Bitshift (#4728)
* WIP

* Cleanup

* Cleanup

* Fix variable, refactor to use set

* right shift should be signed/unsigned

* Test for bitshifts

* Allow a neg
2024-06-03 21:16:01 +02:00
nimlgen
e78a9bf3f2 support view in nv/amd (#4812)
* support view in nv/amd

* fix amd

* fix

* run test on nv/amd
2024-06-03 22:11:52 +03:00
chenyu
45083ccb43 canonicalize 0 in shape in View.create (#4815)
Sets strides to 0, offset to 0, mask to None, and contiguous to True for a size-0 view.
2024-06-03 13:37:37 -04:00
qazal
f64fa51a64 process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
Timmy
ca32921f84 Multireduce PADTO Test (#4785)
* padto test

* expanded multireduce padto tests

* cuda doesn't run on CI

* moving padto_where_multireduce test to SUM so that we can check the reduce axis

* cleaning up tests some more

* add wanna_outputs

* refactor test_padto_sum_multireduce

* fix max and refactor where

* fix axis

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
chenyu
1ffa5ec492 unit test ShapeTracker.consecutive (#4800) 2024-06-01 10:10:51 -04:00
chenyu
8942230b1f minor cleanups of test_tensor and extend some cases (#4794) 2024-05-31 10:43:22 -04:00
qazal
637f482588 configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
chenyu
7cc883ecee CMPLT is safe to pad (#4790)
0 < 0 evaluates to False (see the sketch below).
2024-05-30 22:50:48 -04:00
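A minimal illustration of the padding argument above, in plain NumPy rather than tinygrad internals: zero-filled padded lanes compare 0 < 0, which is False, so they cannot change the result of a following reduction.

```python
# Sketch (plain NumPy, not tinygrad code): why zero padding is safe under CMPLT.
import numpy as np

a = np.array([1.0, -2.0, 3.0])
padded = np.pad(a, (0, 5))          # zero padding: [1, -2, 3, 0, 0, 0, 0, 0]
mask = padded < 0                   # padded lanes evaluate 0 < 0 -> False
assert mask.sum() == (a < 0).sum()  # the reduction is unaffected by the padding
```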
chenyu
236390aafb fix lazy r const folding with variable shape (#4783)
Currently not supporting const folding with a symbolic shape; I think it's possible with a refactor of Tensor.from_node.
Also added some required tests (currently failing) for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
4921de1945 fix cumsum of 0-d tensor (#4781)
* fix cumsum of 0-d tensor

* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f failed test case for ellipsis in einsum (#4779)
from #4156
2024-05-30 11:14:42 -04:00
Alec Chen
e89bc42cc7 Add UOps pattern matcher regression tests (#4725)
* add pattern matcher regression tests

* Remove test for dtype str after rebasing

* Make test uops match type spec

* leave const const, add const alu vin test

* correct uops

* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3 add fused tensor core opts tests (#4775)
* add fused tc opts tests

* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a apply NOOPT in test_arange complexity (#4774)
with hcopt, arange(2560) uses fewer ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7 isolate the 134ms kernel in train_gpt2.py (#4773)
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
chenyu
59c6472b9f check contiguous in View.create after canonicalizing mask and offset (#4770)
mask / offset / strides can change during canonicalization, and contiguous can be True at the end
2024-05-29 11:31:13 -04:00
nimlgen
019f4680e5 check dims before execution on nv (#4756)
* check dims before execution on nv

* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4 pre multi reduce codegen/* cleanup (#4755)
* refactor self.reduceop

* free lines

* fix test
2024-05-28 08:15:48 -04:00
chenyu
53b9081aab check arg types of Tensor.randint (#4751)
Raise TypeError if low, high, or dtype are not ints.
2024-05-27 20:24:10 -04:00
qazal
0e69b22629 multireduce OptOps tests (start) (#4733)
* start

* full tests

* add skips

* unrelated

* notes
2024-05-27 12:21:33 +03:00
qazal
c7b1d802f1 delete duplicate tests in test_linearizer (#4723)
* delete duplicate test

test_simplify_uop isn't needed

max works

* ci

* remove skip

* add skip back
2024-05-26 08:11:42 +03:00
Szymon Ożóg
de5c69c4c9 Unify test_dtype naming conventions (#4730) 2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0 pow cleanup part 2 (#4727)
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
Szymon Ożóg
212025b53c Int mulacc for ptx (#4680)
* IntMulacc

* don't mov const

* Dont do int mulacc on ocelot

* Workaround for ocelot

* Remove ocelot workaround

* Fix tests that merged into mulacc

* fix uop count after merging into mulacc
2024-05-24 15:20:48 -04:00
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7 Fix int8 and uint8 on PTX (#4711)
* Fix mem type for uchar

* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
4398cc3654 update test_linearizer.py (#4707)
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
Francis Lam
49225522aa wmma: chain unrolled WMMAs and phi only at the end (#4703)
* wmma: chain unrolled WMMAs and phi only at the end

* fix linter and tests

* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d fix UOps.CAST noop for vectorized dtypes (#4704)
* ==

* add test

* not lazyop

* use str comparison for PtrDType

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
qazal
532c9e08e3 proposal: PHI nodes in TC shouldn't have children inside the loop (#4694)
* expectations from UOpGraph

* one with children

* minimal repro

* replace
2024-05-23 15:11:26 -04:00