chenyu
99e7a1d5e9
support symbolic reshape with non-contiguous ( #4844 )
...
* support symbolic reshape with non-contiguous
pre-requisite for symbolic arange (make symbolic ones that can be folded).
* test cases
* typo
* shorter
2024-06-05 16:01:19 -04:00
chenyu
a352b6d9ce
symbolic Tensor.var ( #4843 )
...
taken from #4446 and add more tests
2024-06-05 12:55:54 -04:00
Timmy
887643cf34
Multireduce atomic local load/store test ( #4786 )
...
* atomic load/store test
* tests for nested & unrolled
* check barriers
* linters
* cleaning up diff
* fix assert in _temp_create_multireduce_ast changes
* cleaning up the check for redundant barriers
* minor cleanups for the assert
* always seed randn, helps with debuggability
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-05 14:41:19 +03:00
Szymon Ożóg
273945df67
Regression tests for bitshift ( #4829 )
...
* Regression tests for bitshift
* Add test for bitshift not triggered
* Enable tests
2024-06-05 11:42:34 +02:00
Alec Chen
5ac30c29d8
Construct UOps patterns using UPat ( #4821 )
...
* Allow UPat pattern definitions
* Convert pattern matcher tests to UPat constructions
* Convert constant_folder patterns to upat constructions
* Convert assembly patterns to upat constructions
* [run_process_replay] Drop UPat.from_dict
2024-06-05 10:29:37 +02:00
Szymon Ożóg
e47277d18a
Disable for PTX as well ( #4838 )
...
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-06-05 10:37:59 +03:00
Francis Lam
890e7c12bb
test/external/verify_kernel: add support for single pickled kernel ( #4836 )
2024-06-04 18:59:21 -04:00
Elias Wahl
04e237328b
Refactor to class style ( #4804 )
2024-06-04 14:08:31 -07:00
David Hou
cddce0e168
don't cast before view on shape changing bitcast ( #4833 )
...
* don't cast before view on shape changing bitcast
* make sure cast before view triggers
2024-06-04 16:04:52 -04:00
Alec Chen
4909a0d16f
Fix arg set in pattern matcher ( #4830 )
2024-06-04 15:10:09 -04:00
Alec Chen
c96026ac65
Add arg set regression test for pattern matcher ( #4827 )
...
* Add arg set regression test for pattern matcher
* real regression
---------
Co-authored-by: qazalin <qazal.software@gmail.com>
2024-06-04 13:35:09 -04:00
chenyu
a70e8a80d7
test_ops test cmp with special floats ( #4826 )
...
prepare to fix nan, it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
chenyu
3afc914617
CMPEQ -> CMPNE and make it safe to pad ( #4818 )
...
* CMPNE
* new dataset
2024-06-03 18:02:15 -04:00
Szymon Ożóg
bb7b031c5c
Bitshift ( #4728 )
...
* WIP
* Cleanup
* Cleanup
* Fix variable, refactor to use set
* right shift should be signed/unsigned
* Test for bitshifts
* Allow a neg
2024-06-03 21:16:01 +02:00
nimlgen
e78a9bf3f2
support view in nv/amd ( #4812 )
...
* support view in nv/amd
* fix amd
* fix
* run test on nv/amd
2024-06-03 22:11:52 +03:00
chenyu
45083ccb43
canonicalize 0 in shape in View.create ( #4815 )
...
set strides to 0, offset to 0, mask to None, and contiguous to True with size 0 view.
2024-06-03 13:37:37 -04:00
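The canonicalization described in the commit above can be illustrated with a small standalone sketch. It does not use tinygrad: `SimpleView`, `row_major_strides`, and `create_view` are hypothetical names invented for illustration, and the contiguity check is simplified. The point is that any view whose shape contains a 0 holds no elements, so all such views of the same shape can share one canonical representation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class SimpleView:
    # Hypothetical stand-in for a view object; not tinygrad's View class.
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int
    mask: Optional[Tuple[Tuple[int, int], ...]]
    contiguous: bool

def row_major_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    # Strides of a densely packed row-major layout.
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

def create_view(shape, strides, offset=0, mask=None) -> SimpleView:
    if 0 in shape:
        # Size-0 view: strides, offset, and mask cannot affect any element,
        # so collapse them to one canonical form.
        return SimpleView(shape, (0,) * len(shape), 0, None, True)
    # Simplified contiguity check (assumption: no other canonicalization).
    contiguous = offset == 0 and mask is None and strides == row_major_strides(shape)
    return SimpleView(shape, strides, offset, mask, contiguous)

# Two differently described empty views become identical after creation.
a = create_view((3, 0, 2), (2, 2, 1), offset=5, mask=((0, 3), (0, 0), (0, 2)))
b = create_view((3, 0, 2), (0, 0, 1))
```

With this form, equality checks and caching keyed on the view do not distinguish between empty views that differ only in irrelevant fields.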
qazal
f64fa51a64
process replay for test/* ( #4799 )
...
* add input to unit tests [run_process_replay]
* add setup [run_process_replay]
* run tests [run_process_replay]
* add cuda and amd [run_process_replay]
* run everything but BEAM=2 [run_process_replay]
* skip export_model [run_process_replay]
* fix amd CI
* add concurrency back
2024-06-03 12:01:58 +03:00
Timmy
ca32921f84
Multireduce PADTO Test ( #4785 )
...
* padto test
* expanded multireduce padto tests
* cuda doesn't run on ci
* moving padto_where_multireduce test to SUM so that we can check the reduce axis
* cleaning up tests some more
* add wanna_outputs
* refactor test_padto_sum_multireduce
* fix max and refactor where
* fix axis
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
chenyu
1ffa5ec492
unit test ShapeTracker.consecutive ( #4800 )
2024-06-01 10:10:51 -04:00
chenyu
8942230b1f
minor cleanups of test_tensor and extend some cases ( #4794 )
2024-05-31 10:43:22 -04:00
qazal
637f482588
configure derandomizing CI tests ( #4793 )
2024-05-31 17:06:58 +03:00
chenyu
7cc883ecee
CMPLT is safe to pad ( #4790 )
...
0 < 0 evals to False
2024-05-30 22:50:48 -04:00
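The reasoning "0 < 0 evals to False" can be checked with a short standalone sketch. It assumes that "safe to pad" means: appending zero padding to the inputs leaves a later sum over the result unchanged. The helper names `padded_sum` and `is_safe_to_pad` are hypothetical and are not tinygrad APIs. The same check also shows why an equality comparison is not safe under this definition while a not-equal comparison is, which matches the CMPEQ -> CMPNE change listed earlier in this log.

```python
import operator

def padded_sum(op, a, b, pad):
    # Sum op(x, y) elementwise after appending `pad` zeros to both inputs.
    a_p = list(a) + [0] * pad
    b_p = list(b) + [0] * pad
    return sum(int(op(x, y)) for x, y in zip(a_p, b_p))

def is_safe_to_pad(op, a, b, pad=3):
    # Safe if the padded region contributes nothing to the reduction.
    return padded_sum(op, a, b, 0) == padded_sum(op, a, b, pad)

a, b = [1, 5, 2], [4, 5, 0]
lt_safe = is_safe_to_pad(operator.lt, a, b)  # 0 < 0 is False, adds nothing
ne_safe = is_safe_to_pad(operator.ne, a, b)  # 0 != 0 is False, adds nothing
eq_safe = is_safe_to_pad(operator.eq, a, b)  # 0 == 0 is True, changes the sum
```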
chenyu
236390aafb
fix lazy r const folding with variable shape ( #4783 )
...
currently not supporting const fold symbolic shape. I think it's possible with a refactor to Tensor.from_node.
also added some failed required tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
4921de1945
fix cumsum of 0-d tensor ( #4781 )
...
* fix cumsum of 0-d tensor
* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f
failed test case for ellipsis in einsum ( #4779 )
...
from #4156
2024-05-30 11:14:42 -04:00
Alec Chen
e89bc42cc7
Add UOps pattern matcher regression tests ( #4725 )
...
* add pattern matcher regression tests
* Remove test for dtype str after rebasing
* Make test uops match type spec
* leave const const, add const alu vin test
* correct uops
* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3
add fused tensor core opts tests ( #4775 )
...
* add fused tc opts tests
* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a
apply NOOPT in test_arange complexity ( #4774 )
...
with hcopt, arange(2560) uses less ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7
isolate the 134ms kernel in train_gpt2.py ( #4773 )
...
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
chenyu
59c6472b9f
check contiguous in View.create after canonicalizing mask and offset ( #4770 )
...
mask / offset / strides can change during canonicalization, and contiguous can be True at the end
2024-05-29 11:31:13 -04:00
nimlgen
019f4680e5
check dims before execution on nv ( #4756 )
...
* check dims before execution on nv
* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4
pre multi reduce codegen/* cleanup ( #4755 )
...
* refactor self.reduceop
* free lines
* fix test
2024-05-28 08:15:48 -04:00
chenyu
53b9081aab
check arg types of Tensor.randint ( #4751 )
...
raise TypeError if low, high, dtype are not ints
2024-05-27 20:24:10 -04:00
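A minimal sketch of this kind of argument checking, written against the standard library rather than tinygrad. The function `checked_randint`, the `INT_DTYPES` set, and the use of string dtype names are all assumptions made for illustration; the commit only states that a TypeError is raised when low, high, or dtype are not ints.

```python
import random

# Hypothetical set of integer dtype names; tinygrad uses dtype objects instead.
INT_DTYPES = {"int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", "uint64"}

def checked_randint(n, low=0, high=10, dtype="int32"):
    # Reject non-integer bounds before sampling, instead of silently truncating.
    if not isinstance(low, int) or not isinstance(high, int):
        raise TypeError(f"low and high must be ints, got {type(low).__name__}, {type(high).__name__}")
    # Reject non-integer dtypes (modeled here as names outside INT_DTYPES).
    if dtype not in INT_DTYPES:
        raise TypeError(f"dtype must be an integer dtype, got {dtype!r}")
    # Sample n values from the half-open range [low, high).
    return [random.randrange(low, high) for _ in range(n)]

sample = checked_randint(4, low=0, high=10)
```

Failing early with a TypeError keeps a call such as `checked_randint(4, low=0.5)` from producing values drawn from an unintended range.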
qazal
0e69b22629
multireduce OptOps tests (start) ( #4733 )
...
* start
* full tests
* add skips
* unrelated
* notes
2024-05-27 12:21:33 +03:00
qazal
c7b1d802f1
delete duplicate tests in test_linearizer ( #4723 )
...
* delete duplicate test
test_simplify_uop isn't needed
max works
* ci
* remove skip
* add skip back
2024-05-26 08:11:42 +03:00
Szymon Ożóg
de5c69c4c9
Unify test_dtype naming conventions ( #4730 )
2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0
pow cleanup part 2 ( #4727 )
...
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
Szymon Ożóg
212025b53c
Int mulacc for ptx ( #4680 )
...
* IntMulacc
* don't mov const
* Don't do int mulacc on ocelot
* Workaround for ocelot
* Remove ocelot workaround
* Fix tests that merged into mulacc
* fix uop count after merging to mulacc
2024-05-24 15:20:48 -04:00
qazal
c170ddceaf
fix commavq benchmark ( #4712 )
...
* fix _slice and assert explicit device
* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7
Fix int8 and uint8 on PTX ( #4711 )
...
* Fix mem type for uchar
* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
4398cc3654
update test_linearizer.py ( #4707 )
...
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
Francis Lam
49225522aa
wmma: chain unrolled WMMAs and phi only at the end ( #4703 )
...
* wmma: chain unrolled WMMAs and phi only at the end
* fix linter and tests
* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d
fix UOps.CAST noop for vectorized dtypes ( #4704 )
...
* ==
* add test
* not lazyop
* use str comparison for PtrDType
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
qazal
532c9e08e3
proposal: PHI nodes in TC shouldn't have children inside the loop ( #4694 )
...
* expectations from UOpGraph
* one with children
* minimal repro
* replace
2024-05-23 15:11:26 -04:00
Szymon Ożóg
9a9963ba7b
Remove uops deepcopy from PTX ( #4671 )
...
* Remove uops deepcopy from PTX
* Update test
* Fix test
* fix for non-ptx
* Clean
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64
update Torch.gather api ( #4692 )
...
* update Torch.gather api
gather(self, dim, index) to match torch
* fix that
2024-05-22 21:54:06 -04:00
qazal
498cf3e7e0
fuzzer path search for DEFINE_ACC ( #4656 )
...
* insert acc
* add test_ops
* find toposorts
* todo - not yet ready
* remove the import
* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707
isolated test for BEAM=2 llama wrong uops toposort ( #4687 )
...
* add ast
* skip test in CI
2024-05-23 00:47:37 +03:00
Francis Lam
721f9f6acf
test/external/verify_kernel: fix LOGKERNS variable name in comments ( #4685 )
...
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00