chenyu
1ffa5ec492
unit test ShapeTracker.consecutive ( #4800 )
2024-06-01 10:10:51 -04:00
chenyu
8942230b1f
minor cleanups of test_tensor and extend some cases ( #4794 )
2024-05-31 10:43:22 -04:00
qazal
637f482588
configure derandomizing CI tests ( #4793 )
2024-05-31 17:06:58 +03:00
chenyu
7cc883ecee
CMPLT is safe to pad ( #4790 )
...
0 < 0 evals to False
2024-05-30 22:50:48 -04:00
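Note on the CMPLT commit above: the reasoning "0 < 0 evals to False" can be checked with a minimal sketch. This is plain Python, not tinygrad code. The helper name `maps_zero_padding_to_zero` is hypothetical, and the equality comparison is included only as a contrast, not as a claim about how tinygrad classifies it.

```python
import operator

def maps_zero_padding_to_zero(op) -> bool:
    """Return True if op(0, 0) is falsy, i.e. zero-padded inputs yield a zero/False output."""
    return not op(0, 0)

# 0 < 0 is False, so padded regions stay "zero" after the comparison.
lt_is_safe = maps_zero_padding_to_zero(operator.lt)

# 0 == 0 is True, so an equality comparison would turn padding into nonzero values.
eq_is_safe = maps_zero_padding_to_zero(operator.eq)
```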
chenyu
236390aafb
fix lazy r const folding with variable shape ( #4783 )
...
currently not supporting const fold symbolic shape. I think it's possible with a refactor to Tensor.from_node.
also added some failed required tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
4921de1945
fix cumsum of 0-d tensor ( #4781 )
...
* fix cumsum of 0-d tensor
* _resolve_dim for all
2024-05-30 12:41:09 -04:00
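Note on the cumsum commit above: a rough sketch of what resolving a dimension for a 0-d tensor might involve. The function `resolve_dim` below is a hypothetical standalone helper, not the `_resolve_dim` method from the commit. It assumes the convention that a 0-d tensor accepts `dim` values of 0 and -1, as if it had one axis.

```python
def resolve_dim(dim: int, ndim: int) -> int:
    """Map a possibly negative dim to a non-negative axis index.

    Assumption: a 0-d tensor is treated as having one axis for bounds checking.
    """
    span = max(1, ndim)
    if not -span <= dim < span:
        raise IndexError(f"dim {dim} out of range for ndim {ndim}")
    return dim + span if dim < 0 else dim
```

With this convention, `resolve_dim(-1, 0)` and `resolve_dim(0, 0)` both resolve to axis 0, while `resolve_dim(1, 0)` raises `IndexError`.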
chenyu
4cf0eadf8f
failed test case for ellipsis in einsum ( #4779 )
...
from #4156
2024-05-30 11:14:42 -04:00
Alec Chen
e89bc42cc7
Add UOps pattern matcher regression tests ( #4725 )
...
* add pattern matcher regression tests
* Remove test for dtype str after rebasing
* Make test uops match type spec
* leave const const, add const alu vin test
* correct uops
* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3
add fused tensor core opts tests ( #4775 )
...
* add fused tc opts tests
* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a
apply NOOPT in test_arange complexity ( #4774 )
...
with hcopt, arange(2560) uses fewer ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7
isolate the 134ms kernel in train_gpt2.py ( #4773 )
...
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
chenyu
59c6472b9f
check contiguous in View.create after canonicalizing mask and offset ( #4770 )
...
mask / offset / strides can change during canonicalization, and contiguous can be True at the end
2024-05-29 11:31:13 -04:00
nimlgen
019f4680e5
check dims before execution on nv ( #4756 )
...
* check dims before execution on nv
* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4
pre multi reduce codegen/* cleanup ( #4755 )
...
* refactor self.reduceop
* free lines
* fix test
2024-05-28 08:15:48 -04:00
chenyu
53b9081aab
check arg types of Tensor.randint ( #4751 )
...
raise TypeError if low, high, dtype are not ints
2024-05-27 20:24:10 -04:00
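Note on the randint commit above: a minimal sketch of this kind of argument check. The function `check_randint_args` and the set `INT_DTYPE_NAMES` are hypothetical stand-ins. The real check lives in `Tensor.randint` and works with tinygrad's dtype objects, which are not reproduced here.

```python
# Hypothetical stand-in for integer dtypes, identified by name only.
INT_DTYPE_NAMES = {"int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", "uint64"}

def check_randint_args(low, high, dtype_name: str = "int32") -> None:
    """Raise TypeError unless low and high are ints and dtype_name names an integer dtype."""
    # Note: bool is a subclass of int in Python, so True/False would pass this check.
    if not isinstance(low, int) or not isinstance(high, int):
        raise TypeError(f"low and high must be ints, got {type(low).__name__} and {type(high).__name__}")
    if dtype_name not in INT_DTYPE_NAMES:
        raise TypeError(f"dtype must be an integer dtype, got {dtype_name}")
```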
qazal
0e69b22629
multireduce OptOps tests (start) ( #4733 )
...
* start
* full tests
* add skips
* unrelated
* notes
2024-05-27 12:21:33 +03:00
qazal
c7b1d802f1
delete duplicate tests in test_linearizer ( #4723 )
...
* delete duplicate test
test_simplify_uop isn't needed
max works
* ci
* remove skip
* add skip back
2024-05-26 08:11:42 +03:00
Szymon Ożóg
de5c69c4c9
Unify test_dtype naming conventions ( #4730 )
2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0
pow cleanup part 2 ( #4727 )
...
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
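Note on the pow commit above: the log does not state the expected value of `0 ** 0`. For reference only, Python's integer and floating-point power both follow the common convention of returning 1, as this plain-Python check shows (no tinygrad involved).

```python
import math

# Python's integer power treats 0 ** 0 as 1.
int_result = 0 ** 0

# math.pow follows the same convention for floats.
float_result = math.pow(0.0, 0.0)
```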
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
Szymon Ożóg
212025b53c
Int mulacc for ptx ( #4680 )
...
* IntMulacc
* don't mov const
* Don't do int mulacc on ocelot
* Workaround for ocelot
* Remove ocelot workaround
* Fix tests that merged into mulacc
* fix uop count after merging to mulacc
2024-05-24 15:20:48 -04:00
qazal
c170ddceaf
fix commavq benchmark ( #4712 )
...
* fix _slice and assert explicit device
* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7
Fix int8 and uint8 on PTX ( #4711 )
...
* Fix mem type for uchar
* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
4398cc3654
update test_linearizer.py ( #4707 )
...
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
Francis Lam
49225522aa
wmma: chain unrolled WMMAs and phi only at the end ( #4703 )
...
* wmma: chain unrolled WMMAs and phi only at the end
* fix linter and tests
* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d
fix UOps.CAST noop for vectorized dtypes ( #4704 )
...
* ==
* add test
* not lazyop
* use str comparison for PtrDType
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
qazal
532c9e08e3
proposal: PHI nodes in TC shouldn't have children inside the loop ( #4694 )
...
* expectations from UOpGraph
* one with children
* minimal repro
* replace
2024-05-23 15:11:26 -04:00
Szymon Ożóg
9a9963ba7b
Remove uops deepcopy from PTX ( #4671 )
...
* Remove uops deepcopy from PTX
* Update test
* Fix test
* fix for non-ptx
* Clean
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64
update Tensor.gather api ( #4692 )
...
* update Tensor.gather api
gather(self, dim, index) to match torch
* fix that
2024-05-22 21:54:06 -04:00
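Note on the gather commit above: the `gather(self, dim, index)` argument order matches torch. The semantics can be sketched for nested Python lists as follows. `gather_2d` is a hypothetical helper for 2-D data only, written to illustrate the indexing rule, and is not the tinygrad implementation.

```python
from typing import List

def gather_2d(data: List[List[int]], dim: int, index: List[List[int]]) -> List[List[int]]:
    """Gather values from a 2-D list along dim, following torch.gather's indexing rule.

    dim == 0: out[i][j] = data[index[i][j]][j]
    dim == 1: out[i][j] = data[i][index[i][j]]
    """
    if dim not in (0, 1):
        raise IndexError(f"dim {dim} out of range for 2-D data")
    out = []
    for i, row in enumerate(index):
        if dim == 0:
            out.append([data[idx][j] for j, idx in enumerate(row)])
        else:
            out.append([data[i][idx] for idx in row])
    return out

# Example taken from the torch.gather documentation.
result = gather_2d([[1, 2], [3, 4]], 1, [[0, 0], [1, 0]])
```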
qazal
498cf3e7e0
fuzzer path search for DEFINE_ACC ( #4656 )
...
* insert acc
* add test_ops
* find toposorts
* todo - not yet ready
* remove the import
* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707
isolated test for BEAM=2 llama wrong uops toposort ( #4687 )
...
* add ast
* skip test in CI
2024-05-23 00:47:37 +03:00
Francis Lam
721f9f6acf
test/external/verify_kernel: fix LOGKERNS variable name in comments ( #4685 )
...
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
qazal
c5f5755328
correctness test for multireduce nested locals ( #4682 )
...
* nested locals test
* move st
2024-05-22 19:35:35 +03:00
qazal
d12d412e8b
revert uops dtype in pattern matcher ( #4681 )
...
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
chenyu
0f21aa0416
example kernel that triggers Memory access fault for resnet on red ( #4678 )
2024-05-21 18:59:36 -04:00
qazal
5f84cbb5df
keep UOps.CAST in PHI-GEP fold for unmatching dtypes ( #4674 )
...
* these should be val.dtype
* cast float4 and float2 to root
* document tests
* 2 args
* fix assert
* match dtype
* no extra lines
* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb
catch compile errors in uops tests ( #4672 )
...
* use helper and compile
* llama beam=2
* ast length
* skip float4, fix hsa
* use empty tensors
2024-05-21 12:20:35 +03:00
Timmy
de733d73cf
Multireduce Linearizer Tests ( #4665 )
...
* updated tests
* make sure the upcasting tests actually cause the problem
* diff cleanup
* use UOpGraph utils
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
qazal
b33c827aed
UOps.RANGE toposort spec ( #4660 )
...
* use iterator
* nested loops and outer loads
* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83
consolidate uops tests ( #4659 )
...
* merge uoptimize
* move tests
* fix skip message
2024-05-20 21:42:31 +03:00
Szymon Ożóg
1e7b7b2c3c
Fix flop counting for mulacc ( #4640 )
...
* Fix flop counting for mulacc
* add test_simple_mulacc
* Update test_uops_stats.py
* Update test_uops_stats.py
* revert test_mulacc
* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
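Note on the mulacc commit above: the "MULACC vs MUL+ADD" test suggests a fused multiply-accumulate should be counted the same as a separate multiply and add. A minimal sketch of that accounting, assuming one flop per MUL or ADD and two per MULACC. The cost table and `count_flops` are hypothetical and do not reflect tinygrad's actual stats code.

```python
# Assumed per-element costs; chosen so MULACC equals MUL followed by ADD.
FLOPS_PER_ELEMENT = {"MUL": 1, "ADD": 1, "MULACC": 2}

def count_flops(ops, elements: int) -> int:
    """Total flops for applying each op in ops once to every element."""
    return sum(FLOPS_PER_ELEMENT[op] for op in ops) * elements

fused = count_flops(["MULACC"], 1024)
unfused = count_flops(["MUL", "ADD"], 1024)
```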
nimlgen
c9f7f2da70
nv hcq bind api ( #4629 )
...
* hcq bind api for nv
* linter
* linter
* add test
* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a
correctly insert UOps.END* in fuzz result ( #4653 )
2024-05-19 21:10:28 +03:00
chenyu
456aa0b656
update test_search kernel count ( #4652 )
...
integration test that beaming 1 kernel increments kernel count by 1, and moved existing test_kernel_count to TestTimeLinearizer
2024-05-19 13:54:52 -04:00
qazal
954718e6bf
reorder DEFINE_GLOBAL in fuzz_uops ( #4651 )
...
* globals base
* test: opt out of DEFINE_GLOBAL
* do it like ExecItem
2024-05-19 20:51:31 +03:00
Léo
967e35f8b8
fix(beam): GlobalCounters kernel count increasing when clearing l2 ( #4598 )
...
* fix(beam): GlobalCounters kernel count increasing when clearing l2
* fix: removed the NOSTATS var by adding do_update_stats to Tensor.realize()
* test(search): regression test for _time_program, should not increment kernel_count
* fix(test_search): unused var and now properly checking when l2 is cleared
* fix(test_search): added assert message
* fix(test_search): now testing public beam api for kcount
* ruff fixes
---------
Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>
2024-05-19 10:03:47 -07:00
George Hotz
4753283221
LOOP -> RANGE ( #4650 )
2024-05-19 06:40:20 -07:00
chenyu
286b4dbdf2
compile raise CompileError and skip only RuntimeError in multiprocess… ( #4646 )
...
* compile raise CompileError and skip only RuntimeError in multiprocess beam
renderer error with multiprocess should not be skipped by beam
* use `==` for dtype to dtype comparison
* that needs to be is
* typo
2024-05-19 00:25:25 -04:00
qazal
b0cb02f719
uops fuzzing infra ( #4641 )
...
* base with bfs
* find paths
* get last
* try blocks
* Revert "try blocks"
This reverts commit 25f8e3fe85.
* this should be simpler
* full exec
* support debug
* fix lint
* add todo
* copy in_degree
2024-05-18 20:19:57 +03:00
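Note on the fuzzing commit above: the bullets mention finding paths and copying `in_degree`. One way to read this is that the fuzzer enumerates alternative valid orderings of a dependency graph. The sketch below lists every topological order of a small DAG by backtracking over a copied in-degree map. `all_toposorts` is a hypothetical illustration, not the code from the commit.

```python
from typing import Dict, List

def all_toposorts(children: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every topological ordering of a DAG given as node -> list of children.

    Every node must appear as a key, even if it has no children.
    """
    in_degree = {node: 0 for node in children}
    for outs in children.values():
        for child in outs:
            in_degree[child] += 1

    results: List[List[str]] = []

    def visit(order: List[str], degree: Dict[str, int]) -> None:
        if len(order) == len(children):
            results.append(order)
            return
        for node in children:
            # Only nodes with all parents already placed are eligible.
            if node in order or degree[node] != 0:
                continue
            next_degree = dict(degree)  # copy so sibling branches are unaffected
            for child in children[node]:
                next_degree[child] -= 1
            visit(order + [node], next_degree)

    visit([], in_degree)
    return results

# Diamond graph: a feeds b and c, which both feed d.
orders = all_toposorts({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []})
```

Each ordering in `orders` is a valid execution order of the same graph, so running all of them and comparing results is one way to check that correctness does not depend on a particular toposort.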
qazal
bf8f855838
assert kernel counts in unsupported fusions ( #4643 )
...
* replace with comments
* not relevant
* update comment
* custom exception maybe
* fix LoadOps.VIEW
2024-05-18 20:14:37 +03:00