Commit Graph

1847 Commits

qazal
f64fa51a64 process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
Timmy
ca32921f84 Multireduce PADTO Test (#4785)
* padto test

* expanded multireduce padto tests

* CUDA doesn't run on CI

* moving padto_where_multireduce test to SUM so that we can check the reduce axis

* cleaning up tests some more

* add wanna_outputs

* refactor test_padto_sum_multireduce

* fix max and refactor where

* fix axis

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
chenyu
1ffa5ec492 unit test ShapeTracker.consecutive (#4800) 2024-06-01 10:10:51 -04:00
chenyu
8942230b1f minor cleanups of test_tensor and extend some cases (#4794) 2024-05-31 10:43:22 -04:00
qazal
637f482588 configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
chenyu
7cc883ecee CMPLT is safe to pad (#4790)
0 < 0 evaluates to False
2024-05-30 22:50:48 -04:00
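
A minimal numpy sketch of why zero padding is safe for a less-than comparison: the padded tail compares 0 < 0, which is False, so it contributes nothing to a later reduction (numpy stands in for the generated kernel here).

    import numpy as np

    a, b = np.array([1., 2., 3.]), np.array([2., 2., 2.])
    pa, pb = np.pad(a, (0, 5)), np.pad(b, (0, 5))  # pad both operands with zeros
    assert (a < b).sum() == (pa < pb).sum()        # the padded tail is all False
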
chenyu
236390aafb fix lazy r const folding with variable shape (#4783)
const folding of symbolic shapes is currently not supported; I think it's possible with a refactor of Tensor.from_node.
also added some currently-failing tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
4921de1945 fix cumsum of 0-d tensor (#4781)
* fix cumsum of 0-d tensor

* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f failed test case for ellipsis in einsum (#4779)
from #4156
2024-05-30 11:14:42 -04:00
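
A short numpy sketch of the ellipsis notation the failing test exercises (the tinygrad call would be Tensor.einsum with the same formula; that equivalence is assumed here):

    import numpy as np

    a = np.random.rand(4, 2, 3)
    b = np.random.rand(4, 3, 5)
    out = np.einsum("...ij,...jk->...ik", a, b)  # "..." broadcasts the leading batch dims
    assert out.shape == (4, 2, 5)
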
Alec Chen
e89bc42cc7 Add UOps pattern matcher regression tests (#4725)
* add pattern matcher regression tests

* Remove test for dtype str after rebasing

* Make test uops match type spec

* leave const const, add const alu vin test

* correct uops

* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3 add fused tensor core opts tests (#4775)
* add fused tc opts tests

* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a apply NOOPT in test_arange complexity (#4774)
with hcopt, arange(2560) uses fewer ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7 isolate the 134ms kernel in train_gpt2.py (#4773)
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
chenyu
59c6472b9f check contiguous in View.create after canonicalizing mask and offset (#4770)
mask/offset/strides can change during canonicalization, so contiguous can become True by the end
2024-05-29 11:31:13 -04:00
nimlgen
019f4680e5 check dims before execution on nv (#4756)
* check dims before execution on nv

* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4 pre multi reduce codegen/* cleanup (#4755)
* refactor self.reduceop

* free lines

* fix test
2024-05-28 08:15:48 -04:00
chenyu
53b9081aab check arg types of Tensor.randint (#4751)
raise TypeError if low, high, or dtype are not ints
2024-05-27 20:24:10 -04:00
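
A minimal sketch of the new check, assuming the current Tensor.randint(*shape, low=..., high=...) signature:

    from tinygrad import Tensor

    Tensor.randint(4, low=0, high=10)        # fine: all arguments are ints
    try:
        Tensor.randint(4, low=0.5, high=10)  # non-int low should now raise
    except TypeError as e:
        print("rejected:", e)
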
qazal
0e69b22629 multireduce OptOps tests (start) (#4733)
* start

* full tests

* add skips

* unrelated

* notes
2024-05-27 12:21:33 +03:00
qazal
c7b1d802f1 delete duplicate tests in test_linearizer (#4723)
* delete duplicate test

test_simplify_uop isn't needed

max works

* ci

* remove skip

* add skip back
2024-05-26 08:11:42 +03:00
Szymon Ożóg
de5c69c4c9 Unify test_dtype naming conventions (#4730) 2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0 pow cleanup part 2 (#4727)
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
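
A small sketch of the edge case, assuming Tensor.pow and Tensor.item behave as in current tinygrad; 0 ** 0 should come out as 1, matching Python:

    from tinygrad import Tensor

    print(Tensor([0.0]).pow(0.0).item())  # expected: 1.0 after the fix
    print(0.0 ** 0.0)                     # Python agrees: 1.0
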
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
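
A hedged sketch of the new call style: stack is now an instance method, so the first tensor is self and the rest are passed positionally (the old list-taking form in the comment is an assumption about the prior API):

    from tinygrad import Tensor

    a, b, c = Tensor([1, 2]), Tensor([3, 4]), Tensor([5, 6])
    out = a.stack(b, c, dim=0)  # previously roughly Tensor.stack([a, b, c], dim=0)
    print(out.shape)            # (3, 2)
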
Szymon Ożóg
212025b53c Int mulacc for ptx (#4680)
* IntMulacc

* don't mov const

* Don't do int mulacc on ocelot

* Workaround for ocelot

* Remove ocelot workaround

* Fix tests that merged into mulacc

* fix uop count after merging into mulacc
2024-05-24 15:20:48 -04:00
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7 Fix int8 and uint8 on PTX (#4711)
* Fix mem type for uchar

* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
4398cc3654 update test_linearizer.py (#4707)
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
Francis Lam
49225522aa wmma: chain unrolled WMMAs and phi only at the end (#4703)
* wmma: chain unrolled WMMAs and phi only at the end

* fix linter and tests

* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d fix UOps.CAST noop for vectorized dtypes (#4704)
* ==

* add test

* not lazyop

* use str comparison for PtrDType

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
qazal
532c9e08e3 proposal: PHI nodes in TC shouldn't have children inside the loop (#4694)
* expectations from UOpGraph

* one with children

* minimal repro

* replace
2024-05-23 15:11:26 -04:00
Szymon Ożóg
9a9963ba7b Remove uops deepcopy from PTX (#4671)
* Remove uops deepcopy from PTX

* Update test

* Fix test

* fix for non-ptx

* Clean

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64 update Torch.gather api (#4692)
* update Torch.gather api

gather(self, dim, index) to match torch

* fix that
2024-05-22 21:54:06 -04:00
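
A minimal sketch of the reordered signature, with dim before index to match torch.gather (argument order per the commit message; exact parameter names assumed):

    from tinygrad import Tensor

    t = Tensor([[1, 2], [3, 4]])
    idx = Tensor([[0, 0], [1, 0]])
    print(t.gather(1, idx).numpy())  # [[1 1] [4 3]]
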
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707 isolated test for BEAM=2 llama wrong uops toposort (#4687)
* add ast

* skip test in CI
2024-05-23 00:47:37 +03:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
qazal
c5f5755328 correctness test for multireduce nested locals (#4682)
* nested locals test

* move st
2024-05-22 19:35:35 +03:00
qazal
d12d412e8b revert uops dtype in pattern matcher (#4681)
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
chenyu
0f21aa0416 example kernel that triggers Memory access fault for resnet on red (#4678) 2024-05-21 18:59:36 -04:00
qazal
5f84cbb5df keep UOps.CAST in PHI-GEP fold for unmatching dtypes (#4674)
* these should be val.dtype

* cast float4 and float2 to root

* document tests

* 2 args

* fix assert

* match dtype

* no extra lines

* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
Timmy
de733d73cf Multireduce Linearizer Tests (#4665)
* updated tests

* make sure the upcasting tests actually cause the problem

* diff cleanup

* use UOpGraph utils

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
qazal
b33c827aed UOps.RANGE toposort spec (#4660)
* use iterator

* nested loops and outer loads

* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83 consolidate uops tests (#4659)
* merge uoptimize

* move tests

* fix skip message
2024-05-20 21:42:31 +03:00
Szymon Ożóg
1e7b7b2c3c Fix flop counting for mulacc (#4640)
* Fix flop counting for mulacc

* add test_simple_mulacc

* Update test_uops_stats.py

* Update test_uops_stats.py

* revert test_mulacc

* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
nimlgen
c9f7f2da70 nv hcq bind api (#4629)
* hcq bind api for nv

* linter

* linter

* add test

* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a correctly insert UOps.END* in fuzz result (#4653) 2024-05-19 21:10:28 +03:00
chenyu
456aa0b656 update test_search kernel count (#4652)
integration test that beam-searching one kernel increments the kernel count by 1; also moved the existing test_kernel_count to TestTimeLinearizer
2024-05-19 13:54:52 -04:00
qazal
954718e6bf reorder DEFINE_GLOBAL in fuzz_uops (#4651)
* globals base

* test: opt out of DEFINE_GLOBAL

* do it like ExecItem
2024-05-19 20:51:31 +03:00
Léo
967e35f8b8 fix(beam): GlobalCounters kernel count increasing when clearing l2 (#4598)
* fix(beam): GlobalCounters kernel count increasing when clearing l2

* fix: removed the NOSTATS var by adding do_update_stats to Tensor.realize()

* test(search): regression test for _time_program, should not increment kernel_count

* fix(test_search): unused var and now properly checking when l2 is cleared

* fix(test_search): added assert message

* fix(test_search): now testing public beam api for kcount

* ruff fixes

---------

Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>
2024-05-19 10:03:47 -07:00
George Hotz
4753283221 LOOP -> RANGE (#4650) 2024-05-19 06:40:20 -07:00
chenyu
286b4dbdf2 compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

a renderer error under multiprocessing should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00