Commit Graph

4433 Commits

Author SHA1 Message Date
nimlgen
019f4680e5 check dims before execution on nv (#4756)
* check dims before execution on nv

* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4 pre multi reduce codegen/* cleanup (#4755)
* refactor self.reduceop

* free lines

* fix test
2024-05-28 08:15:48 -04:00
chenyu
53b9081aab check arg types of Tensor.randint (#4751)
raise TypeError if low, high, dtype are not ints
2024-05-27 20:24:10 -04:00
qazal
0e69b22629 multireduce OptOps tests (start) (#4733)
* start

* full tests

* add skips

* unrelated

* notes
2024-05-27 12:21:33 +03:00
qazal
c7b1d802f1 delete duplicate tests in test_linearizer (#4723)
* delete duplicate test

test_simplify_uop isnt needed

max works

* ci

* remove skip

* add skip back
2024-05-26 08:11:42 +03:00
Szymon Ożóg
de5c69c4c9 Unify test_dtype naming conventions (#4730) 2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0 pow cleanup part 2 (#4727)
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
Szymon Ożóg
212025b53c Int mulacc for ptx (#4680)
* IntMulacc

* don't mov const

* Dont do int mulacc on ocelot

* Workaround for ocelot

* Remove ocelot workaround

* Fix tests that merged into mulacc

* fix uop count after merging to mulacc
2024-05-24 15:20:48 -04:00
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00
Szymon Ożóg
84255069e7 Fix int8 and uint8 on PTX (#4711)
* Fix mem type for uchar

* Bring tests back
2024-05-24 11:08:52 -04:00
chenyu
4398cc3654 update test_linearizer.py (#4707)
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
Francis Lam
49225522aa wmma: chain unrolled WMMAs and phi only at the end (#4703)
* wmma: chain unrolled WMMAs and phi only at the end

* fix linter and tests

* reduce lines
2024-05-23 17:50:18 -04:00
chenyu
eb714a600d fix UOps.CAST noop for vectorized dtypes (#4704)
* ==

* add test

* not lazyop

* use str comparison for PtrDType

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-23 17:33:29 -04:00
qazal
532c9e08e3 proposal: PHI nodes in TC shouldn't have children inside the loop (#4694)
* expectations from UOpGraph

* one with children

* minimal repro

* replace
2024-05-23 15:11:26 -04:00
Szymon Ożóg
9a9963ba7b Remove uops deepcopy from PTX (#4671)
* Remove uops deepcopy from PTX

* Update test

* Fix test

* fix for non-ptx

* Clean

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
chenyu
47aba47f64 update Torch.gather api (#4692)
* update Torch.gather api

gather(self, dim, index) to match torch

* fix that
2024-05-22 21:54:06 -04:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707 isolated test for BEAM=2 llama wrong uops toposort (#4687)
* add ast

* skip test in CI
2024-05-23 00:47:37 +03:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
qazal
c5f5755328 correctness test for multireduce nested locals (#4682)
* nested locals test

* move st
2024-05-22 19:35:35 +03:00
qazal
d12d412e8b revert uops dtype in pattern matcher (#4681)
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
chenyu
0f21aa0416 example kernel that triggers Memory access fault for resnet on red (#4678) 2024-05-21 18:59:36 -04:00
qazal
5f84cbb5df keep UOps.CAST in PHI-GEP fold for unmatching dtypes (#4674)
* these should be val.dtype

* cast float4 and float2 to root

* document tests

* 2 args

* fix assert

* match dtype

* no extra lines

* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
Timmy
de733d73cf Multireduce Linearizer Tests (#4665)
* updated tests

* make sure the upcasting tests actually causes the problem

* diff cleanup

* use UOpGraph utils

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
qazal
b33c827aed UOps.RANGE toposort spec (#4660)
* use iterator

* nested loops and outer loads

* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83 consolidate uops tests (#4659)
* merge uoptimize

* move tests

* fix skip message
2024-05-20 21:42:31 +03:00
Szymon Ożóg
1e7b7b2c3c Fix flop counting for mulacc (#4640)
* Fix flop counting for mulacc

* add test_simple_mulacc

* Update test_uops_stats.py

* Update test_uops_stats.py

* revert test_mulacc

* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
nimlgen
c9f7f2da70 nv hcq bind api (#4629)
* hcq bind api for nv

* linter

* linter

* add test

* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a correctly insert UOps.END* in fuzz result (#4653) 2024-05-19 21:10:28 +03:00
chenyu
456aa0b656 update test_search kernel count (#4652)
integration test that beaming 1 kernel increments kernel count by 1, and moved existing test_kernel_count to TestTimeLinearizer
2024-05-19 13:54:52 -04:00
qazal
954718e6bf reorder DEFINE_GLOBAL in fuzz_uops (#4651)
* globals base

* test: opt out of DEFINE_GLOBAL

* do it like ExecItem
2024-05-19 20:51:31 +03:00
Léo
967e35f8b8 fix(beam): GlobalCounters kernel count increasing when clearing l2 (#4598)
* fix(beam): GlobalCounters kernel count increasing when clearing l2

* fix: removed the NOSTATS var by adding do_update_stats to Tensor.realize()

* test(search): regression test for _time_program, should not increment kernel_count

* fix(test_search): unused var and now properly checking when l2 is cleared

* fix(test_search): added assert message

* fix(test_search): now testing public beam api for kcount

* ruff fixes

---------

Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>
2024-05-19 10:03:47 -07:00
George Hotz
4753283221 LOOP -> RANGE (#4650) 2024-05-19 06:40:20 -07:00
chenyu
286b4dbdf2 compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

renderer error with multiprocess should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00
qazal
b0cb02f719 uops fuzzing infra (#4641)
* base with bfs

* find paths

* get last

* try blocks

* Revert "try blocks"

This reverts commit 25f8e3fe85.

* this should be simpler

* full exec

* support debug

* fix lint

* add todo

* copy in_degree
2024-05-18 20:19:57 +03:00
qazal
bf8f855838 assert kernel counts in unsupported fusions (#4643)
* replace with comments

* not relevant

* update comment

* custom exception maybe

* fix LoadOps.VIEW
2024-05-18 20:14:37 +03:00
qazal
a5204fe89d refactor UOps.CONST (#4639)
* delete more

* nit: dont need assign

* can this be simpler

* use scalars

* always cast

* clang needs cast

* format
2024-05-18 10:07:36 +03:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
nimlgen
daf57af3eb move tc to renderers (#4631)
* move tc to renderers

* missed import

* fix typo

* fix

* fix imports

* remove from tests

* fix 4607

* nv emulate timestamp

* time is int

* correct time
2024-05-18 00:36:29 +03:00
nimlgen
10cf8e459b hcq update queue in place (#4626)
* do not self wait in hcq

* faster enqueue

* comments

* tests

* linter

* fix typo
2024-05-17 22:18:20 +03:00
chenyu
c86adabe15 time with real global buffers in search (#4621)
* filter fake buffers in search

* test that

* update test
2024-05-17 12:36:23 -04:00
uuuvn
639ea5b0f2 Metal linearizer failure 22 is flaky not just on CI (#4617)
* METAL doesn't fail anymore, not just on CI

* oops
2024-05-16 11:31:23 -04:00
qazal
f3f2b96583 pick schedule tests from external_test_opt (#4615)
* conv tests

* misc

* that shouldnt const fold
2024-05-16 15:43:41 +03:00
qazal
13200c6894 check simple_pads in all views (#4614) 2024-05-16 14:34:39 +03:00
qazal
0b464df605 base change scheduling spec (#4613)
* spec and kernel cnt

* dont use half

* skip half
2024-05-16 13:30:49 +03:00
nimlgen
65f7e3b3ab nv setup constbuf4 (#4511)
* nv correct constbuf 4

* compare results to cuda

* test fixed

* failed kernel

* repro

* revert this change
2024-05-16 10:42:35 +03:00
chenyu
04f2327ca3 fix abs of diff of uint (#4411) 2024-05-15 18:39:11 -04:00
chenyu
2119e0456d redo simpler abs and sign (#4611)
moved Sign logic to function.py, and backward always returns 0 to match torch.
rewrite abs as `self * self.sign()`, so its backward also matches torch.
2024-05-15 18:19:46 -04:00
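
The last entry above (2119e0456d) describes abs as `self * self.sign()` with a sign whose backward always returns 0. As a minimal sketch of why that gives the expected gradient, here is the idea on plain Python floats. The helper names `sign`, `abs_via_sign`, and `abs_grad` are hypothetical and are not taken from the repository; this is not the project's implementation.

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def abs_via_sign(x: float) -> float:
    """Compute |x| as x * sign(x), mirroring the rewrite in the commit message."""
    return x * sign(x)


def abs_grad(x: float) -> float:
    """Derivative of x * sign(x), assuming sign's own derivative is taken as 0.

    Product rule: d/dx [x * sign(x)] = sign(x) * 1 + x * 0 = sign(x).
    """
    sign_backward = 0.0  # assumption from the commit message: backward always returns 0
    return sign(x) + x * sign_backward


if __name__ == "__main__":
    for v in (-3.0, 0.0, 2.5):
        print(v, abs_via_sign(v), abs_grad(v))
```

Under these assumptions the gradient of abs comes out as sign(x), including 0 at x = 0, which is the behaviour the commit message says it wants to match.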