Commit Graph

4981 Commits

Author SHA1 Message Date
qazal
bee96a19ff fuzz uop schedules (#5345)
* basic blocks + cleanups

* fixups

* elif is better for future me

* fuzz_schedule_max_paths

* fix linter
2024-07-09 15:24:56 +03:00
Ian Paul
d5a68ae6b3 Simple abstractions3.py fix (#5343)
* abstractions3.py fix

* Add abstractions3.py to CI tests
2024-07-09 13:48:42 +03:00
nimlgen
a2a9bfd2ec nv correct error messages with ptx (#5341)
* nv correct error messages with ptx

* return compile error
2024-07-09 10:39:39 +03:00
George Hotz
c13da83f12 tests from lowerer branch (#5339)
* tests from lowerer branch

* Update test_image_dtype.py

* Update test_image_dtype.py

* Update test_image_dtype.py
2024-07-08 21:23:19 -07:00
chenyu
4ceab5d2b1 fix PTX match rule for gated LOAD (#5338)
* test padto sum with bool tensor and bool acc dtype

make sure bool tensor acc with gate is handled correctly

* broken in PTX

* fix ptx
2024-07-08 22:25:03 -04:00
chenyu
a80f2df1bd fix some PTX tests (#5337)
fix broken PTX tests in test_linearizer and test_uops. some tests were skipped and broken because they ran only with CUDA=1, and we run PTX with NV=1 now
2024-07-08 21:33:05 -04:00
wozeparrot
9150a6be7a tensor metadata (#5271) 2024-07-08 17:45:40 -07:00
chenyu
7f642aa7ed minor PTX matcher cleanup [run_process_replay] (#5336)
* minor PTX matcher cleanup [run_process_replay]

uop.cast syntactic sugar and some newline/space cleanup

* comment
2024-07-08 19:19:20 -04:00
chenyu
0f0940225a fix Tensor.all and Tensor.any for PTX (#5335)
support boolean acc and boolean phi, and rewrite boolean max to uint8 max
2024-07-08 18:15:04 -04:00
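This commit maps a boolean max onto a uint8 max. The following is a rough, framework-independent illustration in plain Python. It is not tinygrad's PTX matcher code, and the helper names `any_via_max` and `all_via_any` are hypothetical. It shows `any` as a max over 0/1 values, with `all` derived from `any` by De Morgan's law:

```python
from typing import Iterable


def any_via_max(bits: Iterable[bool]) -> bool:
    # Hypothetical helper: cast each bool to a 0/1 integer (standing in for uint8)
    # and take the max; an empty input starts from 0, i.e. False.
    return bool(max((int(b) for b in bits), default=0))


def all_via_any(bits: Iterable[bool]) -> bool:
    # Hypothetical helper: all(x) == not any(not x)
    return not any_via_max(not b for b in bits)


if __name__ == "__main__":
    print(any_via_max([False, True, False]))
    print(all_via_any([True, True, False]))
```

Both helpers agree with Python's built-in `any`/`all`, including on empty input.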
Roelof van Dijk
053c706961 refactor: expr_view on View (#5315) 2024-07-08 11:47:34 -07:00
kormann
2349d837fb Fix scope order in graph toposort [run_process_replay] (#5330)
* fix

* test

* nothing
2024-07-08 11:46:15 -07:00
chenyu
631bc974a0 raise line count limit to 8500 (#5331) 2024-07-08 14:00:28 -04:00
Timmy
bb7746985f multireduce scheduler tests (#5141)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-08 20:28:55 +03:00
nimlgen
bb2222e488 nv default for ampere & ada (#5329) 2024-07-08 19:01:27 +03:00
nimlgen
51d6f372e4 nv get classes based on device (#5325)
* nv get classes

* support in mockgpu

* choose sm based on gpu

* fix

* fix

* fix arch
2024-07-08 18:25:05 +03:00
chenyu
7d049fc20c move getting 0 and min value of a dtype to dtype.py (#5328)
cleanup getting base case for reduce ops
[run_process_replay]
2024-07-08 10:51:56 -04:00
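The minimum value of a dtype matters because a reduce needs a starting accumulator that cannot change the result: 0 for a sum, and the smallest representable value for a max. The sketch below is plain Python. The dtype table and the `reduce_identity`/`reduce_values` names are assumptions for illustration, not the contents of dtype.py:

```python
import math
from typing import Dict, Iterable, Union

Number = Union[int, float, bool]

# Hypothetical table of minimum values, keyed by illustrative dtype names.
DTYPE_MIN: Dict[str, Number] = {
    "bool": False,
    "uint8": 0,
    "int8": -128,
    "int32": -2**31,
    "float32": -math.inf,
}


def reduce_identity(op: str, dtype: str) -> Number:
    # SUM starts from 0; MAX starts from the smallest value the dtype can hold.
    if op == "SUM":
        return 0
    if op == "MAX":
        return DTYPE_MIN[dtype]
    raise ValueError(f"unsupported reduce op: {op}")


def reduce_values(op: str, values: Iterable[Number], dtype: str) -> Number:
    # Fold the values into an accumulator seeded with the op's identity.
    acc = reduce_identity(op, dtype)
    for v in values:
        acc = acc + v if op == "SUM" else max(acc, v)
    return acc
```

An empty MAX reduce over int8 then yields -128 rather than 0, which would be wrong for all-negative inputs.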
nimlgen
b0c5c58833 nv rm_control to rmctrl type (#5327)
* nv rm_control to rmctrl type

* fix
2024-07-08 17:24:33 +03:00
Elias Wahl
73bddc44f6 Fix fake dataloader (#5326) 2024-07-08 09:07:44 -04:00
chenyu
6856f915d6 Tensor.any and Tensor.all (#5320)
does not work in ptx yet due to how boolean tensors are handled
2024-07-07 14:36:00 -04:00
chenyu
2029cb7047 support passing None to Tensor.clip (#5319)
passing None for no upper bound or no lower bound
2024-07-07 13:04:22 -04:00
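This is a plain-Python sketch of clipping where either bound may be `None`, meaning "no bound on that side". `clip_values` is a hypothetical helper operating on lists, not Tensor.clip itself:

```python
from typing import List, Optional


def clip_values(xs: List[float], min_: Optional[float] = None,
                max_: Optional[float] = None) -> List[float]:
    # Hypothetical helper: a bound of None leaves that side of the range open.
    out = []
    for x in xs:
        if min_ is not None and x < min_:
            x = min_
        if max_ is not None and x > max_:
            x = max_
        out.append(x)
    return out


print(clip_values([-2.0, 0.5, 3.0], min_=0.0))            # [0.0, 0.5, 3.0]
print(clip_values([-2.0, 0.5, 3.0], max_=1.0))            # [-2.0, 0.5, 1.0]
print(clip_values([-2.0, 0.5, 3.0], min_=0.0, max_=1.0))  # [0.0, 0.5, 1.0]
```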
chenyu
296a1a36bb update Tensor.round doc and example (#5318)
document rounding half to even and update examples to show it
2024-07-07 12:10:39 -04:00
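Rounding half to even (banker's rounding) sends exact .5 ties to the nearest even integer, so 0.5 → 0, 1.5 → 2 and 2.5 → 2. Below is a minimal pure-Python sketch of the rule. The `round_half_even` helper is hypothetical and is not Tensor.round's implementation. Python's built-in `round` uses the same tie-breaking, which makes it a convenient cross-check:

```python
import math


def round_half_even(x: float) -> int:
    # Hypothetical helper: round to nearest, breaking exact .5 ties toward even.
    f = math.floor(x)
    diff = x - f
    if diff > 0.5:
        return f + 1
    if diff < 0.5:
        return f
    # Exact tie: pick whichever neighbouring integer is even.
    return f if f % 2 == 0 else f + 1


for v in [0.5, 1.5, 2.5, -0.5, -1.5, 1.2, 1.7]:
    print(v, round_half_even(v), round(v))
```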
chenyu
c1e330f302 Tensor.int and Tensor.bool (#5317) 2024-07-07 11:52:58 -04:00
nimlgen
778d1cdbee nv allocate local memory dynamically (#5277)
* nv allocate local memory dynamically

* fix

* linter

* linter 2

* linter

* fixes
2024-07-07 17:34:49 +03:00
qazal
ae10e936e7 UOps.VECTORIZE cleanups [run_process_replay] (#5314)
* still render_cast

* one extra line ok

* these are all just vectorize

* save space

* behavior change can go in a different diff
2024-07-07 10:49:08 +03:00
greg-niemeyer
77b2ce9fc9 Add UOps.VECTORIZE [run_process_replay] (#5289)
* Add UOps.VECTORIZE to core

* Update vectorized cast tests

* Addresses code review comments

- Removes VECTORIZE from LLVMRenderer
- Add line breaks to unduly long lines
- Add noop CAST rule back
- Update asserts and add render_vectorize in
  CStyleLanguage renderer

* Add missing const folding rule for VECTORIZE

Also adds corresponding test

* Fixes test_const_vectorize_fold and add assert

- Use sane types with VECTORIZE in test_const_vectorize_fold
- Add assert that sanity checks the types for VECTORIZE

* Rename test_cast_vectorized_fold

Renames test_cast_vectorized_fold to test_noop_vectorize_fold
because the test targets a very specific rule and there are
other tests for VECTORIZE.

* Revert unrelated changes

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-07 09:59:57 +03:00
qazal
2a7282c1e1 test: delete the extra cast in cstyle load [run_process_replay] [no_assert] (#5310)
* test: delete the extra cast in cstyle load [run_process_replay] [no_assert]

* assert buf_uop

* ImageDType

* ptx is actually a 64bit address
2024-07-07 09:12:49 +03:00
chenyu
cededd8eb4 minor multi cleanup (#5311)
add type, move around and some newlines
2024-07-06 21:55:59 -04:00
qazal
8a99514462 generalize the uops toposort spec to ptx (#5309)
* generalize spec to ptx

* redundant assert

* extra print
2024-07-07 00:06:30 +03:00
chenyu
ca0ef1700b use precise::sin in metal (#5307) 2024-07-06 12:47:27 -04:00
qazal
5c2ca7bad4 remove UOps.SPECIAL rendering from llvm (#5306) 2024-07-06 19:28:47 +03:00
chenyu
356e5d2e54 touchup multi dtype in elementwise (#5305)
only need to check real once, also added type annotation
2024-07-06 11:54:12 -04:00
qazal
7ddda9f9f1 hotfix: cache seen graphs in fusion (#5302) 2024-07-06 14:13:58 +03:00
qazal
11dfb19b20 track seen graphs in recursive group (#5301)
* track seen

* maybe never add realized

* ahh it needs to track sts

* delete extra check

* cache typings

* minor cleanup
2024-07-06 12:39:31 +03:00
qazal
d813617742 prescheduling refactor (#5300)
* p1

* refactor tuple
2024-07-06 12:04:03 +03:00
qazal
c1e166c08a fix dtype mismatch for bool ops in multi (#5299) 2024-07-06 11:36:40 +03:00
chenyu
fc03fc025e enable sin on METAL in test_dtype_alu (#5298) 2024-07-05 14:52:09 -04:00
qazal
b369e75ed0 refactor schedule creation (#5297) 2024-07-05 21:14:38 +03:00
qazal
5292d37db6 LoadOps.VIEW in the scheduler spec (#5296)
* refactor to allow_buffer_view

* tests

* fix multi
2024-07-05 19:43:50 +03:00
hikettei
1ab7a4cff0 Handling Multiple UnaryOps.BITCAST in Function for Proper Kernel Fusion [run_process_replay] (#5172)
* [Patch] added an option not to ignore view replacing when doing bitcast

* added the testcase

* [Add] reproduced in the unittest that bitcast cannot be fused into a single kernel

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-05 19:16:44 +03:00
chenyu
43c3f73fbc handcode_bert_opt.py (#5295)
similar to handcode_resnet50_opt.py, one file to check bert kernels without the dataset.
2024-07-05 11:01:20 -04:00
nimlgen
d7835a705c hotfix: fix metal with vars (#5294)
* hotfix: fix metal with vars

* one more place
2024-07-05 16:53:40 +03:00
nimlgen
8a548b0b6e metal support offset (#5293) 2024-07-05 16:13:05 +03:00
qazal
1cefbb33ab uop graph tests + type_verify cleanup (#5292)
* test_cast_alu_fold

* test_double_cast_fold + these should assert
2024-07-05 13:00:01 +03:00
qazal
341c4a29d1 hotfix: use dtype.scalar() for rendering cast [run_process_replay] [no_assert] (#5290) 2024-07-05 11:29:35 +03:00
chenyu
87d27c45ec minor _broadcast cleanup (#5286)
`any(x==0 for x in y)` is `0 in y`.

also `get_args(ConstType)` instead of hard coded `float, int, bool`
2024-07-04 14:25:24 -04:00
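Both rewrites in this commit are plain Python idioms. In the sketch below, the `ConstType` alias is an assumption mirroring the `float, int, bool` set named in the message, and the helper names are hypothetical. Membership testing with `in` compares elements with `==` (after an identity check), so it answers the same question as the generator expression. `typing.get_args` recovers the member types of a `Union`:

```python
from typing import Tuple, Union, get_args

# Assumed alias for illustration; mirrors the float/int/bool set named above.
ConstType = Union[float, int, bool]


def has_zero_dim_verbose(shape: Tuple[int, ...]) -> bool:
    return any(x == 0 for x in shape)


def has_zero_dim(shape: Tuple[int, ...]) -> bool:
    # `in` checks identity or equality per element, matching the version above.
    return 0 in shape


def is_const(x: object) -> bool:
    # isinstance accepts a tuple of types, which is what get_args returns.
    return isinstance(x, get_args(ConstType))
```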
SnakeOnex
8c03816ae9 fix README example (#5284)
* fixed README example

* README test

* changed py -> python markdown code flags in README
2024-07-04 11:15:07 -04:00
nimlgen
2778b6046c new memory scheduler (#5278)
* new memory schedule algo

* works

* fix

* fix

* linter

* tiny fixes

* do not optimize copy buffers

* more comments

* tiny cleanups
2024-07-04 18:06:04 +03:00
nimlgen
84b3e3bb6f hcq exec no embedded signal (#5142) 2024-07-04 13:29:21 +03:00
Tobias Fischer
0c3a35e5c2 Stable Diffusion v2 Inference (#5283)
* model implementation

* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
e5ba385f03 remove first contiguous in multi from_sharded (#5121)
the second contiguous guarantees the lbs are contiguous going into MultiLazyBuffer, so the first contiguous is not needed
2024-07-03 19:42:56 -04:00