qazal
bee96a19ff
fuzz uop schedules (#5345)
* basic blocks + cleanups
* fixups
* elif is better for future me
* fuzz_schedule_max_paths
* fix linter
2024-07-09 15:24:56 +03:00
Ian Paul
d5a68ae6b3
Simple abstractions3.py fix (#5343)
* abstractions3.py fix
* Add abstractions3.py to CI tests
2024-07-09 13:48:42 +03:00
nimlgen
a2a9bfd2ec
nv correct error messages with ptx (#5341)
* nv correct error messages with ptx
* return compile error
2024-07-09 10:39:39 +03:00
George Hotz
c13da83f12
tests from lowerer branch (#5339)
* tests from lowerer branch
* Update test_image_dtype.py
* Update test_image_dtype.py
* Update test_image_dtype.py
2024-07-08 21:23:19 -07:00
chenyu
4ceab5d2b1
fix PTX match rule for gated LOAD (#5338)
* test padto sum with bool tensor and bool acc dtype
make sure bool tensor acc with gate is handled correctly
* broken in PTX
* fix ptx
2024-07-08 22:25:03 -04:00
chenyu
a80f2df1bd
fix some PTX tests (#5337)
fix broken PTX tests in test_linearizer and test_uops. some tests were skipped or broken because they ran only with CUDA=1, and we now run PTX with NV=1
2024-07-08 21:33:05 -04:00
wozeparrot
9150a6be7a
tensor metadata (#5271)
2024-07-08 17:45:40 -07:00
chenyu
7f642aa7ed
minor PTX matcher cleanup [run_process_replay] (#5336)
* minor PTX matcher cleanup [run_process_replay]
uop.cast syntactic sugar and some newline/space cleanup
* comment
2024-07-08 19:19:20 -04:00
chenyu
0f0940225a
fix Tensor.all and Tensor.any for PTX (#5335)
support boolean acc and boolean phi, and rewrite boolean max to uint8 max
2024-07-08 18:15:04 -04:00
Roelof van Dijk
053c706961
refactor: expr_view on View (#5315)
2024-07-08 11:47:34 -07:00
kormann
2349d837fb
Fix scope order in graph toposort [run_process_replay] (#5330)
* fix
* test
* nothing
2024-07-08 11:46:15 -07:00
chenyu
631bc974a0
raise line count limit to 8500 (#5331)
2024-07-08 14:00:28 -04:00
Timmy
bb7746985f
multireduce scheduler tests (#5141)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-08 20:28:55 +03:00
nimlgen
bb2222e488
nv default for ampere & ada (#5329)
2024-07-08 19:01:27 +03:00
nimlgen
51d6f372e4
nv get classes based on device (#5325)
* nv get classes
* support in mockgpu
* choose sm based on gpu
* fix
* fix
* fix arch
2024-07-08 18:25:05 +03:00
chenyu
7d049fc20c
move getting 0 and min value of a dtype to dtype.py (#5328)
clean up getting the base case for reduce ops
[run_process_replay]
2024-07-08 10:51:56 -04:00
nimlgen
b0c5c58833
nv rm_control to rmctrl type (#5327)
* nv rm_control to rmctrl type
* fix
2024-07-08 17:24:33 +03:00
Elias Wahl
73bddc44f6
Fix fake dataloader (#5326)
2024-07-08 09:07:44 -04:00
chenyu
6856f915d6
Tensor.any and Tensor.all (#5320)
does not work in PTX yet due to how boolean tensors are handled
2024-07-07 14:36:00 -04:00
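For reference, the intended any/all reduction semantics match Python's built-ins, including the empty case (a general illustration, not tinygrad code):

```python
# all is True only if no element is falsy; any is True if at least one is truthy
assert all([True, True]) is True
assert any([False, True]) is True
# empty reductions: all is vacuously True, any is False
assert all([]) is True
assert any([]) is False
```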
chenyu
2029cb7047
support passing None to Tensor.clip (#5319)
pass None for no upper bound or no lower bound
2024-07-07 13:04:22 -04:00
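A minimal plain-Python sketch of the clip-with-None semantics described above (`clip` here is a hypothetical helper, not tinygrad's implementation):

```python
def clip(x, low=None, high=None):
    # hypothetical helper: None on either side means "no bound on that side"
    if low is not None:
        x = max(x, low)
    if high is not None:
        x = min(x, high)
    return x

assert clip(5, high=3) == 3   # upper bound only
assert clip(-2, low=0) == 0   # lower bound only
assert clip(1) == 1           # no bounds: identity
```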
chenyu
296a1a36bb
update Tensor.round doc and example (#5318)
document rounding half to even and update the examples to show it
2024-07-07 12:10:39 -04:00
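Round-half-to-even (banker's rounding) is also what Python's built-in `round` does, which illustrates the behavior the doc update describes:

```python
# ties go to the nearest even integer, not always away from zero
assert round(0.5) == 0
assert round(1.5) == 2
assert round(2.5) == 2
assert round(-1.5) == -2
```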
chenyu
c1e330f302
Tensor.int and Tensor.bool (#5317)
2024-07-07 11:52:58 -04:00
nimlgen
778d1cdbee
nv allocate local memory dynamically (#5277)
* nv allocate local memory dynamically
* fix
* linter
* linter 2
* linter
* fixes
2024-07-07 17:34:49 +03:00
qazal
ae10e936e7
UOps.VECTORIZE cleanups [run_process_replay] (#5314)
* still render_cast
* one extra line ok
* these are all just vectorize
* save space
* behavior change can go in a different diff
2024-07-07 10:49:08 +03:00
greg-niemeyer
77b2ce9fc9
Add UOps.VECTORIZE [run_process_replay] (#5289)
* Add UOps.VECTORIZE to core
* Update vectorized cast tests
* Addresses code review comments
- Removes VECTORIZE from LLVMRenderer
- Add line breaks to unduly long lines
- Add noop CAST rule back
- Update asserts and add render_vectorize in
CStyleLanguage renderer
* Add missing const folding rule for VECTORIZE
Also adds corresponding test
* Fixes test_const_vectorize_fold and add assert
- Use sane types with VECTORIZE in test_const_vectorize_fold
- Add assert that sanity checks the types for VECTORIZE
* Rename test_cast_vectorized_fold
Renames test_cast_vectorized_fold to test_noop_vectorize_fold
because the test targets a very specific rule and there are
other tests for VECTORIZE.
* Revert unrelated changes
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-07 09:59:57 +03:00
qazal
2a7282c1e1
test: delete the extra cast in cstyle load [run_process_replay] [no_assert] (#5310)
* test: delete the extra cast in cstyle load [run_process_replay] [no_assert]
* assert buf_uop
* ImageDType
* ptx is actually a 64bit address
2024-07-07 09:12:49 +03:00
chenyu
cededd8eb4
minor multi cleanup (#5311)
add types, move things around, and some newlines
2024-07-06 21:55:59 -04:00
qazal
8a99514462
generalize the uops toposort spec to ptx (#5309)
* generalize spec to ptx
* redundant assert
* extra print
2024-07-07 00:06:30 +03:00
chenyu
ca0ef1700b
use precise::sin in metal (#5307)
2024-07-06 12:47:27 -04:00
qazal
5c2ca7bad4
remove UOps.SPECIAL rendering from llvm (#5306)
2024-07-06 19:28:47 +03:00
chenyu
356e5d2e54
touchup multi dtype in elementwise (#5305)
only need to check real once; also added a type annotation
2024-07-06 11:54:12 -04:00
qazal
7ddda9f9f1
hotfix: cache seen graphs in fusion (#5302)
2024-07-06 14:13:58 +03:00
qazal
11dfb19b20
track seen graphs in recursive group (#5301)
* track seen
* maybe never add realized
* ahh it needs to track sts
* delete extra check
* cache typings
* minor cleanup
2024-07-06 12:39:31 +03:00
qazal
d813617742
prescheduling refactor (#5300)
* p1
* refactor tuple
2024-07-06 12:04:03 +03:00
qazal
c1e166c08a
fix dtype mismatch for bool ops in multi (#5299)
2024-07-06 11:36:40 +03:00
chenyu
fc03fc025e
enable sin on METAL in test_dtype_alu (#5298)
2024-07-05 14:52:09 -04:00
qazal
b369e75ed0
refactor schedule creation (#5297)
2024-07-05 21:14:38 +03:00
qazal
5292d37db6
LoadOps.VIEW in the scheduler spec (#5296)
* refactor to allow_buffer_view
* tests
* fix multi
2024-07-05 19:43:50 +03:00
hikettei
1ab7a4cff0
Handling Multiple UnaryOps.BITCAST in Function for Proper Kernel Fusion [run_process_replay] (#5172)
* [Patch] added an option not to ignore view replacing when doing bitcast
* added the testcase
* [Add] reproduced in the unittest that bitcast cannot be fused into a single kernel
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-05 19:16:44 +03:00
chenyu
43c3f73fbc
handcode_bert_opt.py (#5295)
similar to handcode_resnet50_opt.py, one file to check bert kernels without a dataset.
2024-07-05 11:01:20 -04:00
nimlgen
d7835a705c
hotfix: fix metal with vars (#5294)
* hotfix: fix metal with vars
* one more place
2024-07-05 16:53:40 +03:00
nimlgen
8a548b0b6e
metal support offset (#5293)
2024-07-05 16:13:05 +03:00
qazal
1cefbb33ab
uop graph tests + type_verify cleanup (#5292)
* test_cast_alu_fold
* test_double_cast_fold + these should assert
2024-07-05 13:00:01 +03:00
qazal
341c4a29d1
hotfix: use dtype.scalar() for rendering cast [run_process_replay] [no_assert] (#5290)
2024-07-05 11:29:35 +03:00
chenyu
87d27c45ec
minor _broadcast cleanup (#5286)
`any(x==0 for x in y)` is `0 in y`.
also `get_args(ConstType)` instead of hardcoded `float, int, bool`
2024-07-04 14:25:24 -04:00
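Both simplifications mentioned above can be checked in plain Python (`ConstType` here is a stand-in for tinygrad's alias, assumed to be `Union[float, int, bool]`):

```python
from typing import Union, get_args

ys = [3, 0, 7]
# the membership test is equivalent to the explicit any(...) scan
assert any(x == 0 for x in ys) == (0 in ys)

# stand-in for tinygrad's ConstType alias (assumption, for illustration)
ConstType = Union[float, int, bool]
# get_args recovers the member types, so they need not be hardcoded
assert get_args(ConstType) == (float, int, bool)
```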
SnakeOnex
8c03816ae9
fix README example (#5284)
* fixed README example
* README test
* changed py -> python markdown code flags in README
2024-07-04 11:15:07 -04:00
nimlgen
2778b6046c
new memory scheduler (#5278)
* new memory schedule algo
* works
* fix
* fix
* linter
* tiny fixes
* do not optimize copy buffers
* more comments
* tiny cleanups
* tiny cleanups
2024-07-04 18:06:04 +03:00
nimlgen
84b3e3bb6f
hcq exec no embedded signal (#5142)
2024-07-04 13:29:21 +03:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference (#5283)
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
e5ba385f03
remove first contiguous in multi from_sharded (#5121)
the second contiguous guarantees lbs are contiguous going into MultiLazyBuffer, so the first contiguous isn't needed
2024-07-03 19:42:56 -04:00