nimlgen
51d6f372e4
nv get classes based on device ( #5325 )
...
* nv get classes
* support in mockgpu
* choose sm based on gpu
* fix
* fix
* fix arch
2024-07-08 18:25:05 +03:00
chenyu
7d049fc20c
move getting 0 and min value of a dtype to dtype.py ( #5328 )
...
cleanup getting base case for reduce ops
[run_process_replay]
2024-07-08 10:51:56 -04:00
nimlgen
b0c5c58833
nv rm_control to rmctrl type ( #5327 )
...
* nv rm_control to rmctrl type
* fix
2024-07-08 17:24:33 +03:00
Elias Wahl
73bddc44f6
Fix fake dataloader ( #5326 )
2024-07-08 09:07:44 -04:00
chenyu
6856f915d6
Tensor.any and Tensor.all ( #5320 )
...
does not work in ptx yet due to how boolean tensor is handled
2024-07-07 14:36:00 -04:00
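A hedged sketch of the new Tensor.any / Tensor.all from #5320, assuming they follow the usual reduce signature with an optional axis (expected outputs in comments):

```python
from tinygrad import Tensor

t = Tensor([[True, False], [True, True]])
print(t.any().numpy())        # True  -- at least one element is truthy
print(t.all().numpy())        # False -- not every element is truthy
print(t.all(axis=1).numpy())  # [False  True] -- reduce along each row
```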
chenyu
2029cb7047
support passing None to Tensor.clip ( #5319 )
...
passing None for no upper bound or no lower bound
2024-07-07 13:04:22 -04:00
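A minimal sketch of the Tensor.clip(None, ...) behavior described in #5319, assuming the standard `from tinygrad import Tensor` import (expected outputs in comments):

```python
from tinygrad import Tensor

t = Tensor([-2.0, 0.5, 3.0])
print(t.clip(0.0, 1.0).numpy())   # [0.  0.5 1. ]   -- both bounds
print(t.clip(None, 1.0).numpy())  # [-2.  0.5 1. ]  -- no lower bound
print(t.clip(0.0, None).numpy())  # [0.  0.5 3. ]   -- no upper bound
```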
chenyu
296a1a36bb
update Tensor.round doc and example ( #5318 )
...
document rounding half to even and update the examples to show it
2024-07-07 12:10:39 -04:00
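A small illustration of the rounding-half-to-even (banker's rounding) behavior documented in #5318: ties are rounded to the nearest even integer.

```python
from tinygrad import Tensor

# 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, -0.5 -> -0 (ties go to the even neighbor)
print(Tensor([0.5, 1.5, 2.5, -0.5]).round().numpy())
```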
chenyu
c1e330f302
Tensor.int and Tensor.bool ( #5317 )
2024-07-07 11:52:58 -04:00
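A hedged sketch of the Tensor.int and Tensor.bool casts added in #5317, assuming they are cast shorthands like the existing Tensor.float and Tensor.half:

```python
from tinygrad import Tensor

t = Tensor([0.0, 1.5, -2.0])
print(t.int().numpy())   # [ 0  1 -2]  -- float -> int truncates toward zero
print(t.bool().numpy())  # [False  True  True]  -- nonzero values become True
```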
nimlgen
778d1cdbee
nv allocate local memory dynamically ( #5277 )
...
* nv allocate local memory dynamically
* fix
* linter
* linter 2
* linter
* fixes
2024-07-07 17:34:49 +03:00
qazal
ae10e936e7
UOps.VECTORIZE cleanups [run_process_replay] ( #5314 )
...
* still render_cast
* one extra line ok
* these are all just vectorize
* save space
* behavior change can go in a different diff
2024-07-07 10:49:08 +03:00
greg-niemeyer
77b2ce9fc9
Add UOps.VECTORIZE [run_process_replay] ( #5289 )
...
* Add UOps.VECTORIZE to core
* Update vectorized cast tests
* Addresses code review comments
- Removes VECTORIZE from LLVMRenderer
- Add line breaks to unduly long lines
- Add noop CAST rule back
- Update asserts and add render_vectorize in
CStyleLanguage renderer
* Add missing const folding rule for VECTORIZE
Also adds corresponding test
* Fixes test_const_vectorize_fold and add assert
- Use sane types with VECTORIZE in test_const_vectorize_fold
- Add assert that sanity checks the types for VECTORIZE
* Rename test_cast_vectorized_fold
Renames test_cast_vectorized_fold to test_noop_vectorize_fold
because the test targets a very specific rule and there are
other tests for VECTORIZE.
* Revert unrelated changes
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-07 09:59:57 +03:00
qazal
2a7282c1e1
test: delete the extra cast in cstyle load [run_process_replay] [no_assert] ( #5310 )
...
* test: delete the extra cast in cstyle load [run_process_replay] [no_assert]
* assert buf_uop
* ImageDType
* ptx is actually a 64bit address
2024-07-07 09:12:49 +03:00
chenyu
cededd8eb4
minor multi cleanup ( #5311 )
...
add type, move around and some newlines
2024-07-06 21:55:59 -04:00
qazal
8a99514462
generalize the uops toposort spec to ptx ( #5309 )
...
* generalize spec to ptx
* redundant assert
* extra print
2024-07-07 00:06:30 +03:00
chenyu
ca0ef1700b
use precise::sin in metal ( #5307 )
2024-07-06 12:47:27 -04:00
qazal
5c2ca7bad4
remove UOps.SPECIAL rendering from llvm ( #5306 )
2024-07-06 19:28:47 +03:00
chenyu
356e5d2e54
touchup multi dtype in elementwise ( #5305 )
...
only need to check real once, also added type annotation
2024-07-06 11:54:12 -04:00
qazal
7ddda9f9f1
hotfix: cache seen graphs in fusion ( #5302 )
2024-07-06 14:13:58 +03:00
qazal
11dfb19b20
track seen graphs in recursive group ( #5301 )
...
* track seen
* maybe never add realized
* ahh it needs to track sts
* delete extra check
* cache typings
* minor cleanup
2024-07-06 12:39:31 +03:00
qazal
d813617742
prescheduling refactor ( #5300 )
...
* p1
* refactor tuple
2024-07-06 12:04:03 +03:00
qazal
c1e166c08a
fix dtype mismatch for bool ops in multi ( #5299 )
2024-07-06 11:36:40 +03:00
chenyu
fc03fc025e
enable sin on METAL in test_dtype_alu ( #5298 )
2024-07-05 14:52:09 -04:00
qazal
b369e75ed0
refactor schedule creation ( #5297 )
2024-07-05 21:14:38 +03:00
qazal
5292d37db6
LoadOps.VIEW in the scheduler spec ( #5296 )
...
* refactor to allow_buffer_view
* tests
* fix multi
2024-07-05 19:43:50 +03:00
hikettei
1ab7a4cff0
Handling Multiple UnaryOps.BITCAST in Function for Proper Kernel Fusion [run_process_replay] ( #5172 )
...
* [Patch] added an option not to ignore view replacing when doing bitcast
* added the testcase
* [Add] reproduced bitcast cannot be fused into a single kernel in the unittest
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-05 19:16:44 +03:00
chenyu
43c3f73fbc
handcode_bert_opt.py ( #5295 )
...
similar to handcode_resnet50_opt.py, one file to check bert kernels without dataset.
2024-07-05 11:01:20 -04:00
nimlgen
d7835a705c
hotfix: fix metal with vars ( #5294 )
...
* hotfix: fix metal with vars
* one more place
2024-07-05 16:53:40 +03:00
nimlgen
8a548b0b6e
metal support offset ( #5293 )
2024-07-05 16:13:05 +03:00
qazal
1cefbb33ab
uop graph tests + type_verify cleanup ( #5292 )
...
* test_cast_alu_fold
* test_double_cast_fold + these should assert
2024-07-05 13:00:01 +03:00
qazal
341c4a29d1
hotfix: use dtype.scalar() for rendering cast [run_process_replay] [no_assert] ( #5290 )
2024-07-05 11:29:35 +03:00
chenyu
87d27c45ec
minor _broadcast cleanup ( #5286 )
...
`any(x==0 for x in y)` is `0 in y`.
also `get_args(ConstType)` instead of hard coded `float, int, bool`
2024-07-04 14:25:24 -04:00
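The two simplifications mentioned in #5286, illustrated in plain Python; ConstType is redefined locally here purely for the example:

```python
from typing import Union, get_args

ConstType = Union[float, int, bool]  # local stand-in for the alias in dtype.py

shape = (4, 0, 3)
# membership test is equivalent to scanning for a zero element
assert any(x == 0 for x in shape) == (0 in shape)
# get_args recovers the union members, so the tuple need not be hard coded
assert get_args(ConstType) == (float, int, bool)
```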
SnakeOnex
8c03816ae9
fix README example ( #5284 )
...
* fixed README example
* README test
* changed py -> python markdown code flags in README
2024-07-04 11:15:07 -04:00
nimlgen
2778b6046c
new memory scheduler ( #5278 )
...
* new memory schedule algo
* works
* fix
* fix
* linter
* tiny fixes
* do not optimize copy buffers
* more comments
* tiny cleanups
2024-07-04 18:06:04 +03:00
nimlgen
84b3e3bb6f
hcq exec no embedded signal ( #5142 )
2024-07-04 13:29:21 +03:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference ( #5283 )
...
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
e5ba385f03
remove first contiguous in multi from_sharded ( #5121 )
...
the second contiguous guarantees lbs are contiguous going into MultiLazyBuffer, so the first contiguous is not needed
2024-07-03 19:42:56 -04:00
chenyu
f1ff65e763
remove "no-nans-fp-math"="true" for LLVM ( #5282 )
...
fixed isnan for llvm (still have issue with < nan)
2024-07-03 17:52:50 -04:00
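For context on #5282: "no-nans-fp-math"="true" lets LLVM assume floating-point values are never NaN, so a NaN check built on self-inequality can be folded away. A small, hedged illustration of the self-inequality property such checks rely on (the actual isnan implementation may differ):

```python
from tinygrad import Tensor

t = Tensor([1.0, float("nan")])
# NaN is the only floating-point value not equal to itself
print((t != t).numpy())  # [False  True]
```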
chenyu
3929a9dc94
fix UOp.cmp_tuple for ALU ( #5280 )
...
* fix UOp.cmp_tuple for ALU
for ALU, use self.arg instead of self.op to compare
* skip that?
2024-07-03 14:59:05 -04:00
qazal
a9d6a6c339
verify_lazyop with multi reduce ( #5276 )
...
* outsource the assert to the implicit movement op check
* tests
2024-07-03 20:15:42 +03:00
George Hotz
16e3b8b013
uops work from lowerer [run_process_replay] ( #5279 )
2024-07-03 09:40:00 -07:00
chenyu
622b7bd556
simpler TinyJit inside TinyJit detection ( #5219 )
...
* simpler TinyJit inside TinyJit detection
suggested in 73395b998b (commitcomment-143660402)
* cannot repro...
* clear the way out
* finally clear
2024-07-03 12:28:53 -04:00
gip
04ef0fd328
fix: message when applegpu tools missing ( #5236 )
2024-07-03 09:07:09 -07:00
reddyn12
d3e244d8b7
prev speed improvements ( #5252 )
...
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-07-03 09:06:01 -07:00
nimlgen
21d41f06a2
nv follows HCQCompatAllocRes protocol ( #5275 )
...
* nv follows HCQCompatAllocRes protocol
* fix amd
2024-07-03 11:34:10 +03:00
Vyacheslav Pachkov
d3e4e21759
add return type for HCQCompatAllocator _alloc ( #5267 )
...
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-07-03 10:25:44 +03:00
chenyu
191463a919
add timing to SDXL ( #5273 )
2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e
nn.RMSNorm ( #5272 )
...
the norm itself is not significant enough to add as a Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
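A hedged usage sketch of the new nn.RMSNorm from #5272, assuming the usual RMSNorm(dim, eps) constructor that normalizes over the last axis and scales by a learned weight:

```python
from tinygrad import Tensor
from tinygrad.nn import RMSNorm

norm = RMSNorm(8)   # normalize vectors of size 8 (eps left at its default)
x = Tensor.randn(2, 8)
y = norm(x)         # roughly x * (x.square().mean(-1, keepdim=True) + eps).rsqrt() * weight
print(y.shape)      # (2, 8)
```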
chenyu
9a2a82a77f
test stable diffusion unet in ci ( #5268 )
...
unet is parameterized now so a smaller one can be tested in ci
2024-07-02 21:37:52 -04:00
chenyu
ce52b10f6f
add a flag DISABLE_LOOP_COLLAPSE ( #5270 )
...
workaround if a user encounters the UNMUL error
2024-07-02 20:01:11 -04:00
George Hotz
e53b164e1a
small changes from lowerer ( #5266 )
2024-07-02 15:03:54 -07:00