wozeparrot
|
46e360fdc0
|
check bfloat16 range with threefry (#6660)
|
2024-09-23 10:48:44 +08:00 |
|
qazal
|
d24e4b1042
|
viz more kernel view work (#6659)
|
2024-09-23 10:48:35 +08:00 |
|
qazal
|
6be1bf09f1
|
hotfix: bring COMPARE_SCHEDULE=0 back (#6657)
|
2024-09-23 10:39:43 +08:00 |
|
George Hotz
|
e945fa9c5c
|
put local on the PtrDtype [run_process_replay] (#6656)
* put local on the PtrDtype [run_process_replay]
* those are local too
|
2024-09-23 10:29:17 +08:00 |
|
chenyu
|
90c1ccc402
|
simpler drop valid check in simplify_valid_image_load (#6653)
* simpler drop valid check in simplify_valid_image_load
* update tests
|
2024-09-22 21:46:39 -04:00 |
|
qazal
|
99ed9fb75e
|
simpler verify_ast [run_process_replay] (#6654)
|
2024-09-23 09:40:09 +08:00 |
|
nimlgen
|
8a9195d86e
|
qcom texs refactor (#6613)
* qcom texs refactor
* fix
* linter
* qcombuf
* linter
|
2024-09-23 09:03:17 +08:00 |
|
qazal
|
d1bae42d35
|
viz lowerer and graph_rewrite dedup try 2 (#6652)
|
2024-09-22 21:09:46 +08:00 |
|
qazal
|
6b65d8c461
|
more process replay tracing work [run_process_replay] (#6650)
|
2024-09-22 16:16:58 +08:00 |
|
George Hotz
|
4fc5a34fe7
|
lowerer is just a graph rewrite, not a class [run_process_replay] (#6648)
|
2024-09-22 14:15:33 +08:00 |
|
George Hotz
|
0eb710de84
|
move WMMA out of lowerer [run_process_replay] (#6647)
|
2024-09-22 14:05:51 +08:00 |
|
George Hotz
|
84703d5b77
|
replace the lowerer with a contextual PatternMatcher [run_process_replay] (#6646)
* replace the lowerer with a contextual PatternMatcher [run_process_replay]
* todo
* it's REDUCE by the time it's in lowerer
|
2024-09-22 13:22:26 +08:00 |
|
qazal
|
4751159139
|
second iteration on viz/serve.py (#6643)
* small detail in checkStatus
* better abstractions for the api
* update test_viz
* ui updates
|
2024-09-22 08:49:44 +08:00 |
|
qazal
|
5bafed2f88
|
process replay traceback (#6642)
|
2024-09-21 16:53:34 +08:00 |
|
chenyu
|
9456a625bc
|
const_like type fix (#6641)
`Tuple[ConstType, ...]` instead of `Tuple[ConstType]`
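The distinction matters to type checkers: `Tuple[ConstType]` is a tuple of exactly one element, while `Tuple[ConstType, ...]` is a tuple of any length. A minimal sketch of the difference, assuming a stand-in `ConstType` alias (the real one lives in tinygrad) and a hypothetical `describe` helper:

```python
from typing import Tuple, Union, get_args

# Assumption: a stand-in for tinygrad's ConstType alias, defined here only for illustration.
ConstType = Union[float, int, bool]

OneConst = Tuple[ConstType]        # exactly one element
ManyConst = Tuple[ConstType, ...]  # any number of elements

def describe(tp) -> str:
  """Hypothetical helper: report whether a Tuple annotation is fixed-length or variadic."""
  args = get_args(tp)
  if len(args) == 2 and args[1] is Ellipsis: return "variadic"
  return f"fixed length {len(args)}"

one_kind = describe(OneConst)
many_kind = describe(ManyConst)
```

A vectorized constant with several lanes only type-checks against the variadic form.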
|
2024-09-21 03:44:08 -04:00 |
|
qazal
|
8edce82124
|
viz show server status (#6640)
|
2024-09-21 15:08:13 +08:00 |
|
qazal
|
982086f54c
|
UOps.VALID try 2 (#6623)
* make UOps.VALID compile
* fixable tests
* bufs dedup
* cleanup the CONST spec
* regenerate dataset with graph_rewrite
```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```
* rm arg
* remove arg
* revert arg removal
This reverts commit 2c35c75c95.
* red test_pickle_define_var
|
2024-09-21 14:19:25 +08:00 |
|
qazal
|
dd05e27622
|
remove UOp from DEFINE_VAR arg [run_process_replay] (#6639)
* remove UOp from DEFINE_VAR arg [run_process_replay]
* that assert is in `spec`
* more .args to remove
|
2024-09-21 14:07:56 +08:00 |
|
qazal
|
d2351af019
|
fixup non-void SINKs in tests [run_process_replay] (#6624)
|
2024-09-21 13:29:18 +08:00 |
|
qazal
|
391d14438e
|
DEFINE_VAR prereqs for VALID [run_process_replay] (#6637)
|
2024-09-21 13:28:39 +08:00 |
|
Tobias Fischer
|
c1bbd15bd9
|
Sharded SDXL Inference (#6328)
* initial sharding fixes
* sigma device fix
* emptyline space fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
|
2024-09-21 01:26:43 -04:00 |
|
chenyu
|
b91aa1c3d1
|
validhack cleanup parse valid [run_process_replay] (#6635)
|
2024-09-20 11:51:56 -04:00 |
|
George Hotz
|
683857de5d
|
make match a method on UPat [run_process_replay] (#6634)
* make match a method on UPat [run_process_replay]
* remove class stuff
* a cleaner UPatAny
|
2024-09-20 20:00:03 +08:00 |
|
qazal
|
dbe890b358
|
lowerer don't re-init UOp if we aren't rewriting [run_process_replay] (#6633)
|
2024-09-20 19:26:49 +08:00 |
|
nimlgen
|
21f2d79461
|
qcom match gpu impl for reg a6xx_sp_cs_unknown_a9b1 (#6631)
|
2024-09-20 18:14:00 +08:00 |
|
nimlgen
|
053c4dee55
|
qcom test for image pitch (#6621)
* qcom test for image pitch
* comment
|
2024-09-20 18:13:48 +08:00 |
|
chenyu
|
37ddd971e6
|
validhack explicitly check valid has CMPLT [run_process_replay] (#6630)
valid might be a const if gate folding is disabled
|
2024-09-20 06:13:05 -04:00 |
|
qazal
|
581a389a58
|
limit ctx tracking in TrackedPatternMatcher [run_process_replay] (#6629)
* limit ctx tracking in TrackedPatternMatcher [run_process_replay]
* add regression test
|
2024-09-20 18:06:05 +08:00 |
|
nimlgen
|
641586cb87
|
qcom updated ioctl (#6627)
|
2024-09-20 18:02:40 +08:00 |
|
qazal
|
98644a047b
|
use UOp.define_var in Variable shape [run_process_replay] (#6626)
|
2024-09-20 17:58:29 +08:00 |
|
chenyu
|
acef3e67fa
|
add an example that idx is const and valid cannot be removed (#6625)
very weird
|
2024-09-20 05:46:27 -04:00 |
|
chenyu
|
5707503048
|
x//a < b -> x < a*b for positive a (#6622)
openpilot valids 47 -> 37
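The rewrite can be sanity-checked by brute force. This is a sketch using Python's floor division on plain integers, not the tinygrad rewrite rule itself; `rewrite_holds` is a hypothetical helper, and the ranges are arbitrary.

```python
def rewrite_holds(x: int, a: int, b: int) -> bool:
  """True when (x // a < b) and (x < a * b) agree for these integers."""
  return (x // a < b) == (x < a * b)

# Exhaustive check over a small grid with positive a.
checked = all(rewrite_holds(x, a, b)
              for a in range(1, 8)
              for b in range(-8, 9)
              for x in range(-64, 65))

# With a negative divisor the two sides can disagree, hence the "positive a" condition.
neg_ok = rewrite_holds(5, -2, 0)
```

For `x=5, a=-2, b=0` the left side is `-3 < 0` (true) and the right side is `5 < 0` (false), so the equivalence only holds when `a` is positive.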
|
2024-09-20 04:38:47 -04:00 |
|
qazal
|
72c7087420
|
viz add kernel code (#6620)
* viz add kernel code
* no defaultdict
* ctxs
|
2024-09-20 16:31:47 +08:00 |
|
qazal
|
2dfb1e022c
|
UOp st prereqs for valid [run_process_replay] (#6618)
|
2024-09-20 15:55:35 +08:00 |
|
qazal
|
74f8f86631
|
viz kernel tree view (#6614)
* viz kernel tree view
* use get_kernel
* remove current_kernel
cleanup current_kernel
* unset kernel name
|
2024-09-20 15:52:12 +08:00 |
|
chenyu
|
b14c1bc417
|
UOps.RANGE is_increasing (#6615)
* UOps.RANGE is_increasing
283 -> 47 valids
* test
|
2024-09-20 03:14:52 -04:00 |
|
Comma Device
|
76aa6416d7
|
qcom: add disassembler with DEBUG >= 5
|
2024-09-20 07:04:28 +00:00 |
|
chenyu
|
036c2f5b26
|
validhack use the new style ge for upper bound valid (#6612)
also relaxed the bound check to check vmin/vmax instead of just const.
valids 482 -> 283
|
2024-09-19 23:45:42 -04:00 |
|
George Hotz
|
c4d5575c61
|
beat mlx at resnet 18 (#6611)
* work to beat mlx at resnet18 [run_process_replay]
* pruning
* wino sometimes
* shorter
* comment
|
2024-09-20 11:28:01 +08:00 |
|
qazal
|
785aaec67c
|
make VIZ more responsive for big graphs (#6610)
|
2024-09-20 08:56:13 +08:00 |
|
George Hotz
|
78699d9924
|
15% more folder speed [run_process_replay] (#6607)
* 15% more folder speed [run_process_replay]
* gep cleanups
|
2024-09-19 22:34:42 +08:00 |
|
chenyu
|
a37e92081a
|
fix unrolled arange folding (#6606)
* fix unrolled arange folding
also added flop test to test_arange to make sure it's 0 flop
* skip PTX
|
2024-09-19 09:03:01 -04:00 |
|
qazal
|
eebd23155c
|
move scheduler rewrites into full_ast_rewrite [run_process_replay] (#6609)
|
2024-09-19 20:03:28 +08:00 |
|
qazal
|
31748c72c4
|
refactor viz to parse_qs (#6608)
|
2024-09-19 19:51:41 +08:00 |
|
nimlgen
|
944cc46e11
|
qcom fix image pitch (#6600)
* qcom fix image pitch
* correct
|
2024-09-19 18:50:02 +08:00 |
|
George Hotz
|
a1a882b006
|
arange folding with new ge (#6604)
* arange folding with new ge
* bump allowed gated
* bump allowed speed
|
2024-09-19 18:01:28 +08:00 |
|
George Hotz
|
224151a958
|
update indexing with UPat.any [run_process_replay] (#6605)
|
2024-09-19 17:40:17 +08:00 |
|
chenyu
|
d148a62f8d
|
more generic simplify_valid_image_load (#6603)
use graph_rewrite to simplify the expression with narrowed variables, and check boundary conditions on a monotonically increasing function to drop valid.
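The boundary-condition idea can be illustrated outside tinygrad: for a nondecreasing index function, checking the two endpoints of the variable's range covers every value in between. This is only a sketch of that reasoning, not the code in this commit; `in_bounds_everywhere`, `idx`, and the sizes are hypothetical.

```python
from typing import Callable

def in_bounds_everywhere(f: Callable[[int], int], vmin: int, vmax: int, size: int) -> bool:
  """Assumes f is nondecreasing on [vmin, vmax]; then the endpoints bound every value."""
  return f(vmin) >= 0 and f(vmax) < size

def idx(x: int) -> int:
  # A monotonically increasing index expression, chosen for illustration.
  return 2 * x + 1

fast = in_bounds_everywhere(idx, 0, 7, 16)         # endpoint check only
slow = all(0 <= idx(x) < 16 for x in range(0, 8))  # exhaustive check for comparison
too_small = in_bounds_everywhere(idx, 0, 7, 15)    # idx(7) == 15 falls outside size 15
```

When the endpoint check passes, a separate validity guard on the index is redundant over that range.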
|
2024-09-19 05:33:37 -04:00 |
|
George Hotz
|
718ecad2ee
|
add UPat.any support [run_process_replay] (#6602)
* add UPat.any support [run_process_replay]
* single arange pattern
* no loop_start and loop_end
|
2024-09-19 17:11:24 +08:00 |
|
qazal
|
d06b36e527
|
viz open UPat links in editor (#6601)
* move the reloader
* open links in editor
* less things in ui
|
2024-09-19 16:48:09 +08:00 |
|