chenyu
|
8d75326cb5
|
do not fold var with min==max (#6713)
not really used; we want to keep it as a var for valid simplification
[run_process_replay]
|
2024-09-24 06:16:34 -04:00 |
|
chenyu
|
9e51879019
|
fix idx setup in image_valid test_openpilot_conv3 (#6710)
* fix idx setup in image_valid test_openpilot_conv3
* corrected output and sad
|
2024-09-24 05:49:04 -04:00 |
|
wozeparrot
|
2be0b26a1f
|
rand only supports single device (#6682)
|
2024-09-24 16:07:44 +08:00 |
|
chenyu
|
4bb1694f49
|
more tests about bounds of UOp divs (#6700)
|
2024-09-24 00:41:43 -04:00 |
|
chenyu
|
79aef64d70
|
update tests in test_image_valid (#6698)
|
2024-09-24 00:04:21 -04:00 |
|
George Hotz
|
7c38121280
|
load penalty (#6681)
* bias/bn loads after loops
* load penalty in fix_priority
* more generic test
|
2024-09-23 18:12:12 +08:00 |
|
George Hotz
|
431ffc4254
|
hotfix: delete float16 failing
|
2024-09-23 17:42:57 +08:00 |
|
chenyu
|
f55459c98e
|
failed validhack test for a 0.9.7 conv (#6677)
|
2024-09-23 04:43:47 -04:00 |
|
chenyu
|
0362dbbbe8
|
relax idx simplification given valid (#6669)
apply to kernels in op 0.9.7.
if a valid has a complicated expr, we cannot drop valid but it's possible to simplify idx given valid
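A minimal sketch of "simplify idx given valid", in plain Python rather than tinygrad UOps. The helper name `fold_div_in_range` and the example (`x // 4` under the valid `x < 4`) are made up for illustration: the valid narrows `x` to a range, and inside that range part of the idx may collapse to a constant even though the valid itself stays.

```python
from typing import Optional

def fold_div_in_range(lo: int, hi: int, d: int) -> Optional[int]:
  # For d > 0, x // d is non-decreasing in x, so it is constant on [lo, hi]
  # exactly when both endpoints give the same quotient.
  assert d > 0 and lo <= hi
  return lo // d if lo // d == hi // d else None

# hypothetical case: the valid says x < 4 and x is a non-negative index, so x is in [0, 3]
narrowed = fold_div_in_range(0, 3, 4)    # x // 4 folds to a constant
unnarrowed = fold_div_in_range(0, 7, 4)  # without the valid, x // 4 takes two values
```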
|
2024-09-23 03:04:57 -04:00 |
|
chenyu
|
26ebb7cab4
|
don't use div_folding in lt_folding (#6666)
* don't use div_folding in lt_folding
valids 35 -> 13
* fails the same as before
|
2024-09-23 01:50:18 -04:00 |
|
chenyu
|
da5b741656
|
removed valid in openpilot conv (#6619)
35 valids left
|
2024-09-23 00:30:18 -04:00 |
|
George Hotz
|
52c2c4df9c
|
fix match of sz 0 + dedup kernel ast [run_process_replay] (#6663)
* fix match of sz 0 [run_process_replay]
* empty graph rewrite to dedup st
|
2024-09-23 11:56:53 +08:00 |
|
chenyu
|
1923932339
|
canonicalize simplex lt (#6658)
(X := a0*x0 + a1*x1 + ...) > 0 is equivalent to x0 + x1 + ... > 0 if xi >= 0 and ai > 0 for ints
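The equivalence can be spot-checked by brute force. A minimal sketch in plain Python, not tinygrad code; the helper name `simplex_lt_equiv` is hypothetical:

```python
from itertools import product

def simplex_lt_equiv(coeffs, xs) -> bool:
  # compare (a0*x0 + a1*x1 + ...) > 0 against (x0 + x1 + ...) > 0
  weighted = sum(a * x for a, x in zip(coeffs, xs)) > 0
  plain = sum(xs) > 0
  return weighted == plain

# exhaustive check over small ints with xi >= 0 and ai > 0
all_ok = all(
  simplex_lt_equiv(coeffs, xs)
  for coeffs in product(range(1, 4), repeat=3)
  for xs in product(range(0, 4), repeat=3)
)

# the ai > 0 condition matters: 1*1 + (-1)*1 = 0 is not > 0, but 1 + 1 is
breaks_with_negative_coeff = not simplex_lt_equiv((1, -1), (1, 1))
```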
|
2024-09-22 23:04:47 -04:00 |
|
wozeparrot
|
46e360fdc0
|
check bfloat16 range with threefry (#6660)
|
2024-09-23 10:48:44 +08:00 |
|
qazal
|
d24e4b1042
|
viz more kernel view work (#6659)
|
2024-09-23 10:48:35 +08:00 |
|
qazal
|
6be1bf09f1
|
hotfix: bring COMPARE_SCHEDULE=0 back (#6657)
|
2024-09-23 10:39:43 +08:00 |
|
George Hotz
|
e945fa9c5c
|
put local on the PtrDtype [run_process_replay] (#6656)
* put local on the PtrDtype [run_process_replay]
* those are local too
|
2024-09-23 10:29:17 +08:00 |
|
chenyu
|
90c1ccc402
|
simpler drop valid check in simplify_valid_image_load (#6653)
* simpler drop valid check in simplify_valid_image_load
* update tests
|
2024-09-22 21:46:39 -04:00 |
|
qazal
|
6b65d8c461
|
more process replay tracing work [run_process_replay] (#6650)
|
2024-09-22 16:16:58 +08:00 |
|
qazal
|
5bafed2f88
|
process replay traceback (#6642)
|
2024-09-21 16:53:34 +08:00 |
|
qazal
|
982086f54c
|
UOps.VALID try 2 (#6623)
* make UOps.VALID compile
* fixable tests
* bufs dedup
* cleanup the CONST spec
* regenerate dataset with graph_rewrite
```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```
* rm arg
* remove arg
* revert arg removal
This reverts commit 2c35c75c95.
* red test_pickle_define_var
|
2024-09-21 14:19:25 +08:00 |
|
qazal
|
d2351af019
|
fixup non-void SINKs in tests [run_process_replay] (#6624)
|
2024-09-21 13:29:18 +08:00 |
|
qazal
|
391d14438e
|
DEFINE_VAR prereqs for VALID [run_process_replay] (#6637)
|
2024-09-21 13:28:39 +08:00 |
|
nimlgen
|
053c4dee55
|
qcom test for image pitch (#6621)
* qcom test for image pitch
* comment
|
2024-09-20 18:13:48 +08:00 |
|
chenyu
|
acef3e67fa
|
add an example that idx is const and valid cannot be removed (#6625)
very weird
|
2024-09-20 05:46:27 -04:00 |
|
chenyu
|
5707503048
|
x//a<b -> x <a*b for positive a (#6622)
openpilot valids 47 -> 37
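A quick brute-force check of the rewrite in plain Python, assuming floor division (Python's `//`). With C-style truncating division the two sides only agree for non-negative x. The helper name `div_lt_rewrite_holds` is hypothetical:

```python
def div_lt_rewrite_holds(x: int, a: int, b: int) -> bool:
  # with floor division and a > 0: x // a < b  <=>  x < a * b
  return (x // a < b) == (x < a * b)

checked = all(
  div_lt_rewrite_holds(x, a, b)
  for x in range(-20, 21)
  for a in range(1, 6)
  for b in range(-5, 6)
)

# the "positive a" condition is needed: x=1, a=-1, b=0 gives -1 < 0 but not 1 < 0
fails_for_negative_a = not div_lt_rewrite_holds(1, -1, 0)
```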
|
2024-09-20 04:38:47 -04:00 |
|
chenyu
|
b14c1bc417
|
UOps.RANGE is_increasing (#6615)
* UOps.RANGE is_increasing
283 -> 47 valids
* test
|
2024-09-20 03:14:52 -04:00 |
|
chenyu
|
036c2f5b26
|
validhack use the new style ge for upper bound valid (#6612)
also relaxed the bound check to check vmin/vmax instead of just const.
valids 482 -> 283
|
2024-09-19 23:45:42 -04:00 |
|
chenyu
|
a37e92081a
|
fix unrolled arange folding (#6606)
* fix unrolled arange folding
also added flop test to test_arange to make sure it's 0 flop
* skip PTX
|
2024-09-19 09:03:01 -04:00 |
|
George Hotz
|
a1a882b006
|
arange folding with new ge (#6604)
* arange folding with new ge
* bump allowed gated
* bump allowed speed
|
2024-09-19 18:01:28 +08:00 |
|
chenyu
|
d148a62f8d
|
more generic simplify_valid_image_load (#6603)
use graph_rewrite to simplify the expression with narrowed variables, and check boundary conditions on a monotonically increasing function to drop valid.
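A rough sketch of why endpoints are enough for a monotonically increasing function, in plain Python. This is not the actual `simplify_valid_image_load` code: the helper name, the example idx `4*x - 8`, and the image size 16 are all made up, and it assumes a valid is redundant when the idx is out of bounds everywhere the valid is false.

```python
def always_out_of_bounds(f, lo: int, hi: int, size: int) -> bool:
  # f is assumed monotonically increasing on [lo, hi], so its min is f(lo) and its
  # max is f(hi); the whole range misses [0, size) iff it lies entirely on one side.
  return f(hi) < 0 or f(lo) >= size

idx = lambda x: 4 * x - 8  # hypothetical idx expression, increasing in x

# hypothetical valid: x >= 2, with x in [0, 5]; the valid is false on [0, 1]
can_drop = always_out_of_bounds(idx, 0, 1, 16)

# brute-force the same question over every point in the range
brute = all(not (0 <= idx(x) < 16) for x in range(0, 2))
```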
|
2024-09-19 05:33:37 -04:00 |
|
chenyu
|
eeee032b14
|
tiny cleanup of test_image_valid (#6597)
* tiny cleanup of test_image_valid
Special and Variable to set up UOp
* typo
|
2024-09-19 03:09:47 -04:00 |
|
George Hotz
|
012a2c449a
|
fix lt_folding VCONST issue [run_process_replay] (#6424)
* le and ge [run_process_replay]
* bugfix
* fix divides bug
* fix lt_folding issue
|
2024-09-19 14:59:20 +08:00 |
|
qazal
|
309ea63c03
|
include cached replaces in VIZ=1 (#6596)
* pick some work from vizmore branch
* fix the ctx location
* fix that loc
|
2024-09-19 14:48:31 +08:00 |
|
qazal
|
44c18a39a5
|
fix upat .location for the type verifier (#6592)
* fix upat .location for the type verifier
* get the last tinygrad file
|
2024-09-19 14:13:12 +08:00 |
|
chenyu
|
496806ce75
|
another example of openpilot conv with valid (#6595)
|
2024-09-19 01:54:01 -04:00 |
|
chenyu
|
7f9fd556b0
|
_min_max for WHERE (#6564)
prereq to gated load simplification
just for int
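A minimal sketch of the idea in plain Python, not the tinygrad implementation. The helper name `where_min_max` and the `(vmin, vmax)` tuple representation are assumptions for illustration. A WHERE yields one of its two branches, so a conservative integer range is the hull of both branch ranges, ignoring the condition.

```python
from typing import Tuple

Range = Tuple[int, int]  # (vmin, vmax), inclusive

def where_min_max(true_range: Range, false_range: Range) -> Range:
  # the result is either the true branch or the false branch,
  # so it lies between the smaller min and the larger max
  return min(true_range[0], false_range[0]), max(true_range[1], false_range[1])

bounds = where_min_max((0, 5), (-3, 2))
```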
|
2024-09-18 23:47:48 -04:00 |
|
chenyu
|
1b6eee02ad
|
failed test case for openpilot validhack conv (#6590)
* failed test case for openpilot validhack conv
can save 2ms once this is fixed
* fix order
|
2024-09-18 23:12:30 -04:00 |
|
George Hotz
|
e015b41ce9
|
remove e( function just alu( [run_process_replay] (#6589)
* remove e( function just alu( [run_process_replay]
* missed two
|
2024-09-19 10:24:02 +08:00 |
|
George Hotz
|
fa0f678d5a
|
use the PatternMatcher to validate UOps type [run_process_replay] (#6583)
* use the PatternMatcher to validate UOps type [run_process_replay]
* type check tests pass
* DEFINE_VAR
* fix precommit
* fix tests
* ptx
* type check tests pass
* ptx test
* int64
* ptx barrier
* delete old stuff
|
2024-09-19 09:59:06 +08:00 |
|
chenyu
|
bd40a26b8b
|
image valid test case where the current approach does not work (#6584)
|
2024-09-18 06:06:03 -04:00 |
|
George Hotz
|
d02bb270b7
|
add copyin copyout for image on GPU [run_process_replay] (#6580)
* add copyin copyout for image on GPU [run_process_replay]
* add timing
* enqueue vs total run
* it's failing but that's fine
|
2024-09-18 16:06:20 +08:00 |
|
chenyu
|
162ead02a9
|
remove LOAD where valid is an empty set (#6579)
356 -> 354 valids
|
2024-09-18 03:49:41 -04:00 |
|
George Hotz
|
d4b662c318
|
new openpilot compile (#6573)
* new openpilot compile
* note, copyout doesn't work for images
|
2024-09-18 14:22:50 +08:00 |
|
chenyu
|
c3a70dbf0d
|
20 jitted steps in openpilot benchmark (#6577)
|
2024-09-18 02:15:16 -04:00 |
|
chenyu
|
a72d51e277
|
brute force VALIDHACK matching (#6575)
* brute force VALIDHACK matching
* cleanup
* 9700
|
2024-09-18 01:59:50 -04:00 |
|
qazal
|
d8e5d5c663
|
move VIZ=1 tests to fuzzers (#6574)
|
2024-09-18 12:12:03 +08:00 |
|
George Hotz
|
28e565dc0d
|
prune independent kernels for openpilot [run_process_replay] (#6569)
* prune independent kernels for openpilot [run_process_replay]
* new pruning
* prune first, then memory plan
|
2024-09-17 20:02:38 +08:00 |
|
qazal
|
9295bc0189
|
viz more work [run_process_replay] (#6568)
* infra
* found it
* real work
* bring those back
* cleanup test_viz
* comment that out
|
2024-09-17 19:27:09 +08:00 |
|
qazal
|
455a27dd43
|
start viz unittests (#6550)
* test_viz
* more tests
|
2024-09-17 18:58:23 +08:00 |
|