George Hotz
cd534dee11
cstyle changes that don't pass process replay ( #6734 )
...
* cstyle changes that don't pass process replay
* add constant folder back there
* cleanups
* const
* fix some tests
* bfloat16 too
* complete set of types
* that cast shouldn't be needed
* that was a questionable test
2024-09-25 17:33:34 +08:00
George Hotz
cb22ef379a
truncate consts early ( #6741 )
...
* truncate consts early
* ptx still fails
* Update dtype.py
2024-09-25 16:49:51 +08:00
George Hotz
882339f729
remove parens from neg ( #6738 )
2024-09-25 15:38:20 +08:00
qazal
5ad2f95d01
process replay diff stats ( #6736 )
...
* process replay diff stats
* fix tuples
2024-09-25 15:19:56 +08:00
chenyu
ff25bfb1b0
conv backward tests in test_simplify_valid_idx ( #6727 )
...
the backward idx is pretty ugly now
2024-09-25 02:51:07 -04:00
chenyu
e6a1b5aa8f
more test_simplify_valid_idx cleanup ( #6726 )
...
moved UOps.VECTORIZE of idx into the helper
2024-09-24 23:47:42 -04:00
chenyu
14524eeddc
test_image_valid.py -> test_simplify_valid_idx.py ( #6724 )
...
restructure the tests, will use the same file for non-image tests
2024-09-24 23:32:27 -04:00
qazal
e0d8685c99
test_masked_upcast_wino check device buf_max ( #6723 )
2024-09-25 11:26:53 +08:00
ttomsa
76bd4c7d5f
advanced setitem ( #6262 )
...
* advanced setitem draft
* add setitem tests
* fix for tests
* small change
* handle repeated indices with test
* fix v broadcasting to mask
* clean up a bit
* open more tests
* clean up, fixes issue with scalar tensor index
* fix
* fix index_put_ and linter
* add type annotation
* done
* remove non contiguous hack
* woops linter
* name fix
* add back type notation
* more type notation
* final
* linter
* check lazydata not shared
* no numpy
* no numpy
* rename
* index benchmark
* linter
* no cloning time
* rm benchmark
* new function
* rm contiguous and cast early
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-24 22:14:59 -04:00
qazal
3bf25aae78
start work on global buffer count limit [run_process_replay] ( #6722 )
...
* add a bufs_max option
* simple spec
2024-09-25 09:51:56 +08:00
qazal
cefc3e9382
make all schedules immutable [run_process_replay] ( #6718 )
...
* compute inputs and outputs in LBScheduleItem [run_process_replay]
* simpler metadata, delete __hash__
* no dynamic field
* test_diff_schedule
2024-09-24 21:08:16 +08:00
qazal
29330014ab
give FUZZ_SCHEDULE views a base ( #6717 )
...
* memoryview to bytes
* give FUZZ_SCHEDULE views a base
2024-09-24 19:20:37 +08:00
nimlgen
f0019ad29c
bump ci test timeout for test_speed_exec_time ( #6715 )
...
* bump ci test timeout for test_speed_exec_time
* more
2024-09-24 18:44:09 +08:00
chenyu
8d75326cb5
do not fold var with min==max ( #6713 )
...
not really used; want to keep it as a var for valid simplification
[run_process_replay]
2024-09-24 06:16:34 -04:00
chenyu
9e51879019
fix idx setup in image_valid test_openpilot_conv3 ( #6710 )
...
* fix idx setup in image_valid test_openpilot_conv3
* corrected output and sad
2024-09-24 05:49:04 -04:00
wozeparrot
2be0b26a1f
rand only supports single device ( #6682 )
2024-09-24 16:07:44 +08:00
chenyu
4bb1694f49
more tests about bounds of UOp divs ( #6700 )
2024-09-24 00:41:43 -04:00
chenyu
79aef64d70
update tests in test_image_valid ( #6698 )
2024-09-24 00:04:21 -04:00
George Hotz
7c38121280
load penalty ( #6681 )
...
* bias/bn loads after loops
* load penalty in fix_priority
* more generic test
2024-09-23 18:12:12 +08:00
George Hotz
431ffc4254
hotfix: delete float16 failing
2024-09-23 17:42:57 +08:00
chenyu
f55459c98e
failed validhack test for a 0.9.7 conv ( #6677 )
2024-09-23 04:43:47 -04:00
chenyu
0362dbbbe8
relax idx simplification given valid ( #6669 )
...
apply to kernels in op 0.9.7.
if a valid has a complicated expr, we cannot drop valid but it's possible to simplify idx given valid
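A toy illustration of the idea above, not the tinygrad implementation: even when the valid itself has to stay, it restricts the values the index variable can take, and the idx expression may simplify under that restriction. The predicate and both idx expressions below are hypothetical examples chosen for illustration.

```python
# Hypothetical example: the valid stays, but idx gets simpler under it.
def valid(x: int) -> bool:
    # stand-in for a valid expression that is kept as-is
    return 0 <= x < 4

def idx_original(x: int) -> int:
    return x % 4

def idx_simplified(x: int) -> int:
    # only correct where valid(x) holds
    return x

def agree_under_valid(lo: int = -16, hi: int = 16) -> bool:
    """Brute-force check that both idx forms agree wherever the valid holds."""
    return all(idx_original(x) == idx_simplified(x)
               for x in range(lo, hi + 1) if valid(x))
```

Outside the valid region the two forms can differ (for example at x = 5), which is why the rewrite is only sound for the gated load.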
2024-09-23 03:04:57 -04:00
chenyu
26ebb7cab4
don't use div_folding in lt_folding ( #6666 )
...
* don't use div_folding in lt_folding
valids 35 -> 13
* fails the same as before
2024-09-23 01:50:18 -04:00
chenyu
da5b741656
removed valid in openpilot conv ( #6619 )
...
35 valids left
2024-09-23 00:30:18 -04:00
George Hotz
52c2c4df9c
fix match of sz 0 + dedup kernel ast [run_process_replay] ( #6663 )
...
* fix match of sz 0 [run_process_replay]
* empty graph rewrite to dedup st
2024-09-23 11:56:53 +08:00
chenyu
1923932339
canonicalize simplex lt ( #6658 )
...
(X := a0*x0 + a1*x1 + ...) > 0 is equivalent to x0 + x1 + ... > 0 if xi >= 0 and ai > 0 for ints
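The equivalence above can be sanity-checked by brute force over small integers. This is only a sketch; the function names and search bounds are made up for the check and are not part of tinygrad.

```python
from itertools import product

def weighted_positive(coeffs, xs) -> bool:
    # a0*x0 + a1*x1 + ... > 0
    return sum(a * x for a, x in zip(coeffs, xs)) > 0

def plain_positive(xs) -> bool:
    # x0 + x1 + ... > 0
    return sum(xs) > 0

def check_simplex_lt(max_coeff: int = 4, max_x: int = 4, nvars: int = 3) -> bool:
    """Compare both forms for all ai in [1, max_coeff] and xi in [0, max_x]."""
    for coeffs in product(range(1, max_coeff + 1), repeat=nvars):
        for xs in product(range(0, max_x + 1), repeat=nvars):
            if weighted_positive(coeffs, xs) != plain_positive(xs):
                return False
    return True
```

Both forms are true exactly when at least one xi is positive, which is why the coefficients can be dropped under these preconditions.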
2024-09-22 23:04:47 -04:00
wozeparrot
46e360fdc0
check bfloat16 range with threefry ( #6660 )
2024-09-23 10:48:44 +08:00
qazal
d24e4b1042
viz more kernel view work ( #6659 )
2024-09-23 10:48:35 +08:00
qazal
6be1bf09f1
hotfix: bring COMPARE_SCHEDULE=0 back ( #6657 )
2024-09-23 10:39:43 +08:00
George Hotz
e945fa9c5c
put local on the PtrDtype [run_process_replay] ( #6656 )
...
* put local on the PtrDtype [run_process_replay]
* those are local too
2024-09-23 10:29:17 +08:00
chenyu
90c1ccc402
simpler drop valid check in simplify_valid_image_load ( #6653 )
...
* simpler drop valid check in simplify_valid_image_load
* update tests
2024-09-22 21:46:39 -04:00
qazal
6b65d8c461
more process replay tracing work [run_process_replay] ( #6650 )
2024-09-22 16:16:58 +08:00
qazal
5bafed2f88
process replay traceback ( #6642 )
2024-09-21 16:53:34 +08:00
qazal
982086f54c
UOps.VALID try 2 ( #6623 )
...
* make UOps.VALID compile
* fixable tests
* bufs dedup
* cleanup the CONST spec
* regenerate dataset with graph_rewrite
```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```
* rm arg
* remove arg
* revert arg removal
This reverts commit 2c35c75c95.
* red test_pickle_define_var
2024-09-21 14:19:25 +08:00
qazal
d2351af019
fixup non-void SINKs in tests [run_process_replay] ( #6624 )
2024-09-21 13:29:18 +08:00
qazal
391d14438e
DEFINE_VAR prereqs for VALID [run_process_replay] ( #6637 )
2024-09-21 13:28:39 +08:00
nimlgen
053c4dee55
qcom test for image pitch ( #6621 )
...
* qcom test for image pitch
* comment
2024-09-20 18:13:48 +08:00
chenyu
acef3e67fa
add an example that idx is const and valid cannot be removed ( #6625 )
...
very weird
2024-09-20 05:46:27 -04:00
chenyu
5707503048
x//a<b -> x<a*b for positive a ( #6622 )
...
openpilot valids 47 -> 37
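A quick brute-force check of the rewrite, assuming floor division on integers (as Python's `//` does). The helper names and ranges are illustrative only.

```python
def div_lt(x: int, a: int, b: int) -> bool:
    return x // a < b

def mul_lt(x: int, a: int, b: int) -> bool:
    return x < a * b

def check_div_lt(max_a: int = 6, max_abs: int = 30) -> bool:
    """Compare x//a < b with x < a*b for positive a over a small integer grid."""
    for a in range(1, max_a + 1):
        for x in range(-max_abs, max_abs + 1):
            for b in range(-max_a, max_a + 1):
                if div_lt(x, a, b) != mul_lt(x, a, b):
                    return False
    return True
```

With floor division, `x // a < b` means `x // a <= b - 1`, which holds exactly when `x < a * b`; the argument needs `a > 0`.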
2024-09-20 04:38:47 -04:00
chenyu
b14c1bc417
UOps.RANGE is_increasing ( #6615 )
...
* UOps.RANGE is_increasing
283 -> 47 valids
* test
2024-09-20 03:14:52 -04:00
chenyu
036c2f5b26
validhack use the new style ge for upper bound valid ( #6612 )
...
also relaxed the bound check to check vmin/vmax instead of just const.
valids 482 -> 283
2024-09-19 23:45:42 -04:00
chenyu
a37e92081a
fix unrolled arange folding ( #6606 )
...
* fix unrolled arange folding
also added flop test to test_arange to make sure it's 0 flop
* skip PTX
2024-09-19 09:03:01 -04:00
George Hotz
a1a882b006
arange folding with new ge ( #6604 )
...
* arange folding with new ge
* bump allowed gated
* bump allowed speed
2024-09-19 18:01:28 +08:00
chenyu
d148a62f8d
more generic simplify_valid_image_load ( #6603 )
...
use graph_rewrite to simplify the expression with narrowed variables, and check boundary conditions on a monotonically increasing function to drop valid.
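The boundary-condition idea can be sketched in plain Python: for a non-decreasing function on an integer interval, the minimum and maximum sit at the endpoints, so a bounds check only has to look there. This is a simplified sketch with hypothetical helper names; the real code works on UOps and is not reproduced here.

```python
from typing import Callable

def in_bounds_by_endpoints(f: Callable[[int], int], lo: int, hi: int, size: int) -> bool:
    """Assumes f is non-decreasing on [lo, hi]; checks 0 <= f(x) < size via endpoints only."""
    return 0 <= f(lo) and f(hi) < size

def in_bounds_brute_force(f: Callable[[int], int], lo: int, hi: int, size: int) -> bool:
    """Reference check over every integer in [lo, hi]."""
    return all(0 <= f(x) < size for x in range(lo, hi + 1))

def example_idx(x: int) -> int:
    # hypothetical monotonically increasing index expression
    return 4 * x + 3
```

For `example_idx` on `[0, 7]`, both checks agree that the index fits in a buffer of size 32 and does not fit in one of size 31.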
2024-09-19 05:33:37 -04:00
chenyu
eeee032b14
tiny cleanup of test_image_valid ( #6597 )
...
* tiny cleanup of test_image_valid
Special and Variable to set up UOp
* typo
2024-09-19 03:09:47 -04:00
George Hotz
012a2c449a
fix lt_folding VCONST issue [run_process_replay] ( #6424 )
...
* le and ge [run_process_replay]
* bugfix
* fix divides bug
* fix lt_folding issue
2024-09-19 14:59:20 +08:00
qazal
309ea63c03
include cached replaces in VIZ=1 ( #6596 )
...
* pick some work from vizmore branch
* fix the ctx location
* fix that loc
2024-09-19 14:48:31 +08:00
qazal
44c18a39a5
fix upat .location for the type verifier ( #6592 )
...
* fix upat .location for the type verifier
* get the last tinygrad file
2024-09-19 14:13:12 +08:00
chenyu
496806ce75
another example of openpilot conv with valid ( #6595 )
2024-09-19 01:54:01 -04:00
chenyu
7f9fd556b0
_min_max for WHERE ( #6564 )
...
prereq to gated load simplification
just for int
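A minimal sketch of the usual interval bound for a select, assuming the result of WHERE is always one of its two branches. The function name and tuple representation are hypothetical and not taken from the tinygrad source.

```python
from typing import Tuple

Range = Tuple[int, int]  # (min, max), inclusive

def where_min_max(true_range: Range, false_range: Range) -> Range:
    """Bounds of where(cond, a, b) given integer bounds of a and b.

    The condition is ignored: whichever branch is taken, the value
    lies inside the union of the two branch ranges.
    """
    (t_lo, t_hi), (f_lo, f_hi) = true_range, false_range
    return min(t_lo, f_lo), max(t_hi, f_hi)
```

For example, branches bounded by `(0, 3)` and `(10, 12)` give `(0, 12)`. The bound is conservative, since values between 4 and 9 cannot actually occur.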
2024-09-18 23:47:48 -04:00