wozeparrot
46e360fdc0
check bfloat16 range with threefry ( #6660 )
2024-09-23 10:48:44 +08:00
qazal
d24e4b1042
viz more kernel view work ( #6659 )
2024-09-23 10:48:35 +08:00
qazal
6be1bf09f1
hotfix: bring COMPARE_SCHEDULE=0 back ( #6657 )
2024-09-23 10:39:43 +08:00
George Hotz
e945fa9c5c
put local on the PtrDtype [run_process_replay] ( #6656 )
...
* put local on the PtrDtype [run_process_replay]
* those are local too
2024-09-23 10:29:17 +08:00
chenyu
90c1ccc402
simpler drop valid check in simplify_valid_image_load ( #6653 )
...
* simpler drop valid check in simplify_valid_image_load
* update tests
2024-09-22 21:46:39 -04:00
qazal
6b65d8c461
more process replay tracing work [run_process_replay] ( #6650 )
2024-09-22 16:16:58 +08:00
qazal
5bafed2f88
process replay traceback ( #6642 )
2024-09-21 16:53:34 +08:00
qazal
982086f54c
UOps.VALID try 2 ( #6623 )
...
* make UOps.VALID compile
* fixable tests
* bufs dedup
* cleanup the CONST spec
* regenerate dataset with graph_rewrite
```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```
* rm arg
* remove arg
* revert arg removal
This reverts commit 2c35c75c95.
* red test_pickle_define_var
2024-09-21 14:19:25 +08:00
qazal
d2351af019
fixup non-void SINKs in tests [run_process_replay] ( #6624 )
2024-09-21 13:29:18 +08:00
qazal
391d14438e
DEFINE_VAR prereqs for VALID [run_process_replay] ( #6637 )
2024-09-21 13:28:39 +08:00
nimlgen
053c4dee55
qcom test for image pitch ( #6621 )
...
* qcom test for image pitch
* comment
2024-09-20 18:13:48 +08:00
chenyu
acef3e67fa
add an example that idx is const and valid cannot be removed ( #6625 )
...
very weird
2024-09-20 05:46:27 -04:00
chenyu
5707503048
x//a<b -> x <a*b for positive a ( #6622 )
...
openpilot valids 47 -> 37
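The rewrite in the title can be sanity-checked by brute force. The following is a minimal sketch and not code from this commit: it assumes `//` is floor division (as in Python), and the helper name `check_idiv_lt` is hypothetical.

```python
def check_idiv_lt(x_range: range, a_range: range, b_range: range) -> bool:
  """Return True if x//a < b agrees with x < a*b for every sampled x, b and positive a."""
  for a in a_range:
    # the rewrite is only claimed for positive divisors
    assert a > 0, "a must be positive"
    for x in x_range:
      for b in b_range:
        if (x // a < b) != (x < a * b):
          return False
  return True

# exhaustive check over a small window of integers, including negative x and b
print(check_idiv_lt(range(-50, 51), range(1, 8), range(-10, 11)))
```

Under truncating (C-style) division the two sides can disagree for negative `x`, for example `x=-1, a=2, b=0`, so the sketch says nothing about that case.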
2024-09-20 04:38:47 -04:00
chenyu
b14c1bc417
UOps.RANGE is_increasing ( #6615 )
...
* UOps.RANGE is_increasing
283 -> 47 valids
* test
2024-09-20 03:14:52 -04:00
chenyu
036c2f5b26
validhack use the new style ge for upper bound valid ( #6612 )
...
also relaxed the bound check to check vmin/vmax instead of just const.
valids 482 -> 283
2024-09-19 23:45:42 -04:00
chenyu
a37e92081a
fix unrolled arange folding ( #6606 )
...
* fix unrolled arange folding
also added flop test to test_arange to make sure it's 0 flop
* skip PTX
2024-09-19 09:03:01 -04:00
George Hotz
a1a882b006
arange folding with new ge ( #6604 )
...
* arange folding with new ge
* bump allowed gated
* bump allowed speed
2024-09-19 18:01:28 +08:00
chenyu
d148a62f8d
more generic simplify_valid_image_load ( #6603 )
...
use graph_rewrite to simplify the expression with narrowed variables, and check boundary conditions on a monotonically increasing function to drop valid.
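One way to read the boundary check: if an index expression is nondecreasing in a variable, evaluating it at the two ends of the variable's range bounds it everywhere in between. The sketch below illustrates only that idea on plain Python integers. It is not the implementation in this commit, and `in_bounds_everywhere`, `idx`, and their parameters are hypothetical names.

```python
from typing import Callable

def in_bounds_everywhere(f: Callable[[int], int], lo: int, hi: int, bound: int) -> bool:
  """For a nondecreasing f on the integers lo..hi, decide whether 0 <= f(x) < bound for all x.

  Monotonicity is assumed, not verified: only the two endpoints are evaluated.
  """
  return 0 <= f(lo) and f(hi) < bound

def idx(x: int) -> int:
  # example index expression, nondecreasing in x
  return 4 * x + 3

print(in_bounds_everywhere(idx, 0, 15, 64))  # endpoints are 3 and 63
print(in_bounds_everywhere(idx, 0, 16, 64))  # upper endpoint is 67
```

How such a result feeds into dropping a valid depends on the details of the commit and is not reproduced here.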
2024-09-19 05:33:37 -04:00
chenyu
eeee032b14
tiny cleanup of test_image_valid ( #6597 )
...
* tiny cleanup of test_image_valid
Special and Variable to set up UOp
* typo
2024-09-19 03:09:47 -04:00
George Hotz
012a2c449a
fix lt_folding VCONST issue [run_process_replay] ( #6424 )
...
* le and ge [run_process_replay]
* bugfix
* fix divides bug
* fix lt_folding issue
2024-09-19 14:59:20 +08:00
qazal
309ea63c03
include cached replaces in VIZ=1 ( #6596 )
...
* pick some work from vizmore branch
* fix the ctx location
* fix that loc
2024-09-19 14:48:31 +08:00
qazal
44c18a39a5
fix upat .location for the type verifier ( #6592 )
...
* fix upat .location for the type verifier
* get the last tinygrad file
2024-09-19 14:13:12 +08:00
chenyu
496806ce75
another example of openpilot conv with valid ( #6595 )
2024-09-19 01:54:01 -04:00
chenyu
7f9fd556b0
_min_max for WHERE ( #6564 )
...
prereq to gated load simplification
just for int
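As a rough illustration of what a min/max rule for WHERE can look like: the result is always one of the two branch values, so its range is contained in the union of the branch ranges. This is a sketch under that assumption, not the code added in this commit. `where_min_max` is a hypothetical helper that works on plain integer `(vmin, vmax)` pairs.

```python
def where_min_max(true_rng: tuple[int, int], false_rng: tuple[int, int]) -> tuple[int, int]:
  """Conservative (vmin, vmax) for where(cond, t, f) when nothing is known about cond."""
  return min(true_rng[0], false_rng[0]), max(true_rng[1], false_rng[1])

# true branch in [0, 7], false branch is the constant -1
print(where_min_max((0, 7), (-1, -1)))
```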
2024-09-18 23:47:48 -04:00
chenyu
1b6eee02ad
failed test case for openpilot validhack conv ( #6590 )
...
* failed test case for openpilot validhack conv
can save 2ms once this is fixed
* fix order
2024-09-18 23:12:30 -04:00
George Hotz
e015b41ce9
remove e( function just alu( [run_process_replay] ( #6589 )
...
* remove e( function just alu( [run_process_replay]
* missed two
2024-09-19 10:24:02 +08:00
George Hotz
fa0f678d5a
use the PatternMatcher to validate UOps type [run_process_replay] ( #6583 )
...
* use the PatternMatcher to validate UOps type [run_process_replay]
* type check tests pass
* DEFINE_VAR
* fix precommit
* fix tests
* ptx
* type check tests pass
* ptx test
* int64
* ptx barrier
* delete old stuff
2024-09-19 09:59:06 +08:00
chenyu
bd40a26b8b
image valid test case that current approach does not work ( #6584 )
2024-09-18 06:06:03 -04:00
George Hotz
d02bb270b7
add copyin copyout for image on GPU [run_process_replay] ( #6580 )
...
* add copyin copyout for image on GPU [run_process_replay]
* add timing
* enqueue vs total run
* it's failing but that's fine
2024-09-18 16:06:20 +08:00
chenyu
162ead02a9
remove LOAD where valid is an empty set ( #6579 )
...
356 -> 354 valids
2024-09-18 03:49:41 -04:00
George Hotz
d4b662c318
new openpilot compile ( #6573 )
...
* new openpilot compile
* note, copyout doesn't work for images
2024-09-18 14:22:50 +08:00
chenyu
c3a70dbf0d
20 jitted steps in openpilot benchmark ( #6577 )
2024-09-18 02:15:16 -04:00
chenyu
a72d51e277
brute force VALIDHACK matching ( #6575 )
...
* brute force VALIDHACK matching
* cleanup
* 9700
2024-09-18 01:59:50 -04:00
qazal
d8e5d5c663
move VIZ=1 tests to fuzzers ( #6574 )
2024-09-18 12:12:03 +08:00
George Hotz
28e565dc0d
prune independent kernels for openpilot [run_process_replay] ( #6569 )
...
* prune independent kernels for openpilot [run_process_replay]
* new pruning
* prune first, then memory plan
2024-09-17 20:02:38 +08:00
qazal
9295bc0189
viz more work [run_process_replay] ( #6568 )
...
* infra
* found it
* real work
* bring those back
* cleanup test_viz
* comment that out
2024-09-17 19:27:09 +08:00
qazal
455a27dd43
start viz unittests ( #6550 )
...
* test_viz
* more tests
2024-09-17 18:58:23 +08:00
George Hotz
67a03e72bb
remove expr_idxs [run_process_replay] ( #6567 )
...
* remove expr_idxs [run_process_replay]
* goodbye that test
2024-09-17 18:34:51 +08:00
chenyu
b947db3de1
don't fold mul mod for common factor ( #6566 )
...
it makes valid pattern more annoying
2024-09-17 06:01:27 -04:00
Gaétan Lepage
f214bb140d
test: relax tolerance of test_broadcastdot ( #6560 )
2024-09-17 03:26:39 -04:00
chenyu
5fb877c78c
generic valid match criteria of #6552 ( #6558 )
...
455 -> 364 valids.
generalize `idx < image bound` to `idx < image bound + c` for some `c`
2024-09-17 02:40:36 -04:00
George Hotz
0ab06d5840
push geps through wmma ( #6559 )
...
* push geps through wmma
* update tests
2024-09-17 14:38:40 +08:00
George Hotz
ffce3ed896
add some new rules ( #6555 )
...
* add some new rules
* fix that
* non controversial
2024-09-17 13:59:55 +08:00
chenyu
c62b6fd8f0
match any statement in valid for simplification ( #6554 )
2024-09-17 01:39:47 -04:00
George Hotz
a2239c812e
minimum new style expand ( #6534 )
...
* minimum new style expand [run_process_replay]
* float4 folding works
* fix uop graph
* if means or
* dtype.count idx overload
* fix test arange
* expand nope
* fix expand contract
* fix amd tensor core
* oh, that's a good test with a real failure
* remove prints
* early reduce
* tomorrow, we remove sorted on expand args
* fix wmma issue
* that makes test_arange pass
* vectorized folding
* no check
* broadcast
* fix clang with self assign rule
2024-09-17 13:02:41 +08:00
kormann
f5dd25d376
enable whisper batch for long sequences ( #6458 )
...
* long batch +test
* long batch +test
* cleanup
* rollback syntactic changes
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
chenyu
7c942418a1
other side of simple out of bound valid case ( #6552 )
...
462 -> 455
2024-09-16 23:57:15 -04:00
chenyu
aeaf7894a7
more generic version of #6548 ( #6549 )
...
x*(-1)<0 can be generalized to x*(-1)<c, 473 -> 462 valids
2024-09-16 23:17:16 -04:00
chenyu
596f41eb46
simple drop image valid case ( #6548 )
...
* simple drop image valid case
started unit test, 530 -> 473 valids
* cleanup
2024-09-16 22:54:07 -04:00
George Hotz
42ba887daa
remove logic to vectorize reduces ( #6536 )
...
* remove logic to vectorize reduces
* fix tests
2024-09-16 14:04:48 +08:00