Commit Graph

2537 Commits

Author SHA1 Message Date
chenyu
eeee032b14 tiny cleanup of test_image_valid (#6597)
* tiny cleanup of test_image_valid

Special and Variable to setup UOp

* typo
2024-09-19 03:09:47 -04:00
George Hotz
012a2c449a fix lt_folding VCONST issue [run_process_replay] (#6424)
* le and ge [run_process_replay]

* bugfix

* fix divides bug

* fix lt_folding issue
2024-09-19 14:59:20 +08:00
qazal
309ea63c03 include cached replaces in VIZ=1 (#6596)
* pick some work from vizmore branch

* fix the ctx location

* fix that loc
2024-09-19 14:48:31 +08:00
qazal
44c18a39a5 fix upat .location for the type verifier (#6592)
* fix upat .location for the type verifier

* get the last tinygrad file
2024-09-19 14:13:12 +08:00
chenyu
496806ce75 another example of openpilot conv with valid (#6595) 2024-09-19 01:54:01 -04:00
chenyu
7f9fd556b0 _min_max for WHERE (#6564)
prereq to gated load simplification

just for int
2024-09-18 23:47:48 -04:00
chenyu
1b6eee02ad failed test case for openpilot validhack conv (#6590)
* failed test case for openpilot validhack conv

can save 2ms once this is fixed

* fix order
2024-09-18 23:12:30 -04:00
George Hotz
e015b41ce9 remove e( function just alu( [run_process_replay] (#6589)
* remove e( function just alu( [run_process_replay]

* missed two
2024-09-19 10:24:02 +08:00
George Hotz
fa0f678d5a use the PatternMatcher to validate UOps type [run_process_replay] (#6583)
* use the PatternMatcher to validate UOps type [run_process_replay]

* type check tests pass

* DEFINE_VAR

* fix precommit

* fix tests

* ptx

* type check tests pass

* ptx test

* int64

* ptx barrier

* delete old stuff
2024-09-19 09:59:06 +08:00
chenyu
bd40a26b8b image valid test case that current approach does not work (#6584) 2024-09-18 06:06:03 -04:00
George Hotz
d02bb270b7 add copyin copyout for image on GPU [run_process_replay] (#6580)
* add copyin copyout for image on GPU [run_process_replay]

* add timing

* enqueue vs total run

* it's failing but that's fine
2024-09-18 16:06:20 +08:00
chenyu
162ead02a9 remove LOAD where valid is an empty set (#6579)
356 -> 354 valids
2024-09-18 03:49:41 -04:00
George Hotz
d4b662c318 new openpilot compile (#6573)
* new openpilot compile

* note, copyout doesn't work for images
2024-09-18 14:22:50 +08:00
chenyu
c3a70dbf0d 20 jitted steps in openpilot benchmark (#6577) 2024-09-18 02:15:16 -04:00
chenyu
a72d51e277 brute force VALIDHACK matching (#6575)
* brute force VALIDHACK matching

* cleanup

* 9700
2024-09-18 01:59:50 -04:00
qazal
d8e5d5c663 move VIZ=1 tests to fuzzers (#6574) 2024-09-18 12:12:03 +08:00
George Hotz
28e565dc0d prune independent kernels for openpilot [run_process_replay] (#6569)
* prune independent kernels for openpilot [run_process_replay]

* new pruning

* prune first, then memory plan
2024-09-17 20:02:38 +08:00
qazal
9295bc0189 viz more work [run_process_replay] (#6568)
* infra

* found it

* real work

* bring those back

* cleanup test_viz

* comment that out
2024-09-17 19:27:09 +08:00
qazal
455a27dd43 start viz unittests (#6550)
* test_viz

* more tests
2024-09-17 18:58:23 +08:00
George Hotz
67a03e72bb remove expr_idxs [run_process_replay] (#6567)
* remove expr_idxs [run_process_replay]

* goodbye that test
2024-09-17 18:34:51 +08:00
chenyu
b947db3de1 don't fold mul mod for common factor (#6566)
it makes valid pattern more annoying
2024-09-17 06:01:27 -04:00
Gaétan Lepage
f214bb140d test: relax tolerance of test_broadcastdot (#6560) 2024-09-17 03:26:39 -04:00
chenyu
5fb877c78c generic valid match criteria of #6552 (#6558)
455 -> 364 valids.
generalize `idx < image bound` to `idx < image bound + c` for some `c`
2024-09-17 02:40:36 -04:00
George Hotz
0ab06d5840 push geps through wmma (#6559)
* push geps through wmma

* update tests
2024-09-17 14:38:40 +08:00
George Hotz
ffce3ed896 add some new rules (#6555)
* add some new rules

* fix that

* non controversial
2024-09-17 13:59:55 +08:00
chenyu
c62b6fd8f0 match any statement in valid for simplification (#6554) 2024-09-17 01:39:47 -04:00
George Hotz
a2239c812e minimum new style expand (#6534)
* minimum new style expand [run_process_replay]

* float4 folding works

* fix uop graph

* if means or

* dtype.count idx overload

* fix test arange

* expand nope

* fix expand contract

* fix amd tensor core

* oh, that's a good test with a real failure

* remove prints

* early reduce

* tomorrow, we remove sorted on expand args

* fix wmma issue

* that makes test_arange pass

* vectorized folding

* no check

* broadcast

* fix clang with self assign rule
2024-09-17 13:02:41 +08:00
kormann
f5dd25d376 enable whisper batch for long sequences (#6458)
* long batch +test

* long batch +test

* cleanup

* rollback syntactic changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
chenyu
7c942418a1 other side of simple out of bound valid case (#6552)
462 -> 455
2024-09-16 23:57:15 -04:00
chenyu
aeaf7894a7 more generic version of #6548 (#6549)
x*(-1)<0 can be generalized to x*(-1)<c, 473 -> 462 valids
2024-09-16 23:17:16 -04:00
chenyu
596f41eb46 simple drop image valid case (#6548)
* simple drop image valid case

started unit test, 530 -> 473 valids

* cleanup
2024-09-16 22:54:07 -04:00
George Hotz
42ba887daa remove logic to vectorize reduces (#6536)
* remove logic to vectorize reduces

* fix tests
2024-09-16 14:04:48 +08:00
qazal
607113fcdf fix vectorized dtype repr [run_process_replay] (#6535) 2024-09-16 13:42:55 +08:00
qazal
9b9b83b8b0 viz tests (#6532)
* viz fuzz tests

* caching

* print timings

* hotfix: update currentRewrite onClick

* import from typing

* indent into __main__
2024-09-16 13:08:42 +08:00
George Hotz
07bd6e070d add more uops tests for vmin/vmax/const_factor/divides (#6533) 2024-09-16 13:06:31 +08:00
ignaciosica
c447ec2190 Fix amx shape [run_process_replay] (#6524)
* fix amx shape (sz,sz,sz) -> (sz,sz,1)

* revert check
2024-09-16 09:49:55 +08:00
chenyu
1683b274b6 main example we want the valid removed (#6527)
* main example we want the valid removed

* ast lines are long
2024-09-15 21:49:10 -04:00
George Hotz
21835fc08c more graph rewrite tests (#6521) 2024-09-16 09:20:54 +08:00
chenyu
b2c286f567 fix typing for test_ops (#6520)
mostly passed TYPED=1 python3 -m pytest -n=auto test/test_ops.py.

one last test specifically sets an invalid value to test the exception, and to ignore that we need to import typeguard. To get a working version of typeguard, we would need to drop the dependency on tensorflow_addons, because it requires a very old version of typeguard
2024-09-15 06:18:36 -04:00
George Hotz
cd90092f14 graph rewrite tests (#6519)
* more graph rewrite tests

* more complex test cases

* more tests

* more tests

* cleanups

* 9600 lines

* cleanups
2024-09-15 17:29:16 +08:00
qazal
4ffb722d4e var_vals prereq for deleting LBScheduleItem [run_process_replay] (#6511) 2024-09-14 17:00:30 +08:00
George Hotz
a532d59bbd gep tuple [run_process_replay] (#6495)
* gep tuple [run_process_replay]

* no inf loop, that goes in expander

* fix ops python

* unbreak gep 0

* fix tests

* fix tests

* VECTORIZE/GEP

* oops, broken
2024-09-12 16:37:31 +08:00
George Hotz
6dfa63cb21 more vconst stuff + gep tuple [run_process_replay] (#6494)
* more vconst stuff [run_process_replay]

* revert that

* fix inf loop
2024-09-12 14:58:14 +08:00
George Hotz
119b0ea4af add UOps.VCONST [run_process_replay] (#6487)
* add UOps.VCONST [run_process_replay]

* VCONST folding

* simpler devectorize

* alu

* revert that type
2024-09-12 14:03:39 +08:00
George Hotz
76487a3533 remove nop, use upat [run_process_replay] (#6489)
* remove nop, use upat [run_process_replay]

* mypy passes

* no wonder nothing worked

* fixes
2024-09-12 12:16:19 +08:00
George Hotz
bdd0c06f29 add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
George Hotz
1b4d1823b7 add pyint to DTYPES_DICT [run_process_replay] (#6477)
* add pyint to DTYPES_DICT [run_process_replay]

* also fix uop alu bug

* exclude pyint there too

* ne ne

* force explicit dtype
2024-09-11 17:31:59 +08:00
qazal
3cde1503ce enable graph rewrite in the scheduler (#6249)
* test: enable

* skip those

* skip pads tests
2024-09-11 14:30:04 +08:00
chenyu
d9d1ae7248 more lt folding using gcd (#6469) 2024-09-11 02:09:35 -04:00
qazal
262569a3eb green conv bw AST_REWRITE=1 (#6466)
* green conv bw AST_REWRITE=1

* new strides and dtype fix
2024-09-11 10:51:24 +08:00