Sieds Lykles
02208565de
add check ( #10257 )
2025-05-12 11:03:01 -04:00
Kirill R.
4c7c139102
Use cmod/cdiv in sym_infer ( #10258 )
...
* Use cmod/cdiv in sym_infer
* test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-12 09:07:28 -04:00
Sieds Lykles
7c4b381fbf
Extra simplify valid test [pr] ( #10256 )
...
* add test
* Change the range
* add todo test
2025-05-12 07:32:03 -04:00
Sieds Lykles
74e40aafa0
use cdiv in div and mod folding ( #10216 )
...
* use cdiv
* use cdiv and cmod there as well
* Add tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-09 12:37:24 -04:00
Sieds Lykles
8da9c070ca
take gcd out of trunc div ( #10238 )
2025-05-09 12:08:10 -04:00
chenyu
9846435c2e
fix test_div_numerator_negative ( #10229 )
...
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3
update uop symbolic tests ( #10228 )
...
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319
better bound for mod negative number ( #10227 )
2025-05-09 01:19:47 -04:00
chenyu
99f6d89dfb
tighter idiv bound for symbolic denominator ( #10226 )
2025-05-08 22:38:56 -04:00
Sieds Lykles
2891892834
Fold constant variable ( #10196 )
...
* Add rule
* add test and comment
* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9
Take neg out of idiv ( #10164 )
...
* Add rules
* Fix tests
* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
Sieds Lykles
09544d4556
Add rule and test ( #10189 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00
George Hotz
603c03bef2
fix tests for rewrite [pr] ( #10167 )
...
* fix tests for rewrite [pr]
* cleaner
* delete linearize_uop
* clean up the rest
2025-05-05 19:19:49 -07:00
Sieds Lykles
848c7783a4
Sign check in div const div pattern ( #10150 )
...
* Add rule
* Relax the condition
* Add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-03 18:04:34 -04:00
George Hotz
ef011ff5f9
flip Ops.COPY order [pr] ( #10122 )
...
* flip Ops.COPY order [pr]
* fix copy and support multi device copy in _device
2025-05-01 00:26:24 -04:00
George Hotz
dd0070daab
Revert "flip Ops.COPY order [pr] ( #10120 )" ( #10121 )
...
This reverts commit 984f09ac74.
2025-04-30 17:25:21 -04:00
George Hotz
984f09ac74
flip Ops.COPY order [pr] ( #10120 )
2025-04-30 16:50:18 -04:00
George Hotz
c3ff308abb
range has only one src now [pr] ( #10100 )
...
* range has only one op now
* fix z3 checker
* ci fix
* needs shell
* try pip ensure update
* that ensurepip is useless
* upgrade pip before cache
* windows happy?
2025-04-29 10:31:05 -04:00
qazal
cbf7347cd6
display viz rewrites with tabbing if they are subrewrites ( #10097 )
...
* display viz rewrites with tabbing if they are subrewrites
* update viz api
2025-04-29 17:57:21 +08:00
Sieds Lykles
dbb7aee02e
Split constant in div with negative x ( #10088 )
...
* add rule
* change test
* lower complexity limit
* remove offset in fold_unrolled_divs
* remove import
* add one more condition
2025-04-28 16:24:14 -04:00
George Hotz
690dac79b5
don't modify the ranges on reduce rewrite ( #10062 )
...
* bug in div range folding
* simpler
* oh, this is right for indexing, but the div mod folding needs to be fixed
* reenable
* Passing test_complexity_w_unroll2 (#10068)
* Passing
* remove non_folded_divs
* Add check for negative term in div folding
* Add test
* bump that limit
* fix casted
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
chenyu
4c1ce1a299
don't simplify if div folding resulted in negative numerator ( #10064 )
...
* don't simplify if div folding resulted in negative numerator
* test
2025-04-26 17:01:18 -04:00
George Hotz
2ed3acd767
toposort is a function [pr] ( #10004 )
2025-04-23 16:25:03 +01:00
George Hotz
d1f6701eb7
hotfix: lower amd threshold + improve block reorder test
2025-04-22 20:44:29 +01:00
George Hotz
c1539b0319
putting add first orders loads as expected ( #9991 )
2025-04-22 20:12:05 +01:00
George Hotz
feee6986c9
faster block reorder ( #9990 )
...
* faster block reorder [pr]
* that shouldn't change order
* key just in sorted
* ind
2025-04-22 19:18:57 +01:00
chenyu
9e5e371999
make DISABLE_COMPILER_CACHE a ContextVar [pr] ( #9983 )
2025-04-22 10:32:54 -04:00
George Hotz
c519b553db
non recursive toposort is 2x+ faster ( #9979 )
...
* non recursive toposort is 2x+ faster
* don't change the order
2025-04-22 13:59:38 +01:00
George Hotz
f5dc70c624
microbenchmarks + micro speed ups ( #9972 )
...
* microbenchmarks
* forgot the ubenchs
* clean up type verify
2025-04-22 11:30:46 +01:00
qazal
9a9aba4cd5
setitem tests (some failing) from kernelize ( #9940 )
2025-04-20 18:47:55 +08:00
George Hotz
8919370c76
hotfix: fix test_save_all_dtypes on METAL
2025-04-18 08:42:31 +01:00
Eitan Turok
2c7c205bc5
Fix dtype comparisons in vectorized transcendental + tests ( #9794 )
...
* init test
* cleanup
* init
* update
* fix
* fix python runtime for vectorized code
* awesome helper
* update
* update
* cleanup
* more cleaning
* cleanup more
* fix tests
* more cleaning
* cleanup more
* fix
* even cleaner
* failing tests is sad
* cleanup
* better name
* make tests pass
* remove vec from python runtime
* remove vec from eval_uop
* remove expected failures
* better name
2025-04-16 08:06:12 -04:00
George Hotz
44e4934167
fast pattern matcher [pr] ( #9737 )
...
* FastPatternMatcher
* works without that
* fix test pickle
* strict len
* compile match function
* dynamic compile
* fast
* faster
* compile
* track
* a lot faster
* clean up
* dup or
* faster and simpler
* fast match doesn't support store
* plane
* minor refactor
* real speed
* don't imply return None
* upat
* fix test
* heard you wanted more speed
* no generator
* split cf
* early fixup
* fxn fixup
* reconstruct_function
* Revert "reconstruct_function"
This reverts commit 37dac010ab.
* simpler stuff
* too big
* upat compile error
* cleanups
* don't cache that
* cleanups
* 10 -> 15
2025-04-14 15:24:41 +01:00
chenyu
e0ec8be37d
use CPU for test_schedule_ring ( #9843 )
...
* use CPU for test_schedule_ring
* why pre-commit is good
2025-04-10 23:20:53 -04:00
qazal
16956b79de
canonicalize Device.DEFAULT ( #9835 )
2025-04-10 23:02:11 +08:00
George Hotz
f666dd14eb
fix get reduce contraction with test ( #9834 )
2025-04-10 22:24:21 +08:00
George Hotz
53f0b2aad7
fix infinite loop in flash attention ( #9827 )
...
* fix infinite loop in flash attention
* get_contraction_with_reduce
* skip that test
* SINGLE_KERNEL_SOFTMAX + fix multi
* default IGNORE_OOB
* print change
2025-04-10 20:06:44 +08:00
qazal
498a2bf738
add err handling tests to viz + cleanups ( #9825 )
...
* cleanup
* add err handling tests to viz + cleanups
* lint
2025-04-10 14:05:05 +08:00
qazal
3bd992dc95
multi stage graph_rewrite_map ( #9803 )
...
* multistage graph_rewrite_map
* s/merge_map/input_map
* build up kernel_map from the tensor_map
2025-04-09 15:59:45 +08:00
Eitan Turok
bb7922b95f
Vectorize Transcendental Regression Tests ( #9753 )
...
* init test
* cleanup
2025-04-08 01:27:39 +08:00
chenyu
407ca54382
symbolic fold double where ( #9436 )
...
* symbolic fold double where
a.where(b.where(c, d), d) -> (a & b).where(c, d). a pattern in optimizer
* test case
2025-04-05 05:12:17 -04:00
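The double-where fold named in the commit above, a.where(b.where(c, d), d) -> (a & b).where(c, d), can be sanity-checked with a small truth-table sketch. This is plain Python, not tinygrad code: the `where`, `nested`, and `folded` helpers are hypothetical stand-ins that model a scalar select, and the operands `"c"` and `"d"` are placeholder values.

```python
from itertools import product


def where(cond: bool, x, y):
    """Scalar stand-in for a select: x when cond is true, else y."""
    return x if cond else y


def nested(a: bool, b: bool, c, d):
    """The original form: a.where(b.where(c, d), d)."""
    return where(a, where(b, c, d), d)


def folded(a: bool, b: bool, c, d):
    """The folded form: (a & b).where(c, d)."""
    return where(a and b, c, d)


# Exhaustively compare both forms over every boolean pair (a, b).
double_where_ok = all(
    nested(a, b, "c", "d") == folded(a, b, "c", "d")
    for a, b in product([False, True], repeat=2)
)
```

Both forms pick `c` only when `a` and `b` are both true, and fall through to `d` otherwise, which is why a single select on the conjunction suffices.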
Sieds Lykles
9c2fc695b5
cond.logical_not().where(a,b) -> cond.where(b,a) ( #9741 )
...
* Add rule for negation in where, simplifies arange patterns
* 0 becomes 0.0 again
* Only if cond is bool
* ne is never None
* Add a test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-04 19:13:32 -04:00
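The rewrite in the commit above, cond.logical_not().where(a,b) -> cond.where(b,a), amounts to swapping the branches instead of negating the condition. A minimal sketch in plain Python, assuming a boolean condition (the commit restricts the rule to bool); the `where` helper is a hypothetical scalar model, not the tinygrad API.

```python
def where(cond: bool, x, y):
    """Scalar stand-in for a select: x when cond is true, else y."""
    return x if cond else y


# Negating the condition is equivalent to swapping the two branches.
negation_swap_ok = all(
    where(not cond, "a", "b") == where(cond, "b", "a")
    for cond in (False, True)
)
```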
George Hotz
8b5a523743
fix minimum length in pattern matcher ( #9736 )
2025-04-04 14:57:01 +08:00
George Hotz
cac8bcf8b5
use Ops.REDUCE ( #9721 )
...
* decrease bert python time [pr]
* order copies
* Revert "order copies"
This reverts commit 3f62c8693b.
* rewrite count
* Ops.REDUCE
* acc first in the add chain
* Fix tensor core acc
* arange patterns look good
* fix multireduce gate
* reduce rewrite rule
* bump that to 15 minutes
* multiwmma isn't fusing
* gep through wmma is gep pushing
* bump that timeout too, it's all env setup
* add failing test
2025-04-04 10:14:34 +08:00
chenyu
c20f112e9f
example test use z3 to verify valid simplification ( #9684 )
2025-04-02 01:05:52 -04:00
chenyu
c672716b38
improve vmin/vmax for IDIV ( #9678 )
2025-04-01 23:16:01 -04:00
chenyu
8dd88ad476
don't div_and_mod_folding for negative numerator with remainder ( #9674 )
...
can be wrong in C div since it truncates towards zero
2025-04-01 16:26:23 -04:00
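The note above about C division truncating toward zero can be illustrated with a small sketch. It is plain Python and makes assumptions: `cdiv` and `cmod` here are hypothetical re-implementations written for this example (not the helpers from the repository), and the identity (x + k) / k == x / k + 1 is an illustrative fold chosen for the demonstration, not necessarily the rule the commit changed.

```python
def cdiv(x: int, y: int) -> int:
    """Divide like C: truncate the quotient toward zero."""
    q = abs(x) // abs(y)
    return q if (x < 0) == (y < 0) else -q


def cmod(x: int, y: int) -> int:
    """Remainder paired with cdiv; it takes the sign of x, as in C."""
    return x - cdiv(x, y) * y


def fold_is_safe(x: int, k: int = 4) -> bool:
    """Check whether (x + k) / k == x / k + 1 holds under truncating division."""
    return cdiv(x + k, k) == cdiv(x, k) + 1


# Python's // floors, so the two conventions differ for negative operands.
floor_vs_trunc = (-7 // 2, cdiv(-7, 2))
mod_vs_cmod = (-7 % 2, cmod(-7, 2))

# Under floor division the fold holds for every x in the tested range.
floor_fold_ok = all((x + 4) // 4 == x // 4 + 1 for x in range(-8, 9))

# Under truncating division it fails when x is negative but x + k is positive.
unsafe_inputs = [x for x in range(-8, 9) if not fold_is_safe(x)]
```

In this sketch the fold only breaks for x in -3..-1, where the numerator is negative, leaves a remainder, and the shifted numerator crosses zero. That is the kind of case where skipping the fold is the conservative choice.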
chenyu
0e34f9082e
helper functions for cstyle div mod [pr] ( #9673 )
2025-04-01 08:06:56 -04:00
chenyu
5358b0904b
update uop_given_valid if a node becomes const ( #9604 )
...
* update uop_given_valid if a node becomes const
* cleanup
2025-03-27 14:57:46 -04:00
qazal
bf94924d5a
fix viz with nested graph_rewrite ( #9595 )
2025-03-27 13:14:28 +08:00