chenyu
36a1f38049
lazy folding: mul -1 is neg, and neg neg is noop ( #4472 )
2024-05-08 01:52:22 -04:00
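The two rewrite rules in this commit can be sketched on a toy expression tree (this is an illustrative model, not tinygrad's actual lazybuffer code; `Var`, `Neg`, and `Mul` are hypothetical nodes):

```python
# A minimal sketch of the two lazy-folding rules from this commit:
#   x * -1       ->  neg(x)
#   neg(neg(x))  ->  x
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:           # hypothetical leaf node
    name: str

@dataclass(frozen=True)
class Neg:
    src: object

@dataclass(frozen=True)
class Mul:
    lhs: object
    rhs: object      # a plain Python number when constant

def fold(node):
    """Apply the two folding rules bottom-up."""
    if isinstance(node, Mul):
        lhs = fold(node.lhs)
        if node.rhs == -1:           # mul by -1 is neg
            return fold(Neg(lhs))
        return Mul(lhs, node.rhs)
    if isinstance(node, Neg):
        src = fold(node.src)
        if isinstance(src, Neg):     # neg of neg is a no-op
            return src.src
        return Neg(src)
    return node

x = Var("x")
assert fold(Mul(x, -1)) == Neg(x)
assert fold(Neg(Neg(x))) == x
assert fold(Mul(Neg(x), -1)) == x    # the two rules compose
```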
chenyu
c508eb7425
revert the removal of CAST_BEFORE_VIEW ( #4471 )
...
this brings most of the memory gain for resnet back.
2024-05-08 00:14:29 -04:00
chenyu
f363f39e83
fix dtype of const folded sum ( #4349 )
...
const folding a sum should return the same dtype as a regular sum, which can differ from the input dtype
2024-04-29 11:40:45 -04:00
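The dtype rule this commit fixes can be sketched as follows; the promotion table below is an assumption for illustration only, not tinygrad's actual dtype promotion:

```python
# Hypothetical sketch of the fix: a const-folded sum must use the sum's
# output dtype, not the input dtype. The table is an illustrative
# assumption, not tinygrad's real promotion rules.
SUM_OUT_DTYPE = {"bool": "int32", "int8": "int32", "int32": "int32",
                 "float16": "float16", "float32": "float32"}

def const_fold_sum(const_val, n_elements, input_dtype):
    """Fold sum(full(n, const_val)) into a single const of the right dtype."""
    out_dtype = SUM_OUT_DTYPE[input_dtype]  # same dtype a regular sum returns
    total = const_val * n_elements
    if out_dtype.startswith("int"):
        total = int(total)
    return total, out_dtype

# a 4-element int8 tensor of ones sums to 4, but as int32, not int8
assert const_fold_sum(1, 4, "int8") == (4, "int32")
```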
George Hotz
ba7314c26b
cleanup lbs ( #4163 )
2024-04-12 22:32:16 -07:00
chenyu
a7c6864260
remove CAST_BEFORE_VIEW ( #4152 )
...
* remove CAST_BEFORE_VIEW
testing perf; this might also have an issue with assign?
* remove all
2024-04-13 01:05:08 -04:00
geohotstan
1a1dd1c1a7
add and enable tests for indexing const folding ( #4068 )
...
* enable test in test_indexing
* added tests
* rename stuff
* deleted a test case because it's loadops.copy
2024-04-04 10:46:28 -04:00
chenyu
406cb5fd90
const fold ReduceOps ( #4059 )
2024-04-03 14:39:28 -04:00
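Const folding a ReduceOp can be sketched like this (a hypothetical model: a reduce over a tensor that is one broadcast constant has a closed form, so no kernel is needed):

```python
# Sketch of ReduceOps const folding: reducing a tensor that is a single
# broadcast constant can be computed without materializing the tensor.
from math import prod

def fold_reduce(op, const_val, shape, axis):
    """Fold SUM/MAX over an unrealized const of the given shape."""
    if op == "SUM":
        return const_val * prod(shape[a] for a in axis)
    if op == "MAX":
        return const_val         # max of identical values is the value
    raise NotImplementedError(op)

assert fold_reduce("SUM", 2, (3, 4), axis=(0, 1)) == 24
assert fold_reduce("MAX", 2, (3, 4), axis=(1,)) == 2
```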
chenyu
fe03725b21
const fold cast unrealized_unpadded_const ( #4047 )
...
* const fold unrealized_unpadded_const
changed the underlying arg directly
* CAST_BEFORE_VIEW folds some
* fix const index in getitem
2024-04-03 12:31:24 -04:00
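"Changed the underlying arg directly" can be sketched as: cast the scalar value itself instead of scheduling a cast kernel. The `truncate` mapping below is a hypothetical stand-in for per-dtype value truncation:

```python
# Sketch: casting an unrealized, unpadded const folds by casting the
# scalar arg directly rather than scheduling a cast. The dtype table is
# an illustrative assumption.
def fold_cast(const_val, target_dtype):
    truncate = {"int32": int, "float32": float, "bool": bool}[target_dtype]
    return truncate(const_val)

assert fold_cast(2.7, "int32") == 2
assert fold_cast(1, "float32") == 1.0
assert fold_cast(0.0, "bool") is False
```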
chenyu
f61ed869f5
Use exec_alu for lazy const folding ( #4039 )
2024-04-02 20:52:05 -04:00
chenyu
85edc493b0
uops const fold rules to prevent tautological compare warnings ( #4041 )
...
* uops const fold rules to prevent tautological compare warnings
`bool < false` is false, `true < bool` is false, `a == a` is true, `a != a` is false
* not true for nan
* and nan does not work with llvm
* full truth table test
* revert a==a
* comments and indents
2024-04-02 16:45:58 -04:00
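The bool-compare folds listed above can be checked against Python's own bool ordering (a sketch, not tinygrad's uop matcher; note the commit reverted `a == a -> true` because it does not hold for NaN):

```python
# Sketch of the uop-level compare folds: for bool x,
# `x < False` is always False and `True < x` is always False.
def fold_cmplt(lhs, rhs):
    """Return the folded constant for lhs < rhs, or None if not foldable."""
    if rhs is False:   # nothing is less than False
        return False
    if lhs is True:    # True is not less than any bool
        return False
    return None

# the full truth table over bools agrees with the folds
for a in (False, True):
    for b in (False, True):
        folded = fold_cmplt(a, b)
        if folded is not None:
            assert folded == (a < b)
```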
chenyu
82440d3416
don't call contiguous for unpadded const into multi tensor ( #4032 )
...
* don't call contiguous for unpadded const into multi tensor
fixed multi const folding for sharded const.
still WIP; need to be careful that this does not break the multi-device cache somewhere
* ehh need a memory test for that
* simple sharded memory test
2024-04-01 19:22:14 -04:00
chenyu
77a68fc52f
test examples for multi tensor const folding ( #4031 )
...
works with a literal const operand now because it's copied to each shard and handled by lazy.
does not work for a sharded const
2024-04-01 16:53:43 -04:00
chenyu
379d52548d
const fold left const operand for ADD and MUL ( #4029 )
...
* const fold left const operand for ADD and MUL
* neg has a dtype issue
2024-04-01 15:09:04 -04:00
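Since ADD and MUL are commutative, a left const can be normalized to the right and reuse the existing right-const folds. A minimal sketch (hypothetical model, with `Const` represented as a plain Python number):

```python
# Sketch: swap a left const to the right, then apply identity folds.
def fold_binop(op, lhs, rhs):
    if isinstance(lhs, (int, float)):  # swap the const to the right
        lhs, rhs = rhs, lhs
    if op == "ADD" and rhs == 0:
        return lhs
    if op == "MUL" and rhs == 1:
        return lhs
    if op == "MUL" and rhs == 0:
        return 0
    return (op, lhs, rhs)              # not foldable

assert fold_binop("ADD", 0, "x") == "x"   # left const now folds too
assert fold_binop("MUL", 1, "x") == "x"
assert fold_binop("MUL", "x", 0) == 0
```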
chenyu
0e02d074bd
fix Tensor.pow folding for exponent 0 and 1 ( #4025 )
2024-03-31 19:57:23 -04:00
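The two exponent special cases can be sketched as scalar folds (an illustration of the rule, not tinygrad's Tensor.pow implementation):

```python
# Sketch of the exponent special cases: x ** 0 folds to one
# (ones_like(x) in tensor terms) and x ** 1 folds to x itself.
def fold_pow(base, exponent):
    if exponent == 0:
        return 1.0
    if exponent == 1:
        return base
    return base ** exponent

assert fold_pow(3.0, 0) == 1.0
assert fold_pow(3.0, 1) == 3.0
assert fold_pow(2.0, 3) == 8.0
```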
chenyu
d3f27761b0
move const folding of ADD/SUB/MUL from tensor to lazy ( #4020 )
...
* move const folding of ADD/SUB/MUL from tensor to lazy
will do div and pow separately.
* fix onnx adding with None
2024-03-31 16:35:36 -04:00
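The motivation for moving the fold down a layer can be sketched with a toy lazy node (`Lazy` and its `e` method are hypothetical stand-ins, not tinygrad's lazybuffer API): every frontend op that builds a lazy ADD/SUB/MUL gets the fold for free instead of each Tensor method duplicating the check.

```python
# Sketch: const folding at the lazy level instead of the Tensor level.
class Lazy:                       # hypothetical stand-in for a lazybuffer
    def __init__(self, const=None):
        self.const = const        # scalar if this node is a known constant
    def e(self, op, other):       # elementwise op at the lazy level
        if self.const is not None and other.const is not None:
            py = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b,
                  "MUL": lambda a, b: a * b}[op]
            return Lazy(const=py(self.const, other.const))   # folded
        return Lazy()             # unfolded: a real kernel would be scheduled

out = Lazy(const=3).e("MUL", Lazy(const=4))
assert out.const == 12            # folded without scheduling a kernel
assert Lazy().e("ADD", Lazy(const=1)).const is None
```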
chenyu
7f859593b8
fix _to_const_val and const folding around it ( #4017 )
...
* fix _to_const_val and const folding around it
is_unrealized_contiguous_const is too strict and is almost never hit once the const is expanded;
it suffices to check that there's no pad
* that test is folded
* test_const_folding
2024-03-31 13:09:23 -04:00
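The relaxed check can be sketched like this (a hypothetical model of the rule, not tinygrad's `_to_const_val`): an expanded const is no longer contiguous, but it still folds as long as no padding was applied, since the pad region holds zeros rather than the const.

```python
# Sketch of the relaxed foldability check: require "no pad" instead of
# the stricter (and almost never satisfied) "contiguous".
def is_foldable_const(is_unrealized_const, pads):
    """`pads` is a per-dim (before, after) list; any nonzero pad blocks folding."""
    return is_unrealized_const and all(b == 0 and a == 0 for b, a in pads)

# an expanded (non-contiguous) const still folds
assert is_foldable_const(True, [(0, 0), (0, 0)])
# a padded const must not fold: the pad region is 0, not the const
assert not is_foldable_const(True, [(0, 1), (0, 0)])
```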