chenyu
6fd24561d1
distribute MUL const into ADD for int ( #6361 )
...
pre-req for real_stride
2024-09-05 01:36:57 -04:00
chenyu
a666450e4d
UOp pattern x + x -> x * 2 ( #6224 )
...
* UOp pattern x + x -> x * 2
now there's no NEG, with this it covers all kinds of a*x+b*x
* can remove x-x
2024-08-21 12:06:19 -04:00
chenyu
c9a9631818
no UnaryOps.NEG in generated UOp patterns ( #6209 )
...
* no UnaryOps.NEG in generated UOp patterns
removed pattern `x * (-1) -> -x` and `x != True`
* those are fine because NEG became CMPNE and True
* fix sd validation L2 norm
2024-08-21 11:08:22 -04:00
George Hotz
16f420f7a7
split full_graph_rewrite and linearize_uop [run_process_replay] ( #6215 )
...
* split full_graph_rewrite and linearize_uop
* fix tests
* graph rewrite in test uops
* add types
2024-08-20 20:12:33 -07:00
Max-We
53b20afa3f
Write tar_extract ( #6180 )
...
* Add tar_extract
* Add tar_extract tests
* Fix dtype for initialization from path
* Tests for path initialization
* rm print
---------
Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>
2024-08-19 12:06:17 -07:00
Eitan Turok
8556d0c642
Support gunzip in fetch ( #6176 )
...
* init
* update
* clean
* add type
* clean
* fix import order
* shorten variable names
2024-08-19 12:04:40 -07:00
chenyu
00578a021b
re:6125 switch real_size to use uops [run_process_replay] ( #6138 )
...
* switch real_size to use uops [run_process_replay]
* enough to pass
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2024-08-19 13:20:24 -04:00
qazal
be6dda4093
hotfix: more lazyop rename to uop [run_process_replay] ( #6157 )
2024-08-18 17:28:44 +03:00
chenyu
f7950fc2b6
add E275 missing-whitespace-after-keyword linting rule ( #6149 )
...
requires space after keywords like `assert`, `not`, `return`, `else`
2024-08-17 16:44:34 -04:00
George Hotz
88edc2902d
axis_is_masked with graph_rewrite [run_process_replay] ( #6144 )
2024-08-17 10:28:49 -07:00
qazal
d9ce664350
add test_verify_ast [run_process_replay] ( #6134 )
2024-08-17 14:14:30 +03:00
George Hotz
d9cb45af09
only axis is masked [run_process_replay] ( #6123 )
2024-08-16 21:01:17 -07:00
George Hotz
94aa5f11b5
Revert "use vmax for real_size [run_process_replay] ( #6120 )" ( #6122 )
...
This reverts commit a6e3211444.
2024-08-16 20:33:19 -07:00
George Hotz
a6e3211444
use vmax for real_size [run_process_replay] ( #6120 )
...
* use vmax for real_size [run_process_replay]
* axis is masked
2024-08-16 20:17:23 -07:00
George Hotz
912f01ed4b
UOpGraph -> linearize_uop [run_process_replay] ( #6119 )
2024-08-16 19:48:39 -07:00
George Hotz
74ee9febec
remove iter from uopgraph ( #6110 )
...
* remove iter from uopgraph
* linearize returns uops
* fix tests
* linearize in linearize
* tests fix
* touchup
* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6
merge uops with ops ( #6111 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
chenyu
9ef82e1f2b
UOp pattern DEFINE_VAR with min==max is also CONST ( #6095 )
...
* UOp pattern DEFINE_VAR with min==max is also CONST
* fix tests
2024-08-15 12:09:44 -04:00
chenyu
5accfe26a0
rewrite bool ADD to OR and MUL to AND ( #6084 )
...
* rewrite bool ADD to OR and MUL to AND
fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.
can only repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure
* fold those, and fix tests
* only for bool
* move dtypes.bool
2024-08-15 10:11:57 -04:00
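Why the bool rewrite is safe can be checked exhaustively. This is a plain-Python sketch, not tinygrad code: `bool_add` and `bool_mul` are hypothetical helpers that assume ADD and MUL on bool behave like integer arithmetic with the result cast back to bool.

```python
from itertools import product

def bool_add(a: bool, b: bool) -> bool:
  # assumed model of ADD on bools: add as ints, cast the result back to bool
  return bool(int(a) + int(b))

def bool_mul(a: bool, b: bool) -> bool:
  # assumed model of MUL on bools: multiply as ints, cast the result back to bool
  return bool(int(a) * int(b))

def check_bool_rewrites() -> bool:
  # ADD should agree with OR and MUL with AND on all four input pairs
  return all(bool_add(a, b) == (a or b) and bool_mul(a, b) == (a and b)
             for a, b in product([False, True], repeat=2))
```

Under that model the two rewrites agree with the originals on every input, which is all the check claims.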
chenyu
1782e4f64d
use div folding to do lt folding ( #6065 )
2024-08-13 16:59:05 -04:00
chenyu
6ed9711898
UOps pattern (x%c)+(x//c)*c = x ( #6051 )
...
pretty cool that this is very easy to write now
2024-08-12 14:58:48 -04:00
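The identity in this commit title can be brute-forced on plain integers. The sketch below uses Python's floor `//` and `%`, which is an assumption and may not match UOp IDIV/MOD semantics for negative values; `recombine` and `check_divmod_identity` are hypothetical names.

```python
def recombine(x: int, c: int) -> int:
  # (x%c) + (x//c)*c, the left-hand side of the pattern
  return (x % c) + (x // c) * c

def check_divmod_identity(lo: int = -50, hi: int = 50) -> bool:
  # the pattern claims the expression collapses to x for any nonzero c;
  # only positive c is exercised here
  return all(recombine(x, c) == x for x in range(lo, hi) for c in range(1, 17))
```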
chenyu
e6c7c3e499
update pylint path to check indent/space for all ( #6022 )
...
also fixed many errors. it was not checking nested dirs. exclude autogen for now.
can we use ruff for this?
2024-08-10 14:41:09 -04:00
George Hotz
cfb04c67d1
run unit tests separate from others (and only once) ( #6020 )
...
* run unit tests separate from others
* ignore unit tests elsewhere
2024-08-10 11:17:56 -07:00
chenyu
63a8bc29d4
addition divisor in UOp div_folding ( #6002 )
...
in addition to trying gcd of all terms, also try least common divisor of all MULs
2024-08-09 20:09:05 -04:00
chenyu
5961faa4be
minor change to UOp div_fold ( #6004 )
...
remove an unnecessary gcd and swap the quo rem order, minimize diff for divisor pr
2024-08-09 17:09:59 -04:00
chenyu
1f1eb46af6
more failed simplified UOp div test case ( #5992 )
...
this speculative div was handled by "divisor" in symbolic.
2024-08-08 18:39:25 -04:00
chenyu
c3e1ae2535
add failed simplified UOp div test case ( #5990 )
...
more cases!
2024-08-08 17:37:48 -04:00
chenyu
62c77a2831
trim const in UOp div_folding ( #5982 )
...
simplify `(4*x+4*y+7)//16` to `(x+y+1)//4`.
fixed `GPU=1 UOP_IS_SYMBOLIC=1 IMAGE=2 python -m pytest test/test_ops.py -k conv`
2024-08-08 12:49:05 -04:00
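The arithmetic behind this simplification can be sketched on plain integers. This is an illustration, not the tinygrad `div_folding` code: `fold_div` and `eval_div` are hypothetical helpers, and the sketch assumes a positive divisor and Python floor division.

```python
from functools import reduce
from math import gcd

def fold_div(coeffs: list[int], const: int, divisor: int) -> tuple[list[int], int, int]:
  # g divides every variable coefficient and the divisor
  g = reduce(gcd, coeffs, divisor)
  # the const is floored ("trimmed") by g; this is exact under floor division
  return [a // g for a in coeffs], const // g, divisor // g

def eval_div(coeffs: list[int], const: int, divisor: int, xs: list[int]) -> int:
  # evaluate (sum(a_i * x_i) + const) // divisor
  return (sum(a * x for a, x in zip(coeffs, xs)) + const) // divisor
```

With these helpers, `fold_div([4, 4], 7, 16)` gives coefficients `[1, 1]`, const `1` and divisor `4`, matching `(x+y+1)//4`. Trimming is valid because the variable part is a multiple of `g`, so the remainder of the const modulo `g` can never change the floored quotient.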
chenyu
859d0e4709
UOp simplify (x+c0)*c1 -> x*c1+c0*c1 ( #5973 )
2024-08-07 21:25:22 -04:00
chenyu
fa3a36e576
fancier UOp div gcd folding ( #5953 )
...
combine and cancel the remaining const based on gcd of other terms like SumNode.
2024-08-07 02:04:25 -04:00
chenyu
aa7fd7ef74
Use (-self).lt(-x+1) for UOp.ge ( #5955 )
...
matched symbolic and fixed UOP_IS_SYMBOLIC=1 arange folding
2024-08-07 01:31:27 -04:00
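The rewrite in this commit title relies on an integer-only equivalence: `a >= x` holds exactly when `-a < -x + 1`. Here is a small check on plain Python ints, where `ge_via_lt` is a hypothetical stand-in and not the `UOp.ge` method itself.

```python
def ge_via_lt(a: int, x: int) -> bool:
  # a >= x  <=>  -a <= -x  <=>  -a < -x + 1   (valid for integers only)
  return (-a) < (-x + 1)

def check_ge_via_lt(lo: int = -20, hi: int = 20) -> bool:
  # compare against the built-in >= over a small grid of integers
  return all(ge_via_lt(a, x) == (a >= x) for a in range(lo, hi) for x in range(lo, hi))
```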
chenyu
aee737bd9e
divide by gcd in UOp div folding ( #5949 )
...
* divide by gcd in UOp div folding
`(6x+6y)//16 -> (3x+3y)//8` etc
simpler version
* only factor out const
* don't apply for unsigned
* don't need that if
* space
2024-08-06 20:00:57 -04:00
chenyu
489575c3be
more UOp sum div with gcd tests ( #5936 )
...
* more UOp sum div with gcd tests
* one more
2024-08-06 12:50:10 -04:00
chenyu
09b7722637
UOp generic div folding ( #5896 )
2024-08-05 21:38:43 -04:00
chenyu
da61dea1b2
simple failed UOp sub symbolic test case ( #5894 )
2024-08-03 14:27:23 -04:00
chenyu
d5de44340e
UOp add mod folding ( #5862 )
...
* UOp add mod folding
* that passes now
2024-08-02 18:31:46 -04:00
chenyu
41bbd3f4c1
update UOp mod reduction patterns ( #5883 )
...
prepare generic mod folding, also some test changes from mod folding pr
2024-08-02 17:43:40 -04:00
George Hotz
877e0b4ba0
define global only has the index [run_process_replay] ( #5869 )
...
* define global only has the index [run_process_replay]
* fix that linearizer test
* fix ptx
* stupid ptx fix
2024-08-01 19:01:15 -07:00
chenyu
f27f949a5d
Revert "revert some UOp IDIV bound ( #5863 )" ( #5871 )
...
This reverts commit 0c8d202348.
2024-08-01 21:38:31 -04:00
chenyu
df138bc558
Revert "revert a mod pattern ( #5864 )" ( #5870 )
...
This reverts commit 5c8de2d044.
2024-08-01 20:44:26 -04:00
chenyu
1b0314d9ef
Revert "remove one more UOp mod pattern ( #5865 )" ( #5868 )
...
This reverts commit b03b8e18c2.
2024-08-01 20:28:35 -04:00
chenyu
b03b8e18c2
remove one more UOp mod pattern ( #5865 )
...
fixed UOP_IS_SYMBOLIC=1 test_failure_40
2024-08-01 18:29:04 -04:00
chenyu
5c8de2d044
revert a mod pattern ( #5864 )
...
fixed UOP_IS_SYMBOLIC=1 linearizer failure 47
2024-08-01 17:24:26 -04:00
chenyu
0c8d202348
revert some UOp IDIV bound ( #5863 )
...
* revert some UOp IDIV bound
breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI
* those are correct
* skip slow ones
2024-08-01 15:09:06 -04:00
chenyu
c2ffcf6887
remove the wrong mod UOp pattern ( #5847 )
...
don't think we are hitting it because of the stride construction, and it's wrong and not needed
2024-07-31 16:24:25 -04:00
chenyu
2e087ca8e4
UOp bound for div negative number ( #5808 )
2024-07-31 02:10:23 -04:00
chenyu
02f0be03f2
tests on UOp div negative number and arange opts ( #5825 )
2024-07-30 20:06:57 -04:00
nimlgen
ab3839a80a
cleanup nv/cuda compilers ( #5767 )
...
* cleanup nv/cuda compilers
* destroy prog
* small test
* fix test
* nv ptx rewrite key
* jitlink free
* ptx is part of cuda
2024-07-29 13:50:03 +03:00
chenyu
e7a14f398e
more uop_symbolic tests for divmod pairs ( #5785 )
2024-07-28 21:27:06 -04:00
chenyu
71a64d8252
UOps.MUL bound when one is negative ( #5781 )
...
* UOps.MUL bound when one is negative
also one more distribute_mul rule
* don't always expand
2024-07-28 19:02:47 -04:00