George Hotz
fa14f7b4fd
switch contract arg to match expand arg [run_process_replay] ( #5667 )
...
* switch contract arg to match expand arg [run_process_replay]
* support multiaxis contract too, it's easy
* cancel contract/expand
2024-07-23 18:08:33 -07:00
chenyu
ea99efe815
remove UOps lt pattern of booleans ( #5666 )
...
covered by the generic lt fold pattern
2024-07-23 20:11:21 -04:00
chenyu
e196640d71
more generic lt folding ( #5665 )
2024-07-23 19:50:59 -04:00
chenyu
7c8fe0fe47
skip interpolate tests for PYTHON=1 ( #5664 )
2024-07-23 18:47:15 -04:00
George Hotz
a85493bdbe
multiaxis contract test
2024-07-23 15:09:15 -07:00
George Hotz
e3f00ac77d
Fix cuda tc emu test ( #5663 )
...
* fix acc folding for NV tensor cores
* fix correctness of reduce_before_expand
* fix test emulated CUDA tensor cores
* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
chenyu
c34f9db0f7
remove ptx PTXRenderer.gdim gid lid [run_process_replay] ( #5662 )
...
gdim is not used, gid and lid do not need to be attributes
2024-07-23 17:33:20 -04:00
chenyu
16c27ae400
update UOp.SPECIAL arg spec [run_process_replay] ( #5661 )
...
* update UOp.SPECIAL arg spec [run_process_replay]
from `(0, "gid0", 4)` to just `("gid0", 4)`. closer to a Variable
* fix ptx
2024-07-23 16:58:12 -04:00
George Hotz
4d47968580
fix acc folding for NV tensor cores ( #5658 )
...
* fix acc folding for NV tensor cores
* fix correctness of reduce_before_expand
2024-07-23 13:03:02 -07:00
chenyu
01fe00e055
skip test_failure_39 in CI ( #5660 )
...
took more than 2 minutes in ci metal, it's basically the same as test_failure_37 but 20X bigger
2024-07-23 14:47:05 -04:00
chenyu
fdc72ba102
reorder UOps.DEFINE_VAR in runtime [run_process_replay] ( #5659 )
...
prep rewrite SPECIAL using DEFINE_VAR
2024-07-23 14:32:10 -04:00
chenyu
199b3bf02b
simple UOp lt/ge folding ( #5657 )
...
works if lhs is a DEFINE_VAR.
folds trivial x < -math.inf now, need to change SPECIAL to use DEFINE_VAR to fold more
2024-07-23 14:11:05 -04:00
qazal
b0fc5a4c6f
start scheduler process replay ( #5656 )
2024-07-23 20:02:51 +03:00
chenyu
e210c87b4a
uop mod-mod simplification ( #5650 )
2024-07-23 12:33:55 -04:00
nimlgen
1384f08cd4
hcq profile tests ( #5654 )
...
* profile tests
* fixes
* remove linter
2024-07-23 18:40:33 +03:00
qazal
5f394fc9c6
more work toward non-blocking process replay ( #5653 )
...
* non-blocking process replay
* more actionable
* test it
* revert the test
* %s/logging.warn/logging.warning
2024-07-23 14:26:31 +03:00
nimlgen
a93982ef42
hcq move out program call to base class ( #5638 )
...
* hcq move out program call to base class
* fix
2024-07-23 14:25:38 +03:00
qazal
7cb67e6fb2
merge gated stores spec ( #5652 )
...
* test_unmerged_ifs should merge ifs
* test_tiny_gate_store
* test_merge_ifs_alt
* assert assert asserts
2024-07-23 18:53:27 +08:00
nimlgen
4dcca0a6d4
amd tiny cleanups ( #5651 )
2024-07-23 13:06:23 +03:00
George Hotz
7c4b177e3a
add tests for uops stats ( #5649 )
...
* add tests for uops stats
* no locals skip is fine
* eh
2024-07-22 21:57:03 -07:00
chenyu
4f83da626e
uop symbolic simple mul mod ( #5648 )
2024-07-22 23:17:41 -04:00
George Hotz
4042bc2399
hotfix: put that space back in DEBUG=2
2024-07-22 20:11:15 -07:00
George Hotz
2a436fa5c6
memory estimate of cache also ( #5646 )
...
* print cache/mem ratio
* lds update
* min mem and lds
* cleanup
2024-07-22 19:56:36 -07:00
chenyu
efc7bf37a2
reuse UOp.sparents in UOps.vars [run_process_replay] ( #5647 )
...
also simplified a few set.union
2024-07-22 22:50:19 -04:00
chenyu
f2d2afdaa4
dumb linearizer example that max is not simplified ( #5644 )
...
* dumb linearizer example that max is not simplified
this might just get fix once basic mod simplification is done
* need local
2024-07-22 18:37:26 -04:00
chenyu
fe17ea5c88
typo in ops_amd invalidate_caches ( #5643 )
...
lead to silently not being called
2024-07-22 18:37:11 -04:00
George Hotz
ed2ee52b8b
fix arange 4096 with more folding rules ( #5641 )
2024-07-22 14:21:11 -07:00
chenyu
24505199fb
UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] ( #5642 )
2024-07-22 17:09:40 -04:00
chenyu
97b116bb1d
UOp mul div simplification ( #5637 )
...
* UOp mul div simplification
* != 0 is fine
2024-07-22 16:14:12 -04:00
nimlgen
ee633c1988
hcq move out synchronize to base class ( #5634 )
2024-07-22 20:36:04 +03:00
nimlgen
26fc4610a0
amd more accurate cache managment ( #5631 )
...
* amd more accurate cache managment
* fix amd
* add memory_barrier + copies tests
* tranfer test as well
* linter
2024-07-22 19:07:01 +03:00
qazal
fe6f9b2048
more actionable verify_lazyop assert ( #5635 )
2024-07-23 00:06:11 +08:00
Vyacheslav Pachkov
edc58e6b6e
hcq: remove duplicate allocation of kernel args by abstracting ( #5633 )
2024-07-22 18:29:41 +03:00
nimlgen
08a9c0ae5e
hcq cache invalidation for beam ( #5630 )
...
* nv full cache invalidation
* the same command on amd
* linter
* fix amd
* nv no hardcoded consts
* beam default
2024-07-22 18:13:17 +03:00
qazal
c64e9591e3
replace gates in uopgraph [run_process_replay] ( #5632 )
...
* test
* replace gates in uopgraph [run_process_replay]
* rewrite all gates
* hmm, process replay passes?
* Revert "rewrite all gates"
This reverts commit 2425a443f3.
* yea that makes sense
* remove unsued
* replace source should be up there
2024-07-22 15:56:55 +03:00
George Hotz
dc21e63bd2
test: put conv in one reduce ( #4441 )
...
* test: put conv in one reduce
* put reduce at the end
* more expand
* generic, and that expand was breaking things
* ratio
* don't undo the expand
* arg 1
* strides
* warning, for resnet
* warning removed
* disable cast
* handle cast
* op
* err, that's right
* fixup
* fix that
* a test to play with
* add double_reduces
* working up to final reshape
* fold the last reshape
* moved to schedule
* fix axis
* ci, need to bring arange back
* FUSE_CONV_BW maybe
* valid in 3.9
* test_expand_reduce_is_folded_on_different_axes
* add FUSE_CONV_BW=1
* test_fold_batchnorm_backward
* test_sgd_4convs_fuse
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
George Hotz
386fb5e7f8
folding without UNMUL ( #5628 )
...
* folding without UNMUL
* fix failures, index_collapse
* import ReduceOps
* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
Vyacheslav Pachkov
583829ab44
helpers: remove duplicate data64 helpers in amd/nv ( #5627 )
2024-07-21 16:50:59 -07:00
George Hotz
6c6d74d922
parallel mcts ( #5626 )
...
* start work on parallel mcts
* compile was linearizing twice
* typing + more early stopping
* fix compiler error
2024-07-21 14:53:23 -07:00
chenyu
c56c9c7519
move ufix inside UOp [run_process_replay] ( #5621 )
...
dtype is always the dtype of the caller
2024-07-21 17:30:37 -04:00
George Hotz
ef179087a4
mcts exit condition wasn't right, also use it with BEAM>=100 ( #5619 )
...
* mcts exit condition wasn't right, also use it with BEAM>=100
* mcts touchups
* clean up sample
2024-07-21 10:16:47 -07:00
chenyu
a823759dc5
simpler pattern matcher rules [run_process_replay] ( #5620 )
2024-07-21 04:05:01 -04:00
George Hotz
0f67ef4674
mcts graph and dedup support ( #5618 )
...
* mcts graph and dedup support
* usable graph
* mcts colors
* C=4 seems better
* C=3 even better
* sample_tree
* backprop is external function
* late expand to match algo
2024-07-20 23:29:14 -07:00
George Hotz
7f5282b2f5
tests if the linearizer is generating dumb code ( #5611 )
...
* tests if the linearizer is generating dumb code
* push consts to the end
* sort adds
* sorted add and mul
* this better
* simple expand/contract
* no math contract/expand
2024-07-20 20:36:32 -07:00
chenyu
eddc5bcfd7
MCTS tweaks ( #5616 )
...
MCTS 500 is competitive with BEAM=8 on resnet on M1 Max.
- increment trial times even with compiled error and runtime error.
- use best time of children as the node value.
2024-07-20 19:45:59 -07:00
George Hotz
b399ccd6ef
BEAM bugfix, kernels dedup now ( #5617 )
...
* BEAM bugfix, kernels dedup now
* getenv is default
2024-07-20 19:43:50 -07:00
chenyu
92e7e65712
one more test case for symbolic mod mul ( #5615 )
2024-07-20 17:23:06 -04:00
chenyu
d71308ed68
copy mlperf 4.0 to mlperf 4.1 ( #5614 )
2024-07-20 16:12:00 -04:00
George Hotz
1113e47f96
print best in MCTS + light up the winner in hcopt
2024-07-20 09:39:36 -07:00
nimlgen
0de5812032
hcq move map to allocator ( #5610 )
...
* hcq move map to allocator
* fix
2024-07-20 19:02:45 +03:00