Commit Graph

2147 Commits

Author SHA1 Message Date
George Hotz
fa14f7b4fd switch contract arg to match expand arg [run_process_replay] (#5667)
* switch contract arg to match expand arg [run_process_replay]

* support multiaxis contract too, it's easy

* cancel contract/expand
2024-07-23 18:08:33 -07:00
George Hotz
a85493bdbe multiaxis contract test 2024-07-23 15:09:15 -07:00
George Hotz
e3f00ac77d Fix cuda tc emu test (#5663)
* fix acc folding for NV tensor cores

* fix correctness of reduce_before_expand

* fix test emulated CUDA tensor cores

* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
chenyu
16c27ae400 update UOp.SPECIAL arg spec [run_process_replay] (#5661)
* update UOp.SPECIAL arg spec [run_process_replay]

from `(0, "gid0", 4)` to just `("gid0", 4)`. closer to a Variable

* fix ptx
2024-07-23 16:58:12 -04:00
chenyu
01fe00e055 skip test_failure_39 in CI (#5660)
took more than 2 minutes in ci metal, it's basically the same as test_failure_37 but 20X bigger
2024-07-23 14:47:05 -04:00
chenyu
199b3bf02b simple UOp lt/ge folding (#5657)
works if lhs is a DEFINE_VAR.
folds trivial x < -math.inf now, need to change SPECIAL to use DEFINE_VAR to fold more
2024-07-23 14:11:05 -04:00
qazal
b0fc5a4c6f start scheduler process replay (#5656) 2024-07-23 20:02:51 +03:00
chenyu
e210c87b4a uop mod-mod simplification (#5650) 2024-07-23 12:33:55 -04:00
nimlgen
1384f08cd4 hcq profile tests (#5654)
* profile tests

* fixes

* remove linter
2024-07-23 18:40:33 +03:00
qazal
5f394fc9c6 more work toward non-blocking process replay (#5653)
* non-blocking process replay

* more actionable

* test it

* revert the test

* %s/logging.warn/logging.warning
2024-07-23 14:26:31 +03:00
qazal
7cb67e6fb2 merge gated stores spec (#5652)
* test_unmerged_ifs should merge ifs

* test_tiny_gate_store

* test_merge_ifs_alt

* assert assert asserts
2024-07-23 18:53:27 +08:00
George Hotz
7c4b177e3a add tests for uops stats (#5649)
* add tests for uops stats

* no locals skip is fine

* eh
2024-07-22 21:57:03 -07:00
chenyu
4f83da626e uop symbolic simple mul mod (#5648) 2024-07-22 23:17:41 -04:00
chenyu
f2d2afdaa4 dumb linearizer example that max is not simplified (#5644)
* dumb linearizer example that max is not simplified

this might just get fix once basic mod simplification is done

* need local
2024-07-22 18:37:26 -04:00
chenyu
24505199fb UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] (#5642) 2024-07-22 17:09:40 -04:00
chenyu
97b116bb1d UOp mul div simplification (#5637)
* UOp mul div simplification

* != 0 is fine
2024-07-22 16:14:12 -04:00
nimlgen
26fc4610a0 amd more accurate cache managment (#5631)
* amd more accurate cache managment

* fix amd

* add memory_barrier + copies tests

* tranfer test as well

* linter
2024-07-22 19:07:01 +03:00
Vyacheslav Pachkov
edc58e6b6e hcq: remove duplicate allocation of kernel args by abstracting (#5633) 2024-07-22 18:29:41 +03:00
George Hotz
dc21e63bd2 test: put conv in one reduce (#4441)
* test: put conv in one reduce

* put reduce at the end

* more expand

* generic, and that expand was breaking things

* ratio

* don't undo the expand

* arg 1

* strides

* warning, for resnet

* warning removed

* disable cast

* handle cast

* op

* err, that's right

* fixup

* fix that

* a test to play with

* add double_reduces

* working up to final reshape

* fold the last reshape

* moved to schedule

* fix axis

* ci, need to bring arange back

* FUSE_CONV_BW maybe

* valid in 3.9

* test_expand_reduce_is_folded_on_different_axes

* add FUSE_CONV_BW=1

* test_fold_batchnorm_backward

* test_sgd_4convs_fuse

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
George Hotz
386fb5e7f8 folding without UNMUL (#5628)
* folding without UNMUL

* fix failures, index_collapse

* import ReduceOps

* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
George Hotz
7f5282b2f5 tests if the linearizer is generating dumb code (#5611)
* tests if the linearizer is generating dumb code

* push consts to the end

* sort adds

* sorted add and mul

* this better

* simple expand/contract

* no math contract/expand
2024-07-20 20:36:32 -07:00
George Hotz
b399ccd6ef BEAM bugfix, kernels dedup now (#5617)
* BEAM bugfix, kernels dedup now

* getenv is default
2024-07-20 19:43:50 -07:00
chenyu
92e7e65712 one more test case for symbolic mod mul (#5615) 2024-07-20 17:23:06 -04:00
qazal
3ab5fe4e1b test argmax multireduce failure (#5609) 2024-07-20 21:33:03 +08:00
chenyu
b991097d41 move UPat and PatternMatcher from uopgraph.py to uops.py (#5597)
* move UPat and PatternMatcher from uopgraph.py to uops.py

towards instant UOps rewrite on UOp.alu

[run_process_replay]

* fix imports
2024-07-19 19:28:24 -04:00
George Hotz
2e617ca59e lowerer img index (#5592) 2024-07-19 14:22:02 -07:00
nimlgen
b1782e3fef hcq refactor signal into class (#5575)
* hcq refactor signal into class

* fix amd

* amd do not use amd_signal_t

* cleanup

* signal setter

* fix linter

* docs

* more docs + types

* fix types
2024-07-19 23:23:05 +03:00
George Hotz
d0ab20a5e5 careful memory counting (with tests to specify behavior) (#5587) 2024-07-19 11:37:34 -07:00
chenyu
37dd233650 always reverse global dim (#5586)
* always reverse global dim

* one more test
2024-07-19 13:58:05 -04:00
George Hotz
10be05aae5 push contract through cast to fix test_float2_acc (try 2) (#5585)
* push contract through cast to fix test_float2_acc (try 2)

* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz
51892c8fac Revert "push contract through cast to fix test_float2_acc (#5581)" (#5583)
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz
ddda9420be push contract through cast to fix test_float2_acc (#5581)
* push contract through cast to fix test_float2_acc

* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu
3f590c3b31 some limit_dims to limit global merging (#5489)
only supports merging dims in a way that does not surpass limit, no splitting yet
2024-07-19 12:17:46 -04:00
George Hotz
0ad87021e2 move acc to end (#5568)
* move acc to end

* confirmed pictures are the same

* relax that

* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz
2de82b8a5d remove get_lazyop_info (#5570)
* don't use get_lazyop_info more

* keep that min

* no ptx for that test
2024-07-19 03:05:33 -07:00
nimlgen
9d7edc9269 hcq rename HCQCompat -> HCQ (#5577) 2024-07-19 11:34:17 +03:00
chenyu
2b2f8ad18c failed example of float2 acc no long applies (#5573)
* failed example of float2 acc no long applies

* # noqa: E501
2024-07-19 02:40:04 -04:00
qazal
e7a057c20f retire replay_schedule (#5563) 2024-07-18 23:07:02 +03:00
qazal
50aba32ea8 hotfix: don't assert process replay in master. (#5562)
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
if [run_process_replay] is green pre merge it's ok.
2024-07-18 22:05:00 +03:00
George Hotz
223d9283ee fix float4 acc by moving contracts (#5559) 2024-07-18 11:30:16 -07:00
chenyu
f5af98c450 failed test case that DEFINE_ACC no long uses float4 (#5555)
* failed test case that DEFINE_ACC no long uses float4

* line
2024-07-18 10:55:59 -07:00
George Hotz
923e0fe0b8 fix half4 folding (#5556) 2024-07-18 10:47:39 -07:00
chenyu
12e6771209 failed test case for unrolled half4 (#5552) 2024-07-18 13:05:52 -04:00
George Hotz
d1a7279605 indexing fold with casted bool (#5551)
* cast bool is where

* universal transform is wrong
2024-07-18 10:02:29 -07:00
kormann
2c4add6844 pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
qazal
0ad1672d5f fuse indexing (LazyOp creation) (#5506)
* bring FUSE_AS_ONE_KERNEL back

* operands need reshape?

* fused but arange didnt fold

* something deeply wrong

* yay, fused

* derive broadcasts

* s/input/reduce_input

* _fixup_ones proved a point

* this is what it takes

* down to 3 required reshapes:

1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape

* start real reshapes

* resolve shape in the edges pre lazyop

* outputs are the same shape

* rewrite1: just the reduce

* more correct

* fuse_as_one_kernel

* closer

* this passes

* dont rerun info

* dont need these

* not needed
2024-07-18 14:09:17 +03:00
George Hotz
fa7e734b49 MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
George Hotz
d3b098299d add failing regression test for image (#5540)
* add failing regression test for image

* tg type

* simpler test

* don't realize image to image casts caused issue

* simple pad
2024-07-17 17:27:18 -07:00
qazal
61ee02e93d start multireduce lowerer work (var/std) (#5537)
* multireduce no-opts works

* passed test_var_multireduce

* cleanup

* double reduce

* extra check for range_group

* more checking for range_groups

* cleaning up debug prints

* cleanup diff

* linters

* revert kernel changes

* these are uops toposort

---------

Co-authored-by: timmy <timmy0x@proton.me>
2024-07-17 23:43:46 +03:00
Francis Lam
c4eb30a04c test/test_linearizer_failures: add a new beautiful_mnist one (#5531)
* test/test_linearizer_failures: add a new beautiful_mnist one

this one is from a DEPTH=2 fuzz_linearizer search

* add GPU to test_failure_40

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00