Commit Graph

5174 Commits

Author SHA1 Message Date
George Hotz
ddda9420be push contract through cast to fix test_float2_acc (#5581)
* push contract through cast to fix test_float2_acc

* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu
3f590c3b31 some limit_dims to limit global merging (#5489)
only supports merging dims in a way that does not surpass limit, no splitting yet
2024-07-19 12:17:46 -04:00
George Hotz
e04704faff put acc first again (#5580) 2024-07-19 08:55:19 -07:00
chenyu
fc5b9f8dc9 Kernel.required_optimizations and Kernel.hand_coded_optimizations returns self (#5576)
[run_process_replay]
2024-07-19 10:55:14 -04:00
qazal
da34e1f617 scheduler refactors from the fuse_index branch (#5579)
* make simple_pads a safe set

* use is for comparing base

* 1 should continue
2024-07-19 16:23:31 +03:00
qazal
ecf88bb775 move assign_targets assignment (#5578) 2024-07-19 20:29:50 +08:00
George Hotz
0ad87021e2 move acc to end (#5568)
* move acc to end

* confirmed pictures are the same

* relax that

* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz
2de82b8a5d remove get_lazyop_info (#5570)
* don't use get_lazyop_info more

* keep that min

* no ptx for that test
2024-07-19 03:05:33 -07:00
nimlgen
9d7edc9269 hcq rename HCQCompat -> HCQ (#5577) 2024-07-19 11:34:17 +03:00
chenyu
2b2f8ad18c failed example of float2 acc no longer applies (#5573)
* failed example of float2 acc no longer applies

* # noqa: E501
2024-07-19 02:40:04 -04:00
chenyu
efccb1c3ba swap global for size 3 too (#5567)
hc path resnet on green 10% faster
2024-07-18 23:31:15 -04:00
chenyu
abe29a05b0 swap first and last global in hcopt / hc tc path (#5566) 2024-07-18 18:54:44 -04:00
George Hotz
946da97820 swap action (#5565)
* swap action

* don't allow same action expressed differently

* oops, was reversed

* one line is fine

* only swap
2024-07-18 15:19:40 -07:00
qazal
e7a057c20f retire replay_schedule (#5563) 2024-07-18 23:07:02 +03:00
qazal
50aba32ea8 hotfix: don't assert process replay in master. (#5562)
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
if [run_process_replay] is green pre merge it's ok.
2024-07-18 22:05:00 +03:00
George Hotz
223d9283ee fix float4 acc by moving contracts (#5559) 2024-07-18 11:30:16 -07:00
George Hotz
c41cd55556 remove vectorized alu in expander [run_process_replay] (#5561) 2024-07-18 11:27:40 -07:00
kormann
c951bc99af fix abstractions2 printout (#5557) 2024-07-18 21:21:45 +03:00
George Hotz
a7fec05acc fix broken store rule [run_process_replay] (#5558)
* remove unused store rule [run_process_replay]

* that should preserve behavior i think
2024-07-18 11:07:34 -07:00
chenyu
f5af98c450 failed test case that DEFINE_ACC no longer uses float4 (#5555)
* failed test case that DEFINE_ACC no longer uses float4

* line
2024-07-18 10:55:59 -07:00
George Hotz
923e0fe0b8 fix half4 folding (#5556) 2024-07-18 10:47:39 -07:00
chenyu
12e6771209 failed test case for unrolled half4 (#5552) 2024-07-18 13:05:52 -04:00
George Hotz
d1a7279605 indexing fold with casted bool (#5551)
* cast bool is where

* universal transform is wrong
2024-07-18 10:02:29 -07:00
qazal
fdfc0015a7 [run_process_replay] for opencl/openpilot (#5009)
* lil reset script

* find the prg

* use lower_schedule_item

* add process replay back

* cleanups
2024-07-18 19:42:33 +03:00
kormann
2c4add6844 pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
nimlgen
c30092e56d amd remove useless barrier (#5550) 2024-07-18 18:05:33 +03:00
nimlgen
4e9d2b1615 nv memory_barrier command (#5548) 2024-07-18 16:23:11 +03:00
qazal
6d7cd34250 more save_schedule tooling (#5547) 2024-07-18 15:59:53 +03:00
qazal
0ad1672d5f fuse indexing (LazyOp creation) (#5506)
* bring FUSE_AS_ONE_KERNEL back

* operands need reshape?

* fused but arange didnt fold

* something deeply wrong

* yay, fused

* derive broadcasts

* s/input/reduce_input

* _fixup_ones proved a point

* this is what it takes

* down to 3 required reshapes:

1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape

* start real reshapes

* resolve shape in the edges pre lazyop

* outputs are the same shape

* rewrite1: just the reduce

* more correct

* fuse_as_one_kernel

* closer

* this passes

* dont rerun info

* dont need these

* not needed
2024-07-18 14:09:17 +03:00
wozeparrot
6ccb2390c3 feat: update_benchmark_staging (#5529) 2024-07-17 20:40:57 -07:00
chenyu
e569c927cf remove Kernel.shape_offsets [run_process_replay] (#5544)
the only use case now can be further simplified
2024-07-17 23:16:47 -04:00
George Hotz
fa7e734b49 MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
George Hotz
d3b098299d add failing regression test for image (#5540)
* add failing regression test for image

* tg type

* simpler test

* don't realize image to image casts caused issue

* simple pad
2024-07-17 17:27:18 -07:00
wozeparrot
218e157f00 benchmark on update_benchmark_staging (#5541) 2024-07-17 17:11:52 -07:00
wozeparrot
8845a5dbfd feat: begin immediate (#5539) 2024-07-17 16:11:21 -07:00
George Hotz
a6e70f8a71 clean up expand function [run_process_replay] (#5538)
* clean up expand function [run_process_replay]

* lil cleaner

* add a type
2024-07-17 15:02:00 -07:00
qazal
61ee02e93d start multireduce lowerer work (var/std) (#5537)
* multireduce no-opts works

* passed test_var_multireduce

* cleanup

* double reduce

* extra check for range_group

* more checking for range_groups

* cleaning up debug prints

* cleanup diff

* linters

* revert kernel changes

* these are uops toposort

---------

Co-authored-by: timmy <timmy0x@proton.me>
2024-07-17 23:43:46 +03:00
qazal
67ea4af01f depth first recurse_reduceops (#5536)
* early recurse

p2

* yea cache shouldnt be there
2024-07-17 23:27:53 +03:00
Francis Lam
c4eb30a04c test/test_linearizer_failures: add a new beautiful_mnist one (#5531)
* test/test_linearizer_failures: add a new beautiful_mnist one

this one is from a DEPTH=2 fuzz_linearizer search

* add GPU to test_failure_40

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00
qazal
0259d76183 use Context only in replaying Kernel [run_process_replay] (#5535) 2024-07-18 03:46:14 +08:00
George Hotz
1a68854766 PatternMatcher add (#5532)
* PatternMatcher add [run_process_replay]

* f4 dynamic

* test_failure_36 is fixed

* fix PTX
2024-07-17 12:44:42 -07:00
qazal
d3c137d478 utility for computing reduceop output_shape (#5534)
* refactor to reduce_st

* update lazy
2024-07-17 22:40:07 +03:00
qazal
0a7872a62f use exec_alu in uops flop counting (#5511)
* use exec_alu for uops flop counting

* deal with sint
2024-07-17 22:39:27 +03:00
qazal
a7706e05f9 option to [skip_process_replay] (#5533) 2024-07-17 22:30:46 +03:00
chenyu
4193095f67 fix handcode_opt.py with DEBUG=2 (#5530)
only one ast per kernel now
2024-07-17 14:50:47 -04:00
chenyu
466555cd17 touchup Tensor.interpolate (#5525)
* touchup Tensor.interpolate and Tensor.lerp

rewrite lerp to save one sub and thus flops.
use Tensor.lerp for interpolate and some minor cleanups

* revert lerp change
2024-07-17 13:35:57 -04:00
George Hotz
1242b302fa expand UOps with rewrite rules (#5501)
* expand UOps with rewrite rules [run_process_replay]

* progress

* much closer

* close, way less bugs

* bunch of expander tests

* fix contract

* ops tests pass

* fix barrier

* mostly passing

* bitcast in expanded ops

* support more expand merges

* all tests pass maybe

* fix empty EXPAND

* fix LIN fuzzing

* add ALL_SAME assert

* all same

* all same work

* raise CompileError

* pass fuzz linearizer

* revert whitespace

* fix nv tensor core test

* fix mypy

* bug fix

* fuzzer passes

* put tests back

* expand arg to idx
2024-07-17 10:17:50 -07:00
George Hotz
158221b36b expand tests from uop_expander [run_process_replay] (#5524)
* expand tests from uop_expander

* more changes from the branch
2024-07-17 09:22:36 -07:00
George Hotz
42c25cc961 fix fixup_ast (#5523)
* fix fixup_ast

* these lin failures are fixed
2024-07-17 08:52:21 -07:00
qazal
fbe0233be3 infra for multi reduce asts (#5522)
* add reduce_info

* _recurse_reduceops base

* derive output shape

* refactor

* delete reduce_for_op

* save lines

* more line saving
2024-07-17 17:23:46 +03:00