George Hotz
fa14f7b4fd
switch contract arg to match expand arg [run_process_replay] ( #5667 )
* switch contract arg to match expand arg [run_process_replay]
* support multiaxis contract too, it's easy
* cancel contract/expand
2024-07-23 18:08:33 -07:00
George Hotz
a85493bdbe
multiaxis contract test
2024-07-23 15:09:15 -07:00
George Hotz
e3f00ac77d
Fix cuda tc emu test ( #5663 )
* fix acc folding for NV tensor cores
* fix correctness of reduce_before_expand
* fix test emulated CUDA tensor cores
* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
chenyu
16c27ae400
update UOp.SPECIAL arg spec [run_process_replay] ( #5661 )
* update UOp.SPECIAL arg spec [run_process_replay]
from `(0, "gid0", 4)` to just `("gid0", 4)`. closer to a Variable
* fix ptx
2024-07-23 16:58:12 -04:00
chenyu
01fe00e055
skip test_failure_39 in CI ( #5660 )
took more than 2 minutes in CI metal; it's basically the same as test_failure_37 but 20X bigger
2024-07-23 14:47:05 -04:00
chenyu
199b3bf02b
simple UOp lt/ge folding ( #5657 )
works if lhs is a DEFINE_VAR.
folds trivial x < -math.inf now; need to change SPECIAL to use DEFINE_VAR to fold more
2024-07-23 14:11:05 -04:00
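The "trivial" fold named above rests on a simple arithmetic fact: nothing compares less than negative infinity. A minimal sketch of that fact and the fold it enables (names are illustrative, not tinygrad's actual rewrite code):

```python
import math

def fold_lt(rhs):
    # hypothetical helper: return the folded constant for `x < rhs`,
    # or None when this trivial rule does not apply
    return False if rhs == -math.inf else None

# the identity the rule relies on: x < -inf is never true, for any x
for x in (-1e300, 0.0, 1e300, -math.inf, math.inf):
    assert not (x < -math.inf)
```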
qazal
b0fc5a4c6f
start scheduler process replay ( #5656 )
2024-07-23 20:02:51 +03:00
chenyu
e210c87b4a
uop mod-mod simplification ( #5650 )
2024-07-23 12:33:55 -04:00
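The exact mod-mod pattern isn't spelled out here, but the integer identities such a rewrite leans on are easy to state; a hedged sketch (not necessarily the PR's rule):

```python
# (x % n) % n == x % n: the outer mod is a no-op.
# More generally, (x % (a*b)) % b == x % b for a, b > 0,
# since a*b is itself a multiple of b.
for x in range(-50, 50):
    for n in range(1, 10):
        assert (x % n) % n == x % n
    for a in range(1, 5):
        for b in range(1, 5):
            assert (x % (a * b)) % b == x % b
```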
nimlgen
1384f08cd4
hcq profile tests ( #5654 )
* profile tests
* fixes
* remove linter
2024-07-23 18:40:33 +03:00
qazal
5f394fc9c6
more work toward non-blocking process replay ( #5653 )
* non-blocking process replay
* more actionable
* test it
* revert the test
* %s/logging.warn/logging.warning
2024-07-23 14:26:31 +03:00
qazal
7cb67e6fb2
merge gated stores spec ( #5652 )
* test_unmerged_ifs should merge ifs
* test_tiny_gate_store
* test_merge_ifs_alt
* assert assert asserts
2024-07-23 18:53:27 +08:00
George Hotz
7c4b177e3a
add tests for uops stats ( #5649 )
* add tests for uops stats
* no locals skip is fine
* eh
2024-07-22 21:57:03 -07:00
chenyu
4f83da626e
uop symbolic simple mul mod ( #5648 )
2024-07-22 23:17:41 -04:00
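A "simple mul mod" fold typically cancels a multiply against a matching mod; the arithmetic fact behind it, as an illustrative check (not tinygrad's code):

```python
# (x * a) % a == 0 for any integer x and nonzero a: the product is by
# construction a multiple of a, so the whole mod folds to the constant 0.
for x in range(-20, 20):
    for a in list(range(-5, 0)) + list(range(1, 6)):
        assert (x * a) % a == 0
```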
chenyu
f2d2afdaa4
dumb linearizer example that max is not simplified ( #5644 )
* dumb linearizer example that max is not simplified
this might just get fixed once basic mod simplification is done
* need local
2024-07-22 18:37:26 -04:00
chenyu
24505199fb
UOp.const(x.dtype, y) -> x.const(y) [run_process_replay] ( #5642 )
2024-07-22 17:09:40 -04:00
chenyu
97b116bb1d
UOp mul div simplification ( #5637 )
* UOp mul div simplification
* != 0 is fine
2024-07-22 16:14:12 -04:00
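The "!= 0 is fine" note points at the one precondition a mul-div cancellation needs. The identity, sketched as a check (illustrative, not the PR's pattern code):

```python
# (x * a) // a == x for any integer x and a != 0: x*a is an exact
# multiple of a, so floor division recovers x regardless of sign.
# a == 0 is the only case that must be excluded.
for x in range(-20, 20):
    for a in list(range(-5, 0)) + list(range(1, 6)):
        assert (x * a) // a == x
```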
nimlgen
26fc4610a0
amd more accurate cache management ( #5631 )
* amd more accurate cache management
* fix amd
* add memory_barrier + copies tests
* transfer test as well
* linter
2024-07-22 19:07:01 +03:00
Vyacheslav Pachkov
edc58e6b6e
hcq: remove duplicate allocation of kernel args by abstracting ( #5633 )
2024-07-22 18:29:41 +03:00
George Hotz
dc21e63bd2
test: put conv in one reduce ( #4441 )
* test: put conv in one reduce
* put reduce at the end
* more expand
* generic, and that expand was breaking things
* ratio
* don't undo the expand
* arg 1
* strides
* warning, for resnet
* warning removed
* disable cast
* handle cast
* op
* err, that's right
* fixup
* fix that
* a test to play with
* add double_reduces
* working up to final reshape
* fold the last reshape
* moved to schedule
* fix axis
* ci, need to bring arange back
* FUSE_CONV_BW maybe
* valid in 3.9
* test_expand_reduce_is_folded_on_different_axes
* add FUSE_CONV_BW=1
* test_fold_batchnorm_backward
* test_sgd_4convs_fuse
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
George Hotz
386fb5e7f8
folding without UNMUL ( #5628 )
* folding without UNMUL
* fix failures, index_collapse
* import ReduceOps
* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
George Hotz
7f5282b2f5
tests if the linearizer is generating dumb code ( #5611 )
* tests if the linearizer is generating dumb code
* push consts to the end
* sort adds
* sorted add and mul
* this better
* simple expand/contract
* no math contract/expand
2024-07-20 20:36:32 -07:00
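Bullets like "push consts to the end" and "sorted add and mul" describe canonicalizing the operand order of commutative ops so equivalent expressions dedup and fold. A toy sketch of the idea (representation and names are hypothetical, not tinygrad's):

```python
def canonicalize_add(terms):
    # toy canonical form for a commutative add: symbolic operands sorted,
    # all integer constants folded into a single trailing constant
    consts = [t for t in terms if isinstance(t, int)]
    syms = sorted(t for t in terms if isinstance(t, str))
    return syms + [sum(consts)] if consts else syms

# two differently-ordered sums now compare equal
assert canonicalize_add(["b", 2, "a", 3]) == canonicalize_add([3, "a", "b", 2])
```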
George Hotz
b399ccd6ef
BEAM bugfix, kernels dedup now ( #5617 )
* BEAM bugfix, kernels dedup now
* getenv is default
2024-07-20 19:43:50 -07:00
chenyu
92e7e65712
one more test case for symbolic mod mul ( #5615 )
2024-07-20 17:23:06 -04:00
qazal
3ab5fe4e1b
test argmax multireduce failure ( #5609 )
2024-07-20 21:33:03 +08:00
chenyu
b991097d41
move UPat and PatternMatcher from uopgraph.py to uops.py ( #5597 )
* move UPat and PatternMatcher from uopgraph.py to uops.py
towards instant UOps rewrite on UOp.alu
[run_process_replay]
* fix imports
2024-07-19 19:28:24 -04:00
George Hotz
2e617ca59e
lowerer img index ( #5592 )
2024-07-19 14:22:02 -07:00
nimlgen
b1782e3fef
hcq refactor signal into class ( #5575 )
* hcq refactor signal into class
* fix amd
* amd do not use amd_signal_t
* cleanup
* signal setter
* fix linter
* docs
* more docs + types
* fix types
2024-07-19 23:23:05 +03:00
George Hotz
d0ab20a5e5
careful memory counting (with tests to specify behavior) ( #5587 )
2024-07-19 11:37:34 -07:00
chenyu
37dd233650
always reverse global dim ( #5586 )
* always reverse global dim
* one more test
2024-07-19 13:58:05 -04:00
George Hotz
10be05aae5
push contract through cast to fix test_float2_acc (try 2) ( #5585 )
* push contract through cast to fix test_float2_acc (try 2)
* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz
51892c8fac
Revert "push contract through cast to fix test_float2_acc ( #5581 )" ( #5583 )
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz
ddda9420be
push contract through cast to fix test_float2_acc ( #5581 )
* push contract through cast to fix test_float2_acc
* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu
3f590c3b31
some limit_dims to limit global merging ( #5489 )
only supports merging dims in a way that does not surpass limit, no splitting yet
2024-07-19 12:17:46 -04:00
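The note says dims are only ever merged, never split, and merging must not push a global dim past the limit. A minimal greedy sketch of that behavior (illustrative only, not the PR's algorithm):

```python
def merge_dims(dims, limit):
    # greedily fold each dim into its left neighbor while the merged
    # size stays within the limit; never splits a dim
    out = []
    for d in dims:
        if out and out[-1] * d <= limit:
            out[-1] *= d
        else:
            out.append(d)
    return out

assert merge_dims([2, 3, 4], 12) == [6, 4]   # 2*3 merges; 6*4 would exceed 12
assert merge_dims([2, 3, 4], 100) == [24]    # everything fits in one dim
```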
George Hotz
0ad87021e2
move acc to end ( #5568 )
* move acc to end
* confirmed pictures are the same
* relax that
* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz
2de82b8a5d
remove get_lazyop_info ( #5570 )
* don't use get_lazyop_info more
* keep that min
* no ptx for that test
2024-07-19 03:05:33 -07:00
nimlgen
9d7edc9269
hcq rename HCQCompat -> HCQ ( #5577 )
2024-07-19 11:34:17 +03:00
chenyu
2b2f8ad18c
failed example of float2 acc no longer applies ( #5573 )
* failed example of float2 acc no longer applies
* # noqa: E501
2024-07-19 02:40:04 -04:00
qazal
e7a057c20f
retire replay_schedule ( #5563 )
2024-07-18 23:07:02 +03:00
qazal
50aba32ea8
hotfix: don't assert process replay in master. ( #5562 )
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
If [run_process_replay] is green pre-merge, it's ok.
2024-07-18 22:05:00 +03:00
George Hotz
223d9283ee
fix float4 acc by moving contracts ( #5559 )
2024-07-18 11:30:16 -07:00
chenyu
f5af98c450
failed test case that DEFINE_ACC no longer uses float4 ( #5555 )
* failed test case that DEFINE_ACC no longer uses float4
* line
2024-07-18 10:55:59 -07:00
George Hotz
923e0fe0b8
fix half4 folding ( #5556 )
2024-07-18 10:47:39 -07:00
chenyu
12e6771209
failed test case for unrolled half4 ( #5552 )
2024-07-18 13:05:52 -04:00
George Hotz
d1a7279605
indexing fold with casted bool ( #5551 )
* cast bool is where
* universal transform is wrong
2024-07-18 10:02:29 -07:00
kormann
2c4add6844
pretty print lazy op per default ( #5505 )
* pretty lop
* min diff
* walrus
* fix
* min diff
* simplify
* pretty helper function
* ws
* pretty uop upat
* tests
* stricter tests
* test passes
* ws
* stronger upat test
* delete print_tree
* min diff
* stricter exp test
* fix merge
* stronger uops eval test
* +readable and deep upat test
* +readable and deep upat test
* sort inv fix
* fix
* revert allowed_len
2024-07-18 09:34:08 -07:00
qazal
0ad1672d5f
fuse indexing (LazyOp creation) ( #5506 )
* bring FUSE_AS_ONE_KERNEL back
* operands need reshape?
* fused but arange didnt fold
* something deeply wrong
* yay, fused
* derive broadcasts
* s/input/reduce_input
* _fixup_ones proved a point
* this is what it takes
* down to 3 required reshapes:
1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape
* start real reshapes
* resolve shape in the edges pre lazyop
* outputs are the same shape
* rewrite1: just the reduce
* more correct
* fuse_as_one_kernel
* closer
* this passes
* dont rerun info
* dont need these
* not needed
2024-07-18 14:09:17 +03:00
George Hotz
fa7e734b49
MetaOps.KERNEL ( #5543 )
2024-07-17 19:41:23 -07:00
George Hotz
d3b098299d
add failing regression test for image ( #5540 )
* add failing regression test for image
* tg type
* simpler test
* don't realize image to image casts caused issue
* simple pad
2024-07-17 17:27:18 -07:00
qazal
61ee02e93d
start multireduce lowerer work (var/std) ( #5537 )
* multireduce no-opts works
* passed test_var_multireduce
* cleanup
* double reduce
* extra check for range_group
* more checking for range_groups
* cleaning up debug prints
* cleanup diff
* linters
* revert kernel changes
* these are uops toposort
---------
Co-authored-by: timmy <timmy0x@proton.me>
2024-07-17 23:43:46 +03:00
Francis Lam
c4eb30a04c
test/test_linearizer_failures: add a new beautiful_mnist one ( #5531 )
* test/test_linearizer_failures: add a new beautiful_mnist one
this one is from a DEPTH=2 fuzz_linearizer search
* add GPU to test_failure_40
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00