Commit Graph

4618 Commits

Author SHA1 Message Date
George Hotz
10be05aae5 push contract through cast to fix test_float2_acc (try 2) (#5585)
* push contract through cast to fix test_float2_acc (try 2)

* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz
51892c8fac Revert "push contract through cast to fix test_float2_acc (#5581)" (#5583)
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz
ddda9420be push contract through cast to fix test_float2_acc (#5581)
* push contract through cast to fix test_float2_acc

* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu
3f590c3b31 some limit_dims to limit global merging (#5489)
only supports merging dims in a way that does not surpass limit, no splitting yet
2024-07-19 12:17:46 -04:00
George Hotz
0ad87021e2 move acc to end (#5568)
* move acc to end

* confirmed pictures are the same

* relax that

* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz
2de82b8a5d remove get_lazyop_info (#5570)
* don't use get_lazyop_info more

* keep that min

* no ptx for that test
2024-07-19 03:05:33 -07:00
nimlgen
9d7edc9269 hcq rename HCQCompat -> HCQ (#5577) 2024-07-19 11:34:17 +03:00
chenyu
2b2f8ad18c failed example of float2 acc no long applies (#5573)
* failed example of float2 acc no long applies

* # noqa: E501
2024-07-19 02:40:04 -04:00
qazal
e7a057c20f retire replay_schedule (#5563) 2024-07-18 23:07:02 +03:00
qazal
50aba32ea8 hotfix: don't assert process replay in master. (#5562)
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
if [run_process_replay] is green pre merge it's ok.
2024-07-18 22:05:00 +03:00
George Hotz
223d9283ee fix float4 acc by moving contracts (#5559) 2024-07-18 11:30:16 -07:00
chenyu
f5af98c450 failed test case that DEFINE_ACC no long uses float4 (#5555)
* failed test case that DEFINE_ACC no long uses float4

* line
2024-07-18 10:55:59 -07:00
George Hotz
923e0fe0b8 fix half4 folding (#5556) 2024-07-18 10:47:39 -07:00
chenyu
12e6771209 failed test case for unrolled half4 (#5552) 2024-07-18 13:05:52 -04:00
George Hotz
d1a7279605 indexing fold with casted bool (#5551)
* cast bool is where

* universal transform is wrong
2024-07-18 10:02:29 -07:00
kormann
2c4add6844 pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
qazal
0ad1672d5f fuse indexing (LazyOp creation) (#5506)
* bring FUSE_AS_ONE_KERNEL back

* operands need reshape?

* fused but arange didnt fold

* something deeply wrong

* yay, fused

* derive broadcasts

* s/input/reduce_input

* _fixup_ones proved a point

* this is what it takes

* down to 3 required reshapes:

1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape

* start real reshapes

* resolve shape in the edges pre lazyop

* outputs are the same shape

* rewrite1: just the reduce

* more correct

* fuse_as_one_kernel

* closer

* this passes

* dont rerun info

* dont need these

* not needed
2024-07-18 14:09:17 +03:00
George Hotz
fa7e734b49 MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
George Hotz
d3b098299d add failing regression test for image (#5540)
* add failing regression test for image

* tg type

* simpler test

* don't realize image to image casts caused issue

* simple pad
2024-07-17 17:27:18 -07:00
qazal
61ee02e93d start multireduce lowerer work (var/std) (#5537)
* multireduce no-opts works

* passed test_var_multireduce

* cleanup

* double reduce

* extra check for range_group

* more checking for range_groups

* cleaning up debug prints

* cleanup diff

* linters

* revert kernel changes

* these are uops toposort

---------

Co-authored-by: timmy <timmy0x@proton.me>
2024-07-17 23:43:46 +03:00
Francis Lam
c4eb30a04c test/test_linearizer_failures: add a new beautiful_mnist one (#5531)
* test/test_linearizer_failures: add a new beautiful_mnist one

this one is from a DEPTH=2 fuzz_linearizer search

* add GPU to test_failure_40

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00
qazal
0259d76183 use Context only in replaying Kernel [run_process_replay] (#5535) 2024-07-18 03:46:14 +08:00
George Hotz
1a68854766 PatternMatcher add (#5532)
* PatternMatcher add [run_process_replay]

* f4 dynamic

* test_failure_36 is fixed

* fix PTX
2024-07-17 12:44:42 -07:00
qazal
a7706e05f9 option to [skip_process_replay] (#5533) 2024-07-17 22:30:46 +03:00
George Hotz
1242b302fa expand UOps with rewrite rules (#5501)
* expand UOps with rewrite rules [run_process_replay]

* progress

* much closer

* close, way less bugs

* bunch of expander tests

* fix contract

* ops tests pass

* fix barrier

* mostly passing

* bitcast in expanded ops

* support more expand merges

* all tests pass maybe

* fix empty EXPAND

* fix LIN fuzzing

* add ALL_SAME assert

* all same

* all same work

* raise CompileError

* pass fuzz linearizer

* revert whitespace

* fix nv tensor core test

* fix mypy

* bug fix

* fuzzer passes

* put tests back

* expand arg to idx
2024-07-17 10:17:50 -07:00
George Hotz
158221b36b expand tests from uop_expander [run_process_replay] (#5524)
* expand tests from uop_expander

* more changes from the branch
2024-07-17 09:22:36 -07:00
George Hotz
42c25cc961 fix fixup_ast (#5523)
* fix fixup_ast

* these lin failures are fixed
2024-07-17 08:52:21 -07:00
nimlgen
dcd462860f elf loader (#5508)
* elf loader

* cleanup

* cleaner

* cleaner

* fixes

* revert this

* fix div 0

* fix nv

* amd fix

* fix mockgpu

* amd better?

* restore relocs for <12.4

* linter

* this is fixed now

* revert this

* process cdefines as function

* cleaner

* align

* save lines

* revert this change
2024-07-17 17:09:34 +03:00
Francis Lam
2d53abb04a test/external/fuzz_linearizer: fix for new AST changes (#5519)
* test/external/fuzz_linearizer: fix for new AST changes

also add beautiful_mnist failures

* add CLANG and LLVM to test_failure_35 failed_platforms

* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
chenyu
6e405b0a2b add 0d tensor to trunc/floor/ceil/round tests (#5512)
existing trunc test passes backward but its backward is incorrect in general. added tests that would fail
2024-07-16 16:48:25 -04:00
Tobias Fischer
87a2ef2bc2 Add Interpolate Function (#5482)
* add interpolate function

* fixed linter issue

* reduced sizes in test

---------

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
qazal
173064c69c (re)start multireduce in codegen/* (#5391)
* test_var_multireduce

* run verify_lazyop

* test_var_multireduce

* assert lazyop

* add test_indexing_multireduce

* arange fuses (crude)

* note: extra reshape

* start readble

* test_arange_simple

* test_arange_expanded

* test_indexing_multireduce

* cleanups

* skip ptx

* skip nv and amd ci

* skip arange expanded too

* GPU=1 is slow too in CI
2024-07-16 14:20:48 +03:00
chenyu
07ff4b7d24 test_failure_33 ast that has UOps.UNMUL after linearize (#5504)
* test_failure_33 ast that has UOps.UNMUL after linearize

* smaller
2024-07-15 22:54:23 -04:00
chenyu
63990705b5 test kernel opts case for 4 local and 4 groups (#5499)
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Edward Wang
9a7d5a148e move colorize_float to helpers.py (#5490)
* add colorize_float to helpers.py

* update references
2024-07-15 11:29:03 -07:00
qazal
ac08f0eb00 reshape rawbufs in test_linearizer (#5492)
* reshape rawbufs in test_linearizer

* fix helper_linearizer_ast
2024-07-15 19:14:38 +03:00
qazal
ae4cb7994e run process replay with DEBUG=0 (#5491)
* process replay with DEBUG=0

* graceful shutdown

* use and
2024-07-15 16:30:57 +03:00
Tobias Fischer
e219103677 Add Pad to Pooling (#5488) 2024-07-14 21:50:20 -07:00
Tobias Fischer
5849130cbb gather negative dim fix (#5486) 2024-07-14 20:20:53 -04:00
qazal
3c378efcb6 process replay docs improvements (#5481)
* minor cleanups

* docs and logs

* shorter

* comma

* s/print/logging.info [run_process_replay]

* use logging.warn

* process name is noise

* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
chenyu
613a1dbeed render lidx starting with 0 (#5478)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing one started from pre-limited global dims which skip number if there are more than 3 global dims

* don't need start_dim

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-14 16:34:04 -04:00
qazal
671779f280 limit process replay diff to ~20% of kernels (#5480)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing one started from pre-limited global dims which skip number if there are more than 3 global dims

* don't need start_dim

* add changed

* env var

* more early exit

* simpler?

* Revert "Merge branch 'lidx0' into process_replay_limit"

This reverts commit cbadcfa5e9, reversing
changes made to fc9bf37ee7.

* minor cleanup

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
chenyu
f8a47608cc test dtype.min and dtype.max (#5479)
compared with np.iinfo for integer dtype
2024-07-14 15:31:37 -04:00
George Hotz
a9f5a764dc make BatchNorm work for 2D and 3D (#5477)
* make BatchNorm work for 2D and 3D

* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
chenyu
e41ab66653 use is to compare types (#5476)
new rule in latest ruff
2024-07-14 14:26:41 -04:00
nimlgen
61822d1a14 nv fix timeline signal rollover on copy queue (#5473)
* hotfix: nv rollover to 32bits

* test both queues
2024-07-14 16:06:12 +03:00
nimlgen
8835d6c49a cleanup nv/amd program (#5449)
* cleanup nv/amd program

* fix amd

* a bit cleaner

* ugh, typo

* linter

* fix nv

* tiny thing
2024-07-14 14:08:35 +03:00
qazal
0b3a34e3b1 vectorize folding [run_process_replay] (#5470)
* test_gep_vec_fold

* remove that

* fix process replay

* lint
2024-07-14 09:41:48 +03:00
chenyu
28972418c4 s/get_linearizer/get_kernel [run_process_replay] (#5467) 2024-07-13 20:32:22 -04:00
Francis Lata
0345577032 UNet3D dataloader shared memory fix (#5465)
* create separate SharedMemory between inputs and labels

* update path check for shared mem

* clean up unit test for dataset
2024-07-13 20:26:00 -04:00