George Hotz
10be05aae5
push contract through cast to fix test_float2_acc (try 2) ( #5585 )
...
* push contract through cast to fix test_float2_acc (try 2)
* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz
51892c8fac
Revert "push contract through cast to fix test_float2_acc ( #5581 )" ( #5583 )
...
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz
ddda9420be
push contract through cast to fix test_float2_acc ( #5581 )
...
* push contract through cast to fix test_float2_acc
* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu
3f590c3b31
some limit_dims to limit global merging ( #5489 )
...
only supports merging dims in a way that does not surpass the limit; no splitting yet
2024-07-19 12:17:46 -04:00
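A minimal sketch of the merging constraint described above (a hypothetical helper, not tinygrad's actual kernel code): adjacent dims are only merged while the product stays at or under the per-dimension limit, and nothing is split.
```
from typing import List

def merge_dims_under_limit(dims: List[int], limit: int) -> List[int]:
  # greedily merge adjacent dims; never let a merged dim surpass the limit
  out: List[int] = []
  for d in dims:
    if out and out[-1] * d <= limit: out[-1] *= d
    else: out.append(d)
  return out

# with a per-dim limit of 1024: [4, 256, 7, 7] -> [1024, 49]
print(merge_dims_under_limit([4, 256, 7, 7], 1024))
```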
George Hotz
0ad87021e2
move acc to end ( #5568 )
...
* move acc to end
* confirmed pictures are the same
* relax that
* Update test_ops.py
2024-07-19 03:06:52 -07:00
George Hotz
2de82b8a5d
remove get_lazyop_info ( #5570 )
...
* don't use get_lazyop_info more
* keep that min
* no ptx for that test
2024-07-19 03:05:33 -07:00
nimlgen
9d7edc9269
hcq rename HCQCompat -> HCQ ( #5577 )
2024-07-19 11:34:17 +03:00
chenyu
2b2f8ad18c
failed example of float2 acc no longer applies ( #5573 )
...
* failed example of float2 acc no longer applies
* # noqa: E501
2024-07-19 02:40:04 -04:00
qazal
e7a057c20f
retire replay_schedule ( #5563 )
2024-07-18 23:07:02 +03:00
qazal
50aba32ea8
hotfix: don't assert process replay in master. ( #5562 )
...
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
If [run_process_replay] is green pre-merge, it's ok.
2024-07-18 22:05:00 +03:00
George Hotz
223d9283ee
fix float4 acc by moving contracts ( #5559 )
2024-07-18 11:30:16 -07:00
chenyu
f5af98c450
failed test case that DEFINE_ACC no longer uses float4 ( #5555 )
...
* failed test case that DEFINE_ACC no longer uses float4
* line
2024-07-18 10:55:59 -07:00
George Hotz
923e0fe0b8
fix half4 folding ( #5556 )
2024-07-18 10:47:39 -07:00
chenyu
12e6771209
failed test case for unrolled half4 ( #5552 )
2024-07-18 13:05:52 -04:00
George Hotz
d1a7279605
indexing fold with casted bool ( #5551 )
...
* cast bool is where
* universal transform is wrong
2024-07-18 10:02:29 -07:00
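The fold itself lives in the UOp rewrite rules, but the equivalence it relies on is easy to see at the Tensor level (illustrative only): casting a bool mask to a numeric dtype selects 1 where true and 0 where false, i.e. it behaves like a where.
```
from tinygrad import Tensor, dtypes

mask = Tensor([True, False, True])
print(mask.cast(dtypes.int32).tolist())  # [1, 0, 1]
print(mask.where(1, 0).tolist())         # [1, 0, 1], the equivalent where-form
```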
kormann
2c4add6844
pretty print lazy op by default ( #5505 )
...
* pretty lop
* min diff
* walrus
* fix
* min diff
* simplify
* pretty helper function
* ws
* pretty uop upat
* tests
* stricter tests
* test passes
* ws
* stronger upat test
* delete print_tree
* min diff
* stricter exp test
* fix merge
* stronger uops eval test
* +readable and deep upat test
* +readable and deep upat test
* sort inv fix
* fix
* revert allowed_len
2024-07-18 09:34:08 -07:00
qazal
0ad1672d5f
fuse indexing (LazyOp creation) ( #5506 )
...
* bring FUSE_AS_ONE_KERNEL back
* operands need reshape?
* fused but arange didnt fold
* something deeply wrong
* yay, fused
* derive broadcasts
* s/input/reduce_input
* _fixup_ones proved a point
* this is what it takes
* down to 3 required reshapes:
1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape
* start real reshapes
* resolve shape in the edges pre lazyop
* outputs are the same shape
* rewrite1: just the reduce
* more correct
* fuse_as_one_kernel
* closer
* this passes
* dont rerun info
* dont need these
* not needed
2024-07-18 14:09:17 +03:00
George Hotz
fa7e734b49
MetaOps.KERNEL ( #5543 )
2024-07-17 19:41:23 -07:00
George Hotz
d3b098299d
add failing regression test for image ( #5540 )
...
* add failing regression test for image
* tg type
* simpler test
* don't realize image to image casts caused issue
* simple pad
2024-07-17 17:27:18 -07:00
qazal
61ee02e93d
start multireduce lowerer work (var/std) ( #5537 )
...
* multireduce no-opts works
* passed test_var_multireduce
* cleanup
* double reduce
* extra check for range_group
* more checking for range_groups
* cleaning up debug prints
* cleanup diff
* linters
* revert kernel changes
* these are uops toposort
---------
Co-authored-by: timmy <timmy0x@proton.me>
2024-07-17 23:43:46 +03:00
Francis Lam
c4eb30a04c
test/test_linearizer_failures: add a new beautiful_mnist one ( #5531 )
...
* test/test_linearizer_failures: add a new beautiful_mnist one
this one is from a DEPTH=2 fuzz_linearizer search
* add GPU to test_failure_40
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-17 16:27:04 -04:00
qazal
0259d76183
use Context only in replaying Kernel [run_process_replay] ( #5535 )
2024-07-18 03:46:14 +08:00
George Hotz
1a68854766
PatternMatcher add ( #5532 )
...
* PatternMatcher add [run_process_replay]
* f4 dynamic
* test_failure_36 is fixed
* fix PTX
2024-07-17 12:44:42 -07:00
qazal
a7706e05f9
option to [skip_process_replay] ( #5533 )
2024-07-17 22:30:46 +03:00
George Hotz
1242b302fa
expand UOps with rewrite rules ( #5501 )
...
* expand UOps with rewrite rules [run_process_replay]
* progress
* much closer
* close, way less bugs
* bunch of expander tests
* fix contract
* ops tests pass
* fix barrier
* mostly passing
* bitcast in expanded ops
* support more expand merges
* all tests pass maybe
* fix empty EXPAND
* fix LIN fuzzing
* add ALL_SAME assert
* all same
* all same work
* raise CompileError
* pass fuzz linearizer
* revert whitespace
* fix nv tensor core test
* fix mypy
* bug fix
* fuzzer passes
* put tests back
* expand arg to idx
2024-07-17 10:17:50 -07:00
George Hotz
158221b36b
expand tests from uop_expander [run_process_replay] ( #5524 )
...
* expand tests from uop_expander
* more changes from the branch
2024-07-17 09:22:36 -07:00
George Hotz
42c25cc961
fix fixup_ast ( #5523 )
...
* fix fixup_ast
* these lin failures are fixed
2024-07-17 08:52:21 -07:00
nimlgen
dcd462860f
elf loader ( #5508 )
...
* elf loader
* cleanup
* cleaner
* cleaner
* fixes
* revert this
* fix div 0
* fix nv
* amd fix
* fix mockgpu
* amd better?
* restore relocs for <12.4
* linter
* this is fixed now
* revert this
* process cdefines as function
* cleaner
* align
* save lines
* revert this change
2024-07-17 17:09:34 +03:00
Francis Lam
2d53abb04a
test/external/fuzz_linearizer: fix for new AST changes ( #5519 )
...
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
chenyu
6e405b0a2b
add 0d tensor to trunc/floor/ceil/round tests ( #5512 )
...
the existing trunc test passes backward, but its backward is incorrect in general; added tests that would fail
2024-07-16 16:48:25 -04:00
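A quick sketch of what the added 0-d cases exercise (the values below are chosen for illustration, not taken from the test):
```
from tinygrad import Tensor

print(Tensor(3.7).trunc().item())   # 3.0, truncation goes toward zero
print(Tensor(-3.7).trunc().item())  # -3.0
print(Tensor(-1.2).floor().item())  # -2.0
print(Tensor(-1.2).ceil().item())   # -1.0
```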
Tobias Fischer
87a2ef2bc2
Add Interpolate Function ( #5482 )
...
* add interpolate function
* fixed linter issue
* reduced sizes in test
---------
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
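A hedged usage sketch of the new function; the size/mode keywords below assume the torch-style API this PR mirrors.
```
from tinygrad import Tensor

x = Tensor([[[1.0, 2.0, 3.0, 4.0]]])         # shape (1, 1, 4)
y = x.interpolate(size=(8,), mode="linear")  # upsample the trailing dim
print(y.shape)                               # (1, 1, 8)
```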
qazal
173064c69c
(re)start multireduce in codegen/* ( #5391 )
...
* test_var_multireduce
* run verify_lazyop
* test_var_multireduce
* assert lazyop
* add test_indexing_multireduce
* arange fuses (crude)
* note: extra reshape
* start readble
* test_arange_simple
* test_arange_expanded
* test_indexing_multireduce
* cleanups
* skip ptx
* skip nv and amd ci
* skip arange expanded too
* GPU=1 is slow too in CI
2024-07-16 14:20:48 +03:00
chenyu
07ff4b7d24
test_failure_33 ast that has UOps.UNMUL after linearize ( #5504 )
...
* test_failure_33 ast that has UOps.UNMUL after linearize
* smaller
2024-07-15 22:54:23 -04:00
chenyu
63990705b5
test kernel opts case for 4 local and 4 groups ( #5499 )
...
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Edward Wang
9a7d5a148e
move colorize_float to helpers.py ( #5490 )
...
* add colorize_float to helpers.py
* update references
2024-07-15 11:29:03 -07:00
qazal
ac08f0eb00
reshape rawbufs in test_linearizer ( #5492 )
...
* reshape rawbufs in test_linearizer
* fix helper_linearizer_ast
2024-07-15 19:14:38 +03:00
qazal
ae4cb7994e
run process replay with DEBUG=0 ( #5491 )
...
* process replay with DEBUG=0
* graceful shutdown
* use and
2024-07-15 16:30:57 +03:00
Tobias Fischer
e219103677
Add Pad to Pooling ( #5488 )
2024-07-14 21:50:20 -07:00
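A small sketch of the new padding path; the keyword names are assumed to mirror torch's pooling signature.
```
from tinygrad import Tensor

x = Tensor.ones(1, 1, 4, 4)
y = x.max_pool2d(kernel_size=(2, 2), stride=2, padding=1)  # pad 1 on each side before pooling
print(y.shape)  # (1, 1, 3, 3)
```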
Tobias Fischer
5849130cbb
gather negative dim fix ( #5486 )
2024-07-14 20:20:53 -04:00
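Sketch of the fixed behaviour, assuming the torch-style gather(dim, index) argument order:
```
from tinygrad import Tensor

x = Tensor([[1, 2], [3, 4]])
idx = Tensor([[0, 0], [1, 0]])
print(x.gather(-1, idx).tolist())  # [[1, 1], [4, 3]]; dim=-1 now resolves to the last axis
```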
qazal
3c378efcb6
process replay docs improvements ( #5481 )
...
* minor cleanups
* docs and logs
* shorter
* comma
* s/print/logging.info [run_process_replay]
* use logging.warn
* process name is noise
* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
chenyu
613a1dbeed
render lidx starting with 0 ( #5478 )
...
* render lidx starting with 0
changed from
```
int gidx0 = gid.x; /* 4096 */
int lidx4 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx5 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx6 = lid.z; /* 2 */
```
to
```
int gidx0 = gid.x; /* 4096 */
int lidx0 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx1 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx2 = lid.z; /* 2 */
```
the existing one numbered from the pre-limited global dims, which skip numbers if there are more than 3 global dims
* don't need start_dim
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-14 16:34:04 -04:00
qazal
671779f280
limit process replay diff to ~20% of kernels ( #5480 )
...
* render lidx starting with 0
* don't need start_dim
* add changed
* env var
* more early exit
* simpler?
* Revert "Merge branch 'lidx0' into process_replay_limit"
This reverts commit cbadcfa5e9 , reversing
changes made to fc9bf37ee7 .
* minor cleanup
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
chenyu
f8a47608cc
test dtype.min and dtype.max ( #5479 )
...
compared with np.iinfo for integer dtypes
2024-07-14 15:31:37 -04:00
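A minimal sketch of the property being checked, assuming the dtypes.min/dtypes.max accessor form and numpy's iinfo:
```
import numpy as np
from tinygrad import dtypes

assert dtypes.min(dtypes.int8) == np.iinfo(np.int8).min == -128
assert dtypes.max(dtypes.int8) == np.iinfo(np.int8).max == 127
assert dtypes.max(dtypes.uint16) == np.iinfo(np.uint16).max == 65535
```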
George Hotz
a9f5a764dc
make BatchNorm work for 2D and 3D ( #5477 )
...
* make BatchNorm work for 2D and 3D
* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
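A hedged sketch of the dim-agnostic module this enables, assuming nn.BatchNorm now accepts both 4-d (2D) and 5-d (3D) inputs:
```
from tinygrad import Tensor
from tinygrad.nn import BatchNorm

bn = BatchNorm(16)
print(bn(Tensor.rand(2, 16, 8, 8)).shape)     # 2D input: (2, 16, 8, 8)
print(bn(Tensor.rand(2, 16, 4, 8, 8)).shape)  # 3D input: (2, 16, 4, 8, 8)
```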
chenyu
e41ab66653
use is to compare types ( #5476 )
...
new rule in latest ruff
2024-07-14 14:26:41 -04:00
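The rule in question is most likely ruff's E721 type-comparison check; a tiny generic example of the preferred form:
```
def is_exact_int(x) -> bool:
  # identity check on type objects, rather than: type(x) == int
  return type(x) is int

print(is_exact_int(3), is_exact_int(True))  # True False (bool is not exactly int)
```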
nimlgen
61822d1a14
nv fix timeline signal rollover on copy queue ( #5473 )
...
* hotfix: nv rollover to 32bits
* test both queues
2024-07-14 16:06:12 +03:00
nimlgen
8835d6c49a
cleanup nv/amd program ( #5449 )
...
* cleanup nv/amd program
* fix amd
* a bit cleaner
* ugh, typo
* linter
* fix nv
* tiny thing
2024-07-14 14:08:35 +03:00
qazal
0b3a34e3b1
vectorize folding [run_process_replay] ( #5470 )
...
* test_gep_vec_fold
* remove that
* fix process replay
* lint
2024-07-14 09:41:48 +03:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] ( #5467 )
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix ( #5465 )
...
* create separate SharedMemory blocks for inputs and labels
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
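A generic sketch of the separate-buffers idea with Python's multiprocessing.shared_memory (the names and sizes here are hypothetical, not the dataloader's actual ones):
```
import numpy as np
from multiprocessing import shared_memory

# one block for inputs and a distinct one for labels, so they never alias
shm_x = shared_memory.SharedMemory(create=True, size=4 * 128**3, name="unet3d_inputs")
shm_y = shared_memory.SharedMemory(create=True, size=1 * 128**3, name="unet3d_labels")
x = np.ndarray((128, 128, 128), dtype=np.float32, buffer=shm_x.buf)  # view over shm_x
y = np.ndarray((128, 128, 128), dtype=np.uint8, buffer=shm_y.buf)    # view over shm_y
x[:] = 0.0; y[:] = 0
shm_x.close(); shm_x.unlink()
shm_y.close(); shm_y.unlink()
```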