Commit Graph

10417 Commits

Author SHA1 Message Date
chenyu
6e405b0a2b add 0d tensor to trunc/floor/ceil/round tests (#5512)
the existing trunc test passes backward, but its backward is incorrect in general; added tests that would fail
2024-07-16 16:48:25 -04:00
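The point of the commit above is that trunc/floor/ceil/round are piecewise constant, so their true gradient is zero almost everywhere. A minimal pure-Python sketch (the helper `numerical_grad` is hypothetical, not tinygrad code) checks this with a central difference away from integer boundaries:

```python
import math

def numerical_grad(f, x, eps=1e-6):
    # central-difference estimate of df/dx
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# away from integer boundaries, the true gradient of these ops is 0
for f in (math.trunc, math.floor, math.ceil, round):
    assert numerical_grad(f, 2.3) == 0.0
```

A backward pass for these ops that returns anything other than zeros would fail such a check, which is what the added tests exercise.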
chenyu
0afcbfae84 docs: add Tensor.interpolate to doc page (#5510) 2024-07-16 14:17:19 -04:00
Tobias Fischer
87a2ef2bc2 Add Interpolate Function (#5482)
* add interpolate function

* fixed linter issue

* reduced sizes in test

---------

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
gswangg
203161c75d refactor VECTORIZE/GEP rules (#5507) 2024-07-16 09:41:23 -07:00
qazal
173064c69c (re)start multireduce in codegen/* (#5391)
* test_var_multireduce

* run verify_lazyop

* test_var_multireduce

* assert lazyop

* add test_indexing_multireduce

* arange fuses (crude)

* note: extra reshape

* start readable

* test_arange_simple

* test_arange_expanded

* test_indexing_multireduce

* cleanups

* skip ptx

* skip nv and amd ci

* skip arange expanded too

* GPU=1 is slow too in CI
2024-07-16 14:20:48 +03:00
chenyu
07ff4b7d24 test_failure_33 ast that has UOps.UNMUL after linearize (#5504)
* test_failure_33 ast that has UOps.UNMUL after linearize

* smaller
2024-07-15 22:54:23 -04:00
chenyu
1ccd987e6a simpler tc permaxis in fixup_ast.fix_st [run_process_replay] (#5502) 2024-07-15 21:35:32 -04:00
George Hotz
9d4c3c553c prepare expand to support multiexpand [run_process_replay] (#5503) 2024-07-15 18:21:24 -07:00
chenyu
fd43d33b7d shave some lines from transcend math [run_process_replay] (#5500)
* shave some lines from transcend math [run_process_replay]

* put input_dtype back
2024-07-15 21:02:24 -04:00
chenyu
63990705b5 test kernel opts case for 4 local and 4 groups (#5499)
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Alessandro Benetti
13e200b437 add strict mkdocs check (#5497) 2024-07-15 14:21:37 -07:00
nimlgen
8dfd11c1d8 docs: hcq add types (#5495)
* docs: hcq add types

* linter
2024-07-15 22:14:48 +03:00
George Hotz
aab1e8c6dc uniform init to match torch (#5494) 2024-07-15 12:07:44 -07:00
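For context on "uniform init to match torch": PyTorch's default `nn.Linear` weight init draws from U(-b, b) with b = 1/sqrt(fan_in). A minimal pure-Python sketch of that bound (the function name `uniform_linear_init` is hypothetical):

```python
import math, random

def uniform_linear_init(out_features, in_features, seed=None):
    # torch-style Linear init: weights ~ U(-b, b) with b = 1/sqrt(fan_in)
    rng = random.Random(seed)
    bound = 1.0 / math.sqrt(in_features)
    return [[rng.uniform(-bound, bound) for _ in range(in_features)]
            for _ in range(out_features)]

w = uniform_linear_init(4, 16, seed=0)
bound = 1.0 / math.sqrt(16)
assert all(-bound <= x <= bound for row in w for x in row)
```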
George Hotz
338b7590b9 hotfix: docs for BatchNorm 2024-07-15 12:04:17 -07:00
nimlgen
c9ec7ce070 start hcq docs (#5411)
* start hcq docs

* more hcq docs

* docs

* docs

* linter

* correct args

* linter

* ts returns int
2024-07-15 21:31:11 +03:00
Edward Wang
9a7d5a148e move colorize_float to helpers.py (#5490)
* add colorize_float to helpers.py

* update references
2024-07-15 11:29:03 -07:00
P4ssenger
a347d91e0e remove outdated thread local aliases (#5493) 2024-07-15 11:28:11 -07:00
qazal
ac08f0eb00 reshape rawbufs in test_linearizer (#5492)
* reshape rawbufs in test_linearizer

* fix helper_linearizer_ast
2024-07-15 19:14:38 +03:00
qazal
ae4cb7994e run process replay with DEBUG=0 (#5491)
* process replay with DEBUG=0

* graceful shutdown

* use and
2024-07-15 16:30:57 +03:00
Tobias Fischer
e219103677 Add Pad to Pooling (#5488) 2024-07-14 21:50:20 -07:00
chenyu
eef43c9f49 include dims in kernel/nv invalid err msg (#5487) 2024-07-14 22:51:30 -04:00
chenyu
c80801c266 len(full_shape)-ki.upcasted -> first_upcasted (#5485)
[run_process_replay]
2024-07-14 20:21:18 -04:00
Tobias Fischer
5849130cbb gather negative dim fix (#5486) 2024-07-14 20:20:53 -04:00
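Negative-dim bugs like the gather fix above are usually resolved by normalizing the axis once up front. A hedged sketch of that common pattern (the helper `canonicalize_dim` is hypothetical, not tinygrad's actual code):

```python
def canonicalize_dim(dim, ndim):
    # map a possibly-negative axis index into [0, ndim)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for a {ndim}-d tensor")
    return dim % ndim

assert canonicalize_dim(-1, 3) == 2
assert canonicalize_dim(0, 3) == 0
```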
qazal
3c378efcb6 process replay docs improvements (#5481)
* minor cleanups

* docs and logs

* shorter

* comma

* s/print/logging.info [run_process_replay]

* use logging.warn

* process name is noise

* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
chenyu
613a1dbeed render lidx starting with 0 (#5478)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing numbering was derived from the pre-limited global dims, which skips numbers when there are more than 3 global dims

* don't need start_dim

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-14 16:34:04 -04:00
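The renumbering in the commit above amounts to counting global and local indices independently, so each kind starts at 0 regardless of how many global dims exist. A hypothetical sketch of the idea (not tinygrad's actual renderer):

```python
def render_indices(n_global, n_local):
    # number global and local indices independently, each starting at 0
    lines = []
    for i in range(n_global):
        lines.append(f"int gidx{i} = gid[{i}];")
    for i in range(n_local):
        lines.append(f"int lidx{i} = lid[{i}];")
    return lines

out = render_indices(3, 3)
assert "int lidx0 = lid[0];" in out  # local numbering restarts at 0
```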
qazal
671779f280 limit process replay diff to ~20% of kernels (#5480)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing numbering was derived from the pre-limited global dims, which skips numbers when there are more than 3 global dims

* don't need start_dim

* add changed

* env var

* more early exit

* simpler?

* Revert "Merge branch 'lidx0' into process_replay_limit"

This reverts commit cbadcfa5e9, reversing
changes made to fc9bf37ee7.

* minor cleanup

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
chenyu
f8a47608cc test dtype.min and dtype.max (#5479)
compared with np.iinfo for integer dtype
2024-07-14 15:31:37 -04:00
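The dtype.min/dtype.max values being tested follow directly from two's-complement width and signedness, which is also what `np.iinfo` reports. A small pure-Python sketch of the expected values (the helper name is hypothetical):

```python
def int_min_max(bits, signed):
    # min/max for a two's-complement integer type of the given width
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

assert int_min_max(8, True) == (-128, 127)
assert int_min_max(32, False) == (0, 4294967295)
```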
George Hotz
a9f5a764dc make BatchNorm work for 2D and 3D (#5477)
* make BatchNorm work for 2D and 3D

* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
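Making BatchNorm rank-generic mostly comes down to reducing over every axis except the channel axis (dim 1), whatever the input rank. A hedged sketch of that axis choice (the helper is hypothetical, not tinygrad's implementation):

```python
def batchnorm_reduce_axes(ndim):
    # normalize over every axis except the channel axis (dim 1)
    return tuple(i for i in range(ndim) if i != 1)

assert batchnorm_reduce_axes(4) == (0, 2, 3)     # NCHW (2D)
assert batchnorm_reduce_axes(5) == (0, 2, 3, 4)  # NCDHW (3D)
assert batchnorm_reduce_axes(2) == (0,)          # NC
```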
chenyu
e41ab66653 use is to compare types (#5476)
new rule in latest ruff
2024-07-14 14:26:41 -04:00
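The ruff rule in question (E721) prefers identity checks for exact type comparisons:

```python
x = 1
assert type(x) is int   # preferred by ruff (E721)
assert type(x) == int   # flagged by the linter, though equivalent here
# `is` also sidesteps classes that override __eq__
```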
George Hotz
aade18d20c beautiful_mnist in torch 2024-07-14 11:09:58 -07:00
nimlgen
604fb60143 docs: fix link to jit in env_vars (#5474) 2024-07-14 16:08:16 +03:00
nimlgen
61822d1a14 nv fix timeline signal rollover on copy queue (#5473)
* hotfix: nv rollover to 32bits

* test both queues
2024-07-14 16:06:12 +03:00
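A common way to handle a timeline counter that rolls over at a fixed bit width is a wraparound-safe comparison in modular arithmetic. This is a generic sketch of the technique, not necessarily what the NV backend does:

```python
def signal_reached(seen, target, bits=32):
    # wraparound-safe "seen >= target" for a counter modulo 2**bits:
    # the forward distance from target to seen must be less than half the range
    half = 1 << (bits - 1)
    return ((seen - target) & ((1 << bits) - 1)) < half

assert signal_reached(5, 3)
assert not signal_reached(3, 5)
# just past a 32-bit rollover: 2 is "after" 0xFFFFFFFE
assert signal_reached(2, 0xFFFFFFFE)
```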
nimlgen
8835d6c49a cleanup nv/amd program (#5449)
* cleanup nv/amd program

* fix amd

* a bit cleaner

* ugh, typo

* linter

* fix nv

* tiny thing
2024-07-14 14:08:35 +03:00
qazal
0b3a34e3b1 vectorize folding [run_process_replay] (#5470)
* test_gep_vec_fold

* remove that

* fix process replay

* lint
2024-07-14 09:41:48 +03:00
George Hotz
cdf63e41bf mnist mlx example uses compile to be fair to tinyjit 2024-07-13 18:14:45 -07:00
George Hotz
8940530290 add mlx beautiful_mnist example 2024-07-13 17:55:47 -07:00
chenyu
28972418c4 s/get_linearizer/get_kernel [run_process_replay] (#5467) 2024-07-13 20:32:22 -04:00
Francis Lata
0345577032 UNet3D dataloader shared memory fix (#5465)
* create separate SharedMemory between inputs and labels

* update path check for shared mem

* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
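Giving inputs and labels their own `multiprocessing.shared_memory` segments, as the commit above describes, avoids shared-offset bookkeeping entirely. A minimal stdlib sketch (the helper `make_batch_segments` is hypothetical):

```python
from multiprocessing import shared_memory

def make_batch_segments(input_bytes, label_bytes):
    # one segment per array: no offsets to get wrong, no accidental overlap
    inp = shared_memory.SharedMemory(create=True, size=input_bytes)
    lab = shared_memory.SharedMemory(create=True, size=label_bytes)
    return inp, lab

inp, lab = make_batch_segments(16, 4)
names_differ = inp.name != lab.name  # distinct OS-level segments
for shm in (inp, lab):
    shm.close()
    shm.unlink()
assert names_differ
```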
Carson Powers
ef578b4de8 new UOp style patterns [run_process_replay] (#5444)
* express permute srcs in uop

* loop folding / sum collapse pats -> uop style

* UNMUL, const, phi on DEFINE_ACC pats -> uop style

* fix: cvar not const

* DEFINE_ACC w/o inputs, VECTORIZE-PHI-GEP pats -> uop style

* fix VECTORIZE-PHI-GEP pat

* contractor, reducer, float4 pats -> uop style

* arange folding .where

* one more

* revert permute expression in UOp
2024-07-13 17:21:08 -07:00
George Hotz
942c58be90 BEAM_COMPARE=2 validates the correctness of BEAM kernels (#5458)
* beam compare 2

* found issue maybe

* correct, not fail

* full rand

* less numpy

* extra simplify doesn't fix it

* reorder

* no numpy

* check in reverse

* test new tensor behavior

* better error msg
2024-07-13 13:53:43 -07:00
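The validation idea behind BEAM_COMPARE=2 is to run a candidate kernel and a trusted baseline on the same inputs and require matching outputs before accepting the candidate. A hedged, framework-free sketch (names are hypothetical):

```python
def validate_candidate(baseline_fn, candidate_fn, inputs, atol=1e-6):
    # run both implementations and require matching outputs
    # before accepting the (presumably faster) candidate
    expect = baseline_fn(*inputs)
    got = candidate_fn(*inputs)
    return all(abs(a - b) <= atol for a, b in zip(expect, got))

base = lambda xs: [x * 2 for x in xs]
fast = lambda xs: [x + x for x in xs]
assert validate_candidate(base, fast, ([1.0, 2.5, -3.0],))
```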
nimlgen
6943ea5f29 nv remove copy_from_cpu command (#5459) 2024-07-13 23:08:49 +03:00
nimlgen
67f70cef02 amd better allocation error messages (#5462)
* amd better allocation error messages

* a bit better
2024-07-13 22:55:09 +03:00
wozeparrot
2427f149a3 threefry as pattern matcher (#5371) 2024-07-13 11:59:03 -07:00
qazal
487ceff825 hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist (#5456) 2024-07-13 21:15:40 +03:00
chenyu
de6ab56458 clean up transcend math with uop syntactic sugar [run_process_replay] (#5455)
* clean up transcend math with uop syntactic sugar [run_process_replay]

* that?

* maybe
2024-07-13 14:00:14 -04:00
qazal
40ec9410f9 simpler process replay (#5452)
* remove check_process_replay

* that can go to the top

* add assert back

* [run_process_replay]

* checkout code [run_process_replay]

* temp [run_process_replay]

* revert temp [run_process_replay]

* ahh this is why [run_process_replay]

* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
chenyu
d2933d3548 simplify transcend math [run_process_replay] (#5454)
there were some (x - x) terms in dfadd2_f2_f2_f2, dfmul2_f2_f2_f2, and dfdiv2_f2_f2_f2 that are removed by the pattern matcher
2024-07-13 12:43:31 -04:00
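The (x - x) folding mentioned above is a classic pattern-matcher rewrite. A toy sketch over tuple-encoded expressions (this is illustrative only, not tinygrad's PatternMatcher):

```python
def simplify(expr):
    # fold (x - x) -> 0, recursing bottom-up over ("op", *args) tuples
    if isinstance(expr, tuple):
        op, *args = expr
        args = [simplify(a) for a in args]
        if op == "sub" and args[0] == args[1]:
            return 0
        return (op, *args)
    return expr  # leaves (symbols/numbers) are unchanged

assert simplify(("sub", "x", "x")) == 0
assert simplify(("add", ("sub", "y", "y"), "z")) == ("add", 0, "z")
```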
qazal
23b907efbb restore process replay runs by their id (#5453) 2024-07-13 19:32:34 +03:00
qazal
b8c9298164 verify_lazyop in for WMMA and group_for_reduces (#5448)
* try passing no tc and group for reduces

* minor

* use op.arg

* group_for_reduces
2024-07-13 18:06:19 +03:00
George Hotz
955e1179fb move compile tests and merge (#5451)
* move compile tests and merge

* revert enet move, bump download cache

* oh, try setting clang
2024-07-13 08:04:46 -07:00