George Hotz
1242b302fa
expand UOps with rewrite rules (#5501)
* expand UOps with rewrite rules [run_process_replay]
* progress
* much closer
* close, way fewer bugs
* bunch of expander tests
* fix contract
* ops tests pass
* fix barrier
* mostly passing
* bitcast in expanded ops
* support more expand merges
* all tests pass maybe
* fix empty EXPAND
* fix LIN fuzzing
* add ALL_SAME assert
* all same
* all same work
* raise CompileError
* pass fuzz linearizer
* revert whitespace
* fix nv tensor core test
* fix mypy
* bug fix
* fuzzer passes
* put tests back
* expand arg to idx
2024-07-17 10:17:50 -07:00
George Hotz
158221b36b
expand tests from uop_expander [run_process_replay] (#5524)
* expand tests from uop_expander
* more changes from the branch
2024-07-17 09:22:36 -07:00
George Hotz
42c25cc961
fix fixup_ast (#5523)
* fix fixup_ast
* these lin failures are fixed
2024-07-17 08:52:21 -07:00
qazal
fbe0233be3
infra for multi reduce asts (#5522)
* add reduce_info
* _recurse_reduceops base
* derive output shape
* refactor
* delete reduce_for_op
* save lines
* more line saving
2024-07-17 17:23:46 +03:00
nimlgen
dcd462860f
elf loader (#5508)
* elf loader
* cleanup
* cleaner
* cleaner
* fixes
* revert this
* fix div 0
* fix nv
* amd fix
* fix mockgpu
* amd better?
* restore relocs for <12.4
* linter
* this is fixed now
* revert this
* process cdefines as function
* cleaner
* align
* save lines
* revert this change
2024-07-17 17:09:34 +03:00
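This PR's loader targets NV/AMD binaries; as a generic illustration of the first step any ELF loader performs, here is a minimal ELF64 header parse in Python. `parse_elf64_header` is a hypothetical helper following the public ELF64 field layout, not code from this PR:

```python
import struct

def parse_elf64_header(data: bytes) -> dict:
    # e_ident: 16 bytes of magic, class, endianness, version, padding
    assert data[:4] == b"\x7fELF", "not an ELF file"
    is64, little = data[4] == 2, data[5] == 1
    assert is64 and little, "sketch only handles ELF64 little-endian"
    # fixed-size header fields that follow the 16-byte e_ident
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx) = \
        struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    return {"type": e_type, "machine": e_machine, "entry": e_entry,
            "shoff": e_shoff, "shentsize": e_shentsize, "shnum": e_shnum,
            "shstrndx": e_shstrndx}
```

From here a loader walks the section headers at `shoff` to find text, data, and relocation sections.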
nimlgen
661da32aff
nv do not map regions twice ( #5521 )
2024-07-17 11:20:02 +03:00
Francis Lam
2d53abb04a
test/external/fuzz_linearizer: fix for new AST changes (#5519)
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
Tobias Fischer
85d4ca7caa
FID Inception Model (#5516)
* added model impl
* minor cleanups
* extracted weights loading into from_pretrained
* reorganized model for better weight loading
* removed lru cache for state dict loading
2024-07-16 23:12:03 -04:00
chenyu
4ad83d032e
remove Kernel.lazyops [run_process_replay] (#5517)
always use Kernel.ast.lazyops
2024-07-16 19:47:42 -04:00
wozeparrot
1c1d6d3a4a
feat: show caller when tracemeta >= 2 (#5514)
2024-07-16 15:06:02 -07:00
chenyu
5aad043522
cleanup fixup_ast local shape long line [run_process_replay] (#5513)
2024-07-16 17:29:38 -04:00
chenyu
6e405b0a2b
add 0d tensor to trunc/floor/ceil/round tests (#5512)
The existing trunc test passes backward, but its backward is incorrect in general; added tests that would fail.
2024-07-16 16:48:25 -04:00
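The property these tests pin down is that the backward of trunc (like floor, ceil, and round) must be zero almost everywhere, since the functions are piecewise constant, and it must also handle 0-d tensors. A minimal numpy sketch of the correct gradient; `trunc_backward` is illustrative, not tinygrad's implementation:

```python
import numpy as np

def trunc_backward(x, grad_output):
    # trunc is piecewise constant, so its derivative is 0 almost everywhere;
    # the same reasoning applies to floor, ceil, and round
    return np.zeros_like(x) * grad_output
```

A 0-d input like `np.array(3.7)` must come back as a 0-d zero, not error out.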
chenyu
0afcbfae84
docs: add Tensor.interpolate to doc page (#5510)
2024-07-16 14:17:19 -04:00
Tobias Fischer
87a2ef2bc2
Add Interpolate Function (#5482)
* add interpolate function
* fixed linter issue
* reduced sizes in test
---------
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
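For reference, nearest-neighbor interpolation along the last axis can be sketched in a few lines of numpy. `interpolate_nearest` is a hypothetical stand-in; the signature and supported modes of the PR's `Tensor.interpolate` may differ:

```python
import numpy as np

def interpolate_nearest(x, out_size):
    # map each output index to the nearest input sample:
    # sample centers sit at (i + 0.5) * in_size / out_size
    in_size = x.shape[-1]
    idx = np.floor((np.arange(out_size) + 0.5) * in_size / out_size).astype(int)
    return x[..., idx]
```

Upsampling `[1, 2]` to 8 elements repeats each sample four times; downsampling picks the sample whose center is closest.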
gswangg
203161c75d
refactor VECTORIZE/GEP rules (#5507)
2024-07-16 09:41:23 -07:00
qazal
173064c69c
(re)start multireduce in codegen/* (#5391)
* test_var_multireduce
* run verify_lazyop
* test_var_multireduce
* assert lazyop
* add test_indexing_multireduce
* arange fuses (crude)
* note: extra reshape
* start readable
* test_arange_simple
* test_arange_expanded
* test_indexing_multireduce
* cleanups
* skip ptx
* skip nv and amd ci
* skip arange expanded too
* GPU=1 is slow too in CI
2024-07-16 14:20:48 +03:00
chenyu
07ff4b7d24
test_failure_33 ast that has UOps.UNMUL after linearize (#5504)
* test_failure_33 ast that has UOps.UNMUL after linearize
* smaller
2024-07-15 22:54:23 -04:00
chenyu
1ccd987e6a
simpler tc permaxis in fixup_ast.fix_st [run_process_replay] (#5502)
2024-07-15 21:35:32 -04:00
George Hotz
9d4c3c553c
prepare expand to support multiexpand [run_process_replay] (#5503)
2024-07-15 18:21:24 -07:00
chenyu
fd43d33b7d
shave some lines from transcend math [run_process_replay] (#5500)
* shave some lines from transcend math [run_process_replay]
* put input_dtype back
2024-07-15 21:02:24 -04:00
chenyu
63990705b5
test kernel opts case for 4 local and 4 groups (#5499)
make sure local grouped dim is correct
2024-07-15 20:09:38 -04:00
Alessandro Benetti
13e200b437
add strict mkdocs check (#5497)
2024-07-15 14:21:37 -07:00
nimlgen
8dfd11c1d8
docs: hcq add types (#5495)
* docs: hcq add types
* linter
2024-07-15 22:14:48 +03:00
George Hotz
aab1e8c6dc
uniform init to match torch (#5494)
2024-07-15 12:07:44 -07:00
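torch documents `nn.Linear`'s default weight init as U(-sqrt(k), sqrt(k)) with k = 1/in_features; a numpy sketch of matching that bound (`linear_init_uniform` is illustrative, not this PR's code):

```python
import numpy as np

def linear_init_uniform(in_features, out_features, seed=0):
    # torch nn.Linear default: U(-sqrt(k), sqrt(k)) where k = 1 / in_features
    bound = 1.0 / np.sqrt(in_features)
    rng = np.random.default_rng(seed)
    return rng.uniform(-bound, bound, size=(out_features, in_features))
```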
George Hotz
338b7590b9
hotfix: docs for BatchNorm
2024-07-15 12:04:17 -07:00
nimlgen
c9ec7ce070
start hcq docs (#5411)
* start hcq docs
* more hcq docs
* docs
* docs
* linter
* correct args
* linter
* ts returns int
2024-07-15 21:31:11 +03:00
Edward Wang
9a7d5a148e
move colorize_float to helpers.py (#5490)
* add colorize_float to helpers.py
* update references
2024-07-15 11:29:03 -07:00
P4ssenger
a347d91e0e
remove outdated thread local aliases (#5493)
2024-07-15 11:28:11 -07:00
qazal
ac08f0eb00
reshape rawbufs in test_linearizer (#5492)
* reshape rawbufs in test_linearizer
* fix helper_linearizer_ast
2024-07-15 19:14:38 +03:00
qazal
ae4cb7994e
run process replay with DEBUG=0 (#5491)
* process replay with DEBUG=0
* graceful shutdown
* use and
2024-07-15 16:30:57 +03:00
Tobias Fischer
e219103677
Add Pad to Pooling (#5488)
2024-07-14 21:50:20 -07:00
chenyu
eef43c9f49
include dims in kernel/nv invalid err msg (#5487)
2024-07-14 22:51:30 -04:00
chenyu
c80801c266
len(full_shape)-ki.upcasted -> first_upcasted (#5485)
[run_process_replay]
2024-07-14 20:21:18 -04:00
Tobias Fischer
5849130cbb
gather negative dim fix (#5486)
2024-07-14 20:20:53 -04:00
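Handling a negative `dim` comes down to normalizing it against the tensor's rank before indexing. A sketch of the usual normalization; `resolve_dim` is a hypothetical helper, not the actual fix:

```python
def resolve_dim(dim: int, ndim: int) -> int:
    # negative dims count from the end, e.g. -1 is the last axis
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for {ndim} dimensions")
    return dim if dim >= 0 else dim + ndim
```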
qazal
3c378efcb6
process replay docs improvements (#5481)
* minor cleanups
* docs and logs
* shorter
* comma
* s/print/logging.info [run_process_replay]
* use logging.warn
* process name is noise
* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
chenyu
613a1dbeed
render lidx starting with 0 (#5478)
* render lidx starting with 0
changed from
```
int gidx0 = gid.x; /* 4096 */
int lidx4 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx5 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx6 = lid.z; /* 2 */
```
to
```
int gidx0 = gid.x; /* 4096 */
int lidx0 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx1 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx2 = lid.z; /* 2 */
```
the existing numbering started after the pre-limited global dims, which skips numbers when there are more than 3 global dims
* don't need start_dim
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-14 16:34:04 -04:00
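The fix amounts to numbering local indices independently of the global dims rather than offsetting them by the global count. A sketch; `render_special_names` is hypothetical, not the actual renderer code:

```python
def render_special_names(num_globals: int, num_locals: int):
    # number gidx and lidx independently so lidx always starts at 0,
    # instead of offsetting locals by the (possibly limited) global count
    gidx = [f"gidx{i}" for i in range(num_globals)]
    lidx = [f"lidx{i}" for i in range(num_locals)]
    return gidx, lidx
```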
qazal
671779f280
limit process replay diff to ~20% of kernels (#5480)
* add changed
* env var
* more early exit
* simpler?
* Revert "Merge branch 'lidx0' into process_replay_limit"
This reverts commit cbadcfa5e9, reversing changes made to fc9bf37ee7.
* minor cleanup
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
chenyu
f8a47608cc
test dtype.min and dtype.max (#5479)
compared with np.iinfo for integer dtypes
2024-07-14 15:31:37 -04:00
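The cross-check against numpy relies on the standard two's-complement bounds; a sketch of what such a test asserts (`expected_bounds` is illustrative):

```python
import numpy as np

def expected_bounds(dt):
    # two's-complement bounds for signed ints, [0, 2**bits - 1] for unsigned
    info = np.iinfo(dt)
    if np.issubdtype(dt, np.unsignedinteger):
        return 0, 2**info.bits - 1
    return -(2**(info.bits - 1)), 2**(info.bits - 1) - 1

for dt in (np.int8, np.uint8, np.int32, np.uint32):
    info = np.iinfo(dt)
    assert (info.min, info.max) == expected_bounds(dt)
```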
George Hotz
a9f5a764dc
make BatchNorm work for 2D and 3D (#5477)
* make BatchNorm work for 2D and 3D
* beautiful mnist shouldn't use BatchNorm2d
2024-07-14 11:39:58 -07:00
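Making BatchNorm rank-agnostic boils down to reducing over every axis except the channel axis. A minimal numpy sketch, ignoring the affine parameters and running statistics a real BatchNorm layer carries:

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # normalize over every axis except the channel axis (axis 1),
    # so the same code handles (N,C), (N,C,L), and (N,C,H,W) inputs
    axes = tuple(i for i in range(x.ndim) if i != 1)
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```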
chenyu
e41ab66653
use is to compare types (#5476)
new rule in latest ruff
2024-07-14 14:26:41 -04:00
George Hotz
aade18d20c
beautiful_mnist in torch
2024-07-14 11:09:58 -07:00
nimlgen
604fb60143
docs: fix link to jit in env_vars (#5474)
2024-07-14 16:08:16 +03:00
nimlgen
61822d1a14
nv fix timeline signal rollover on copy queue (#5473)
* hotfix: nv rollover to 32bits
* test both queues
2024-07-14 16:06:12 +03:00
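A timeline signal that is only 32 bits wide in hardware has to be widened in software by tracking rollovers. A common sketch of the reconstruction, assuming the counter never advances by 2**32 or more between reads; `extend_to_64` is illustrative, not the driver's actual code:

```python
def extend_to_64(observed32: int, last_seen64: int) -> int:
    # splice the 32-bit reading onto the high bits of the last 64-bit value;
    # if that goes backwards, the low 32 bits must have wrapped, so bump the
    # high word by one
    candidate = (last_seen64 & ~0xFFFFFFFF) | observed32
    if candidate < last_seen64:
        candidate += 1 << 32
    return candidate
```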
nimlgen
8835d6c49a
cleanup nv/amd program (#5449)
* cleanup nv/amd program
* fix amd
* a bit cleaner
* ugh, typo
* linter
* fix nv
* tiny thing
2024-07-14 14:08:35 +03:00
qazal
0b3a34e3b1
vectorize folding [run_process_replay] (#5470)
* test_gep_vec_fold
* remove that
* fix process replay
* lint
2024-07-14 09:41:48 +03:00
George Hotz
cdf63e41bf
mnist mlx example uses compile to be fair to tinyjit
2024-07-13 18:14:45 -07:00
George Hotz
8940530290
add mlx beautiful_mnist example
2024-07-13 17:55:47 -07:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] (#5467)
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix (#5465)
* create separate SharedMemory between inputs and labels
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
Carson Powers
ef578b4de8
new UOp style patterns [run_process_replay] (#5444)
* express permute srcs in uop
* loop folding / sum collapse pats -> uop style
* UNMUL, const, phi on DEFINE_ACC pats -> uop style
* fix: cvar not const
* DEFINE_ACC w/o inputs, VECTORIZE-PHI-GEP pats -> uop style
* fix VECTORIZE-PHI-GEP pat
* contractor, reducer, float4 pats -> uop style
* arange folding .where
* one more
* revert permute expression in UOp
2024-07-13 17:21:08 -07:00