Commit Graph

1363 Commits

Park Jun
c3ad7b2a84 create randperm and support pytorch backend (#10019) 2025-04-24 07:29:02 -04:00
Matthew Daiter
b545338e59 isin_Tensor_out added (#10018) 2025-04-24 07:26:51 -04:00
nimlgen
1c5e353249 am: use mmio iface (#10012)
* am: use mmio iface

* linters

* fixes

* fixes + cleanups

* mute

* mypy

* style
2025-04-24 00:27:04 +03:00
Francis Lata
defa1e77f6 get the proper dataset count (#9962) 2025-04-21 12:11:37 -04:00
Francis Lata
d7e247f329 RetinaNet INITMLPERF support (#9950)
* fixes to make fake data work

* fix eval beam

* fix merge issue
2025-04-21 10:32:05 -04:00
akhuntsaria
2d423e6737 fix assertion message for supported device in export_model (#9957) 2025-04-21 09:23:44 -04:00
qazal
e20ef7196a Tensor.kernelize (#9845)
* add kernelize

* remove that

* kernelize returns self

* update abstractions2.py

* kernelize in test_schedule

* temp: assert BUFFER_VIEW's existence

* ASSIGN must have a buffer or subbuffer target

* assert and shrink

* fix

* padded setitem

* var

* toposort once

* extra

* base_buffer

* end with BUFFER_VIEW

* setitem for disk

* test_setitem_becomes_subbuffer

* mul slice test

* torch backend fix 1

* non-deterministic

* keep subbuffer
2025-04-20 20:53:49 +08:00
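
A minimal usage sketch based only on the commit notes above (`kernelize` groups the lazy graph into kernels without realizing them, and returns `self` so it chains):

```python
from tinygrad import Tensor

a, b = Tensor([1.0, 2.0]), Tensor([3.0, 4.0])
out = (a + b).kernelize()  # group into kernels without running anything yet
print(out.numpy())         # -> [4. 6.]; kernelize returned self, so calls chain
```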
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
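
The refactor pattern in miniature, as a toy stand-in rather than tinygrad's real classes: the heuristic becomes a pure function returning a list of opts, which the caller applies explicitly:

```python
from dataclasses import dataclass, field

@dataclass
class Opt:  # toy stand-in for tinygrad's Opt
  op: str
  amt: int

@dataclass
class Kernel:  # toy stand-in; apply_opts mirrors the new API
  applied: list[Opt] = field(default_factory=list)
  def apply_opts(self, opts: list[Opt]): self.applied += opts

def hand_coded_optimizations(k: Kernel) -> list[Opt]:
  # pure: inspects k and returns opts, mutating nothing
  return [Opt("UPCAST", 4)]

k = Kernel()
k.apply_opts(hand_coded_optimizations(k))  # the new api from the commit message
print(k.applied)  # [Opt(op='UPCAST', amt=4)]
```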
chenyu
720f20865b remove required_optimizations (#9848) 2025-04-19 16:51:16 -04:00
qazal
16dfe0a902 upstream remu (#9921) 2025-04-18 01:57:36 +03:00
chenyu
f5256e0020 Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]

updated all `for opt in` loops. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations

* not you yet
2025-04-17 08:00:56 -04:00
Xingyu
047c8fd70d Add amax support to Tensor operations in Torch Backend (#9905)
* Add amax support to Tensor operations
- Implemented amax function in backend.py for tensor max operations.
- Added unit tests for amax in test.py to ensure correct functionality.

* Fix formatting in amax output function
- Adjusted spacing in the amax output lambda function in backend.py
- Improved code readability for better maintenance
2025-04-16 10:35:50 +01:00
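
For reference on the semantics being mapped, `aten::amax` is a pure max reduction over the given dims, which lines up with tinygrad's `Tensor.max` (usage sketch, not the backend.py registration code):

```python
from tinygrad import Tensor

t = Tensor([[1.0, 5.0], [3.0, 2.0]])
print(t.max(axis=1).numpy())                     # amax over dim 1 -> [5. 3.]
print(t.max(axis=(0, 1), keepdim=True).numpy())  # amax over all dims -> [[5.]]
```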
geohotstan
4e8f25109a Revert "ONNX add output shape validation (#9720)" (#9904)
This reverts commit ac713e04db.
2025-04-16 03:15:56 -04:00
nimlgen
7c466c24f7 am_smi: refactor to support arches (#9864)
* am_smi: refactor to support arches

* shorter
2025-04-12 20:37:01 +03:00
chenyu
8c6299bced move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]

also folded all long lines

* make a copy and rename self -> k

* fix test
2025-04-10 23:40:16 -04:00
Francis Lata
eb2e59db42 RetinaNet model type annotations and loss functions (#9822)
* add type annotations and loss functions for training

* combine sum of multiple dims inside loss functions
2025-04-10 00:31:37 -04:00
Francis Lata
7bb36d71b2 remove openimages iterate (#9820) 2025-04-09 22:54:12 -04:00
chenyu
c5db5b83b9 add SHOULD_USE_TC=1 check to simple_matmul (#9802)
* add SHOULD_USE_TC=1 check to simple_matmul

also zero-centered the random input and updated atol for tf32

* ATOL=2e-2 for HALF
2025-04-09 02:24:42 -04:00
George Hotz
78caf55154 Revert "FP8 support on NVIDIA (#8631)"
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
George Hotz
14928fecff Revert "fix TF32 tensor core dropped in tc_sm89 (#9798)"
This reverts commit 7c9a96824f.
2025-04-09 12:27:39 +08:00
chenyu
7c9a96824f fix TF32 tensor core dropped in tc_sm89 (#9798)
also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul
2025-04-08 23:20:50 -04:00
pkotzbach
2c8e4ea865 FP8 support on NVIDIA (#8631)
* squashed fp8 commits

* tensorcore start

* minor changes

* pre-commit

* pylint

* Delete fp8mul.cu

* clean

* small bugfix

* fix test_dtype

* fix test_dtype_alu

* add EMULATE_CUDA_SM89

* fix ci

* fix test_linearizer

* fix test_linearizer

* fix swizzle

* add debug to simple_matmul

* fixed swizzle

* python emulator

* refactor python emulator

* setup fix

* numpy setup

* ml_dtypes only in emulate_cuda_sm89

* fix pylint

* fix tests

* fix mypy

* fix mypy

* fix ruff

* done python emulator

* add acc type

* tests

* mypy

* clean code

* add cuda tensor core tests to CI

* minor fix

* clean test_dtype.py

* clean cstyle.py

* clean test_ops.py

* fix test

* fix test

* whitespaces

* pylint

* pylint

* amd?

* amd?

* amd

* reduce lines

* mockgpu remove

* fix

* ruff

* ruff

* fix mypy

* ruff

* test only for cuda

* fixed formatting

* small fixes

* small fix

* least_upper_dtype if fp8s not supported

* log and reciprocal are supported for fp8s

* ops python fixes

* dtypes.fp8s use

* e4m3 + e5m2 result dtype test

* truncate linter fix

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
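
The python emulator above leans on the `ml_dtypes` package for the fp8 formats; a small standalone sketch of the two formats involved (plain ml_dtypes usage, not tinygrad code — e4m3 trades range for mantissa precision, e5m2 the reverse):

```python
import numpy as np
import ml_dtypes

# e4m3: 4 exponent / 3 mantissa bits (max finite 448); e5m2: 5 / 2 (max finite 57344)
x = np.array([0.1, 1.5, 448.0], dtype=np.float32)
print(x.astype(ml_dtypes.float8_e4m3fn))  # rounds to nearest representable e4m3
print(x.astype(ml_dtypes.float8_e5m2))    # wider exponent range, coarser mantissa
```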
Francis Lata
f8fe15e64e move BoxCoder to mlperf helpers (#9773) 2025-04-07 20:27:06 -04:00
Francis Lata
71b8890dd6 use validation dataloader inside retinanet eval (#9747) 2025-04-05 16:46:55 -04:00
geohotstan
ac713e04db ONNX add output shape validation (#9720)
* add output shape validation and remove support for sequence_type

* nit better err msg

* add sequence_type back

* improve err msg

* Revert "improve err msg"

This reverts commit dc9eaea4bb.

* Revert "add sequence_type back"

This reverts commit 288170b2d9.

* do explicit shape equality

* small nit
2025-04-03 05:44:53 -04:00
chenyu
7dadbf3697 insert float() in bert acc (#9726)
sum of bool uses default_float for the accumulator by default, so without an explicit float() it can overflow with a large BS and default_float=HALF.

fixed clsf_accuracy being inf in mi300x bert
2025-04-03 05:44:09 -04:00
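
A numpy illustration of the failure mode (half-precision accumulation in general, not the bert code itself):

```python
import numpy as np

# float16 spacing at 2048 is 2.0, so repeated +1 stalls: the running count
# silently stops growing once the accumulator hits 2048
acc = np.float16(0)
for _ in range(3000):
  acc += np.float16(1)
print(acc)  # 2048.0, not 3000.0 -- hence casting with float() before the sum
```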
George Hotz
5c7b549eab use functools.cache instead of lru_cache(None) [pr] (#9714)
* use functools.cache instead of lru_cache(None) [pr]

* more cache
2025-04-03 11:47:13 +08:00
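
The two spellings are equivalent; `functools.cache` (Python 3.9+) is just a shorter alias for `lru_cache(maxsize=None)`:

```python
import functools

@functools.cache  # same behavior as @functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
  return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075, each subproblem computed once
```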
geohotstan
e1d7e47cca fix ONNX IsInf unintended dtype promotion (#9711)
* add IsInf

* add corresponding test

* that float16 is kinda silly
2025-04-02 22:46:15 -04:00
George Hotz
f72a87fd0e add proper support for Ops.IGNORE to remove store masks (#9692)
* add proper support for Ops.IGNORE to remove store masks

* remove useless NHWC

* revert that
2025-04-02 16:38:01 +08:00
George Hotz
6f812d3f2f fixes from the dsp branch + 12500 lines (#9683)
* fixes from the dsp branch

* more changes

* those are gep pushing
2025-04-02 13:07:17 +08:00
qazal
eee0dcc37a merge viz back into one file (#9672)
* merge viz back into one file

* work

* rename lib to js directory

* fix diff

* less indenting

* memory graph is back

* viz_sz.py
2025-04-01 19:52:02 +08:00
nimlgen
3e2f42c2e8 autogen: remove am headers from extra (#9666) 2025-04-01 14:45:30 +07:00
Anish Umale
a1ee4d587f Fix test_ops for tiny backend (#9302)
* fix some tests in test_ops for torch backend (171 failing)

* fix more tests (135 failures)

* fix tests (126 failing)

* handle transposed convs (109 tests failing)

* fix slice

* fix lshift & rshift and more tests (87 tests failing)

* revert accidental change

* remove unnecessary changes (82 failures)

* fix backward for avg_pool2d (78 failures)

* fix backward for avg_pool2d (78 failures)

* fix replication backpass

* fix reflection pad back pass (71 failures)

* cummax with indices, aten.mv and move out methods (67 failures)

* extract avg_pool2d and avg_pool3d to separate functions (62 failures)

* revert changes for cat_out

* rewrite avg_pool and pad without repetition

* remove duplicates from decomps

* slice rewrite and add slice_backward (59 failures)

* add dtype fixup from https://github.com/tinygrad/tinygrad/pull/9297

* fix linter error and remove Tensor.pad (48 failures)

* add select_backward and index_put (40 failures)

* fix some more tests (36 failures)

* fix more tests (12 failures)

* some cleanups and fix couple more tests (10 failures)

* cleaner way to write upsample

* some more upsample cleanups

* use lambda for upsample

* add autowrapper for upsample forward

* cumsum and max_dim without aten functions

* revert _log_softmax

* fix more tests (1 failure)

* make linter happy

* move import to appropriate func

* make linter happy

* add codes for noqa

* some more refactors

* remove comment

* remove dependency on aten function for conv backward

* some more refactors

* add returns

* revert a change from merge

* some cleanups

* remove whitespace

* remove ruff change

* revert upsample

* add masked_fill_.Tensor and scatter.src_out

* add todo

* fix test_biased_conv2d

* fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :(

* revert torch_debug

* revert torch_debug

* skip test_gather_failure for the tiny backend

* make padding registration more concise

* add nonzero

* remove scatter_add since we already have the out

* fix scatter

* remove some repetition

* make upsample backward registrations more concise

* remove select.int

* use Tensor.cumsum

* realize conv2d outputs before backward to fix test_biased_conv2d

* add a todo for realize (1 failure)

* add new_empty and new_empty_strided

* make test_pad_circular_mode forward only and remove redundant stuff

* fix linter errors

* remove expect failure

* just tb

* slice is a view_op

* contiguous only when lazydata.is_realized

* fix backward for test_pad_circular_mode

* revert torch.nn.functional.pad override

* add transpose.int and make constant_pad_nd contiguous

* slice_backwards has no kwargs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-31 21:13:09 -04:00
Priyank Patel
e2d9322d21 torch backend: partial fix for strided related test fails (#9642)
* partial fix for strided related test fails

* cleanup

* fix lint
2025-03-31 05:45:18 -04:00
George Hotz
49b1c46d16 good changes from the dsp branch (#9638) 2025-03-31 13:02:53 +08:00
Yvon Manzi
6652003839 Add cumprod to Tensor (#9629)
* probably what cumprod should look like

* update _cumalu to work with MUL

* shorter

* cumprod testing

* clean

* more cleanup

* add cumprod to torch backend.

* make it look like cumsum

* mypy fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
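
A usage sketch, assuming `cumprod` mirrors `cumsum`'s signature as the commit notes suggest:

```python
from tinygrad import Tensor

t = Tensor([1.0, 2.0, 3.0, 4.0])
print(t.cumsum(axis=0).numpy())   # -> [ 1.  3.  6. 10.]
print(t.cumprod(axis=0).numpy())  # -> [ 1.  2.  6. 24.], same _cumalu path with MUL
```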
geohotstan
d52e91db7b ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests

* clippy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
uuuvn
2a4247b8c2 RDNA 3.5 support (#9627) 2025-03-31 01:15:20 +08:00
geohotstan
a08b07b4da Bump onnx==1.17.0 (#9618)
* bump

* remove resize tf_crop_and_resize

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44 am: rdna 4 support (#9621)
* hm

* fix

* return this

* fine

* g

* ruff

* fix
2025-03-29 23:16:27 +07:00
uuuvn
5908b89f71 MI300X support (WIP) (#9585) 2025-03-29 19:46:42 +08:00
uuuvn
dd9aae02c3 Refactor ops_amd.py (MI300X prereq) (#9428) 2025-03-29 00:17:20 +07:00
George Hotz
1e6e75e39a little changes from dsp branch (#9582)
* little changes from dsp branch

* not that one

* need the where

* Revert "need the where"

This reverts commit 140f89c878.
2025-03-26 20:01:21 +08:00
Andrey
7b865ed03d use tuple in isinstance for type checking (#9583) 2025-03-26 19:36:48 +08:00
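
The pattern, for reference (plain Python): one `isinstance` with a tuple of types replaces a chain of `or`-ed checks:

```python
def is_number(x) -> bool:
  # instead of isinstance(x, int) or isinstance(x, float) or ...
  return isinstance(x, (int, float, complex))

assert is_number(3) and is_number(2.5) and not is_number("3")
```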
nimlgen
4cf2b68ca8 am_smi: fix init for newer versions (#9559) 2025-03-25 23:48:05 +07:00
Priyank Patel
4f5e03bd60 better fix inplace detach (#9557) 2025-03-24 22:50:28 +08:00
George Hotz
74d98eafb8 add onnx frontend stub [pr] (#9558) 2025-03-24 12:24:34 +08:00
geohotstan
309afa20b7 add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...

* slightly cleaner

* what happened to ruff

* need to think about this some more

* slightly faster now?

* clean up, 1 more failing edge case

* ok good

* working TINY_BACKEND

* nit doc wording

* retry CI
2025-03-22 12:11:33 -04:00
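
A toy numpy sketch of the operation's semantics (not tinygrad's implementation): max-unpooling scatters each pooled max back to the flat index it came from, leaving zeros elsewhere:

```python
import numpy as np

def max_unpool1d(pooled: np.ndarray, indices: np.ndarray, out_len: int) -> np.ndarray:
  out = np.zeros(out_len, dtype=pooled.dtype)
  out[indices] = pooled  # scatter the maxes back to their source positions
  return out

x = np.array([1.0, 3.0, 2.0, 8.0])
# max_pool with window 2 keeps [3., 8.] from flat indices [1, 3]:
print(max_unpool1d(np.array([3.0, 8.0]), np.array([1, 3]), len(x)))  # [0. 3. 0. 8.]
```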
Francis Lata
eb95825eea RetinaNet dataloader (#9442)
* retinanet dataloader

* remove batch_size from generate_anchors

* refactor kits19 dataset tests

* add tests for dataloader

* fix testing setup and cleanups

* remove unused import
2025-03-21 13:36:41 -04:00
George Hotz
8e555c586c switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527)
* switch quantization to unsigned/unsigned + add Ops.REDUCE

* tests

* nhwc + replay pkl
2025-03-21 17:02:37 +08:00