Commit Graph

870 Commits

nimlgen
1c5e353249 am: use mmio iface (#10012)
* am: use mmio iface

* linters

* fixes

* fixes + cleanups

* mute

* mypy

* style
2025-04-24 00:27:04 +03:00
George Hotz
2ed3acd767 toposort is a function [pr] (#10004) 2025-04-23 16:25:03 +01:00
George Hotz
71ecc7fa1a use a pattern matcher for upcast [pr] (#10000) 2025-04-23 14:24:23 +01:00
George Hotz
cc1087d2ec move simplify into views_to_indexed_uops (#9999)
* move simplify into views_to_indexed_uops

* cache that
2025-04-23 13:50:27 +01:00
George Hotz
d1f6701eb7 hotfix: lower amd threshold + improve block reorder test 2025-04-22 20:44:29 +01:00
qazal
1d90be2cff match kernelize API in process replay (#9948) 2025-04-21 05:23:41 +08:00
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
the new API looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
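A hedged sketch of the decoupled flow this commit describes, built around the quoted call `k.apply_opts(hand_coded_optimizations(k))`; the import paths are assumptions (heuristic.py per #9844 below) and may not match the tree at this commit:

```python
# sketch only: module paths are assumed, not verified against this commit
from tinygrad.codegen.kernel import Kernel                       # assumed path
from tinygrad.codegen.heuristic import hand_coded_optimizations  # assumed path (see #9844)

def optimize(k: Kernel) -> Kernel:
  opts = hand_coded_optimizations(k)  # now pure: returns list[Opt], mutates nothing
  k.apply_opts(opts)                  # the caller applies them explicitly
  return k
```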
chenyu
720f20865b remove required_optimizations (#9848) 2025-04-19 16:51:16 -04:00
qazal
b58decac0c fix diamond assigns before mapping tensor UOps to assigns (#9855)
* keep tensor_map until diamond assign fixup

* ctx
2025-04-18 14:17:43 +03:00
George Hotz
aa98aff4cd don't use ops name, just keep sink (#9922)
* don't use ops name, just keep sink

* fix test

* endif sink
2025-04-18 08:59:18 +01:00
chenyu
f5256e0020 Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]

updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations

* not you yet
2025-04-17 08:00:56 -04:00
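The `for opt in` pattern being replaced, as a hedged before/after sketch (the old loop body is inferred from the message, not copied from the diff):

```python
# before: call sites looped over opts themselves
for opt in opts:
  k.apply_opt(opt)

# after: one entry point does the loop
k.apply_opts(opts)
```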
geohotstan
4e8f25109a Revert "ONNX add output shape validation (#9720)" (#9904)
This reverts commit ac713e04db.
2025-04-16 03:15:56 -04:00
nimlgen
83ae83d871 compare amd and am to cpu as well (#9896) 2025-04-15 13:32:18 +03:00
nimlgen
23a95dd84d script to compare amd and am kerns (#9889)
* script to compare amd and am kerns

* tool

* is it used???
2025-04-15 00:11:22 +03:00
qazal
e201bc3e93 process replay kernel asts in toposort order [pr] (#9869)
* process replay kernel asts in toposort order [pr]

* use HEAD replay
2025-04-13 17:20:34 +08:00
Alexey Zaytsev
7dda6aae7d Skip CLOUD in external_test_example (#9857)
Closes #9814
2025-04-12 10:17:44 +08:00
chenyu
8c6299bced move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]

also folded all long lines

* make a copy and rename self -> k

* fix test
2025-04-10 23:40:16 -04:00
qazal
fbc6aa53d4 script for local process_replay + fix viz name [pr] (#9837) 2025-04-11 00:39:18 +08:00
qazal
16afe04f45 move process replay to grouper (#9830)
* simpler

* sched
2025-04-10 18:27:42 +08:00
chenyu
c462162db8 update benchmark bert scripts with BS and ACC_DTYPE (#9826)
BS=16, ACC_DTYPE=half for tinybox; BS=128, ACC_DTYPE=float for mi300x
2025-04-10 02:06:02 -04:00
George Hotz
fefee5d3ab single kernel softmax (#9776)
* real single kernel softmax

* cleanup

* fix blockend insertion

* add to bert test
2025-04-08 12:35:48 +08:00
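For context on why this needed work: softmax is three data-dependent passes, two of them reductions, which a scheduler naturally splits into separate kernels; this commit fuses them into one. Below is just the textbook definition, not tinygrad's implementation:

```python
import math

def softmax(xs: list[float]) -> list[float]:
  m = max(xs)                          # pass 1: row max (numerical stability)
  es = [math.exp(x - m) for x in xs]   # pass 2: shifted exponentials...
  s = sum(es)                          # ...and their sum, a second reduction
  return [e / s for e in es]           # pass 3: normalize
```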
George Hotz
db22094d35 hotfix: update softmax fusion test 2025-04-08 11:23:19 +08:00
Sieds Lykles
07d1aefaf4 fast idiv (#9755)
* fast idiv with tests and fuzzer

* Add todo comment

* Add env variable to toggle fast_idiv

* Move env check

* Add fuzz fast_idiv to ci

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-07 08:32:24 -04:00
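"Fast idiv" conventionally means replacing division by a compile-time constant with a multiply and a shift. A minimal self-contained sketch of that technique follows; it illustrates the idea the commit's fuzzer checks, not tinygrad's actual rewrite:

```python
def fast_idiv(x: int, d: int, bits: int = 32) -> int:
  # for 0 <= x < 2**bits and constant d > 0, a shift s large enough that
  # m = ceil(2**s / d) makes (x * m) >> s equal x // d for every such x
  s = bits + d.bit_length()
  m = (1 << s) // d + 1  # ceil(2**s / d), rounded up even when d divides 2**s
  return (x * m) >> s

# exhaustive spot check, in the spirit of the commit's fuzzer
assert all(fast_idiv(x, 7) == x // 7 for x in range(1 << 16))
```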
chenyu
b190d85ad7 benchmark script bert softmax (#9759) 2025-04-07 00:31:18 -04:00
chenyu
43e4565148 weighted linear in external_benchmark_bert_matmuls (#9757)
include the linear to get qkv, and permute so that strides match the real run
2025-04-06 23:35:42 -04:00
chenyu
8a585dc5c1 benchmark script for matmuls in bert (#9752)
2 main matmuls in the bert layers; getting these to be fast makes bert fast
2025-04-06 19:34:25 +08:00
George Hotz
926b0bcc57 cache folded upcast [pr] (#9733) 2025-04-04 11:23:19 +08:00
geohotstan
ac713e04db ONNX add output shape validation (#9720)
* add output shape validation and remove support for sequence_type

* nit better err msg

* add sequence_type back

* improve err msg

* Revert "improve err msg"

This reverts commit dc9eaea4bb.

* Revert "add sequence_type back"

This reverts commit 288170b2d9.

* do explicit shape equality

* small nit
2025-04-03 05:44:53 -04:00
George Hotz
49dafe6d43 add gc tests [pr] (#9718)
* add gc tests [pr]

* del

* more gc tests

* add NullGraph
2025-04-03 14:08:32 +08:00
geohotstan
e1d7e47cca fix ONNX IsInf unintended dtype promotion (#9711)
* add IsInf

* add corresponding test

* that float16 is kinda silly
2025-04-02 22:46:15 -04:00
qazal
bb94f13e58 add RECORD_TRACEBACKS=1 option to process replay (#9679)
* add RECORD_TRACEBACKS=1 option to process replay

* stack
2025-04-02 11:58:27 +08:00
chenyu
c672716b38 improve vmin/vmax for IDIV (#9678) 2025-04-01 23:16:01 -04:00
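vmin/vmax are the integer bounds tinygrad tracks per UOp for simplification. For floor division by a positive constant the bounds follow from monotonicity; a hedged sketch of that reasoning (not the actual code):

```python
def idiv_bounds(vmin: int, vmax: int, c: int) -> tuple[int, int]:
  # x // c is monotone non-decreasing in x when c > 0, so the bounds of
  # x // c over vmin <= x <= vmax are just the floor-divided endpoints
  assert c > 0 and vmin <= vmax
  return (vmin // c, vmax // c)
```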
geohotstan
d52e91db7b ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests

* clippy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
geohotstan
a08b07b4da Bump onnx==1.17.0 (#9618)
* bump

* remove resize tf_crop_and_resize

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44 am: rdna 4 support (#9621)
* hm

* fix

* return this

* fine

* g

* ruff

* fix
2025-03-29 23:16:27 +07:00
nimlgen
118bd1cbed hotfix: amd imports (#9620) 2025-03-29 20:19:53 +07:00
George Hotz
9115ce8860 linearizer fixups from DSP branch (#9581) 2025-03-26 18:28:15 +08:00
George Hotz
74d98eafb8 add onnx frontend stub [pr] (#9558) 2025-03-24 12:24:34 +08:00
nimlgen
d5667419af am: move out pte creation logic (#9548)
* am: move out pte creation logic

* emu

* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7 add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...

* slightly cleaner

* what happened to ruff

* need to think about this some more

* slightly faster now?

* clean up, 1 more failing edge case

* ok good

* working TINY_BACKEND

* nit doc wording

* retry CI
2025-03-22 12:11:33 -04:00
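A usage sketch, hedged: the torch-style pairing below (pool with indices, unpool to scatter back) matches how max_unpool2d usually works, but the exact tinygrad keyword names here are assumptions:

```python
from tinygrad import Tensor

x = Tensor.arange(16).float().reshape(1, 1, 4, 4)
# assumed torch-style flags: pooling returns argmax indices, unpooling
# scatters the pooled values back to those positions, zeros elsewhere
pooled, idx = x.max_pool2d(kernel_size=2, return_indices=True)
restored = pooled.max_unpool2d(idx, kernel_size=2)
```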
Francis Lata
1a1087e3a0 cleanups on losses and dataset tests (#9538) 2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc RetinaNet losses (#9536)
* add sigmoid_focal_loss and l1_loss

* update ref implementation comment
2025-03-21 15:52:54 -04:00
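For reference, sigmoid focal loss is the standard RetinaNet classification loss from Lin et al. 2017: FL = -alpha_t * (1 - p_t)**gamma * log(p_t). A minimal scalar sketch of that definition (tinygrad's version operates on Tensors):

```python
import math

def sigmoid_focal_loss(logit: float, target: float, alpha: float = 0.25, gamma: float = 2.0) -> float:
  # down-weights easy examples via the (1 - p_t)**gamma modulating factor
  p = 1.0 / (1.0 + math.exp(-logit))                         # sigmoid
  p_t = p * target + (1.0 - p) * (1.0 - target)              # prob of the true class
  alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)
  return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```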
Francis Lata
e6389184c5 update comment for retinanet dataloader implementations (#9534)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
Francis Lata
eb95825eea RetinaNet dataloader (#9442)
* retinanet dataloader

* remove batch_size from generate_anchors

* refactor kits19 dataset tests

* add tests for dataloader

* fix testing setup and cleanups

* remove unused import
2025-03-21 13:36:41 -04:00
geohotstan
1d64c12f2b add Topk to tensor (#9343)
* terrible but somewhat working impl

* linux behaves differently than macos?

* slightly better impl

* small clean up; haven't figured this out yet

* better

* torch has different behavior on linux and macos for duplicated values

* add sum docs

* fix test

* add torch return_type test

* add an exception test

* wrap_fxn instead, and move op lower in order

* better repeated values test

* rerun ci
2025-03-09 20:01:42 -04:00
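A hedged usage sketch: the "torch return_type test" above suggests a torch-like `(values, indices)` result, but the parameter names are assumptions:

```python
from tinygrad import Tensor

t = Tensor([1.0, 5.0, 3.0, 2.0])
values, indices = t.topk(2)               # assumed: k largest along the last axis
print(values.tolist(), indices.tolist())  # expected: [5.0, 3.0] [1, 2]
```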
nimlgen
243078dda9 am: optimize tlb usage (#9049)
* am: optimize tlb usage

* fixes

* comments

* tiny
2025-03-07 19:37:29 +03:00
geohotstan
088d86691b fix onnx gather and onnx auto_pad VALID mode (#9375)
* fix gather and auto_pad

* long -> int64
2025-03-07 10:27:23 -05:00
nimlgen
9bd13de44c lower test_gemv_4096_16384 to 750 for red (#9367) 2025-03-05 22:44:48 +03:00
chenyu
2cb2fce8d9 lower test_gemm_8192 amd_tflops to 65 (#9364) 2025-03-05 14:06:11 -05:00
nimlgen
14c88abf27 add some options to allreduce bench (#9348) 2025-03-04 23:46:36 +03:00