nimlgen
1c5e353249
am: use mmio iface (#10012)
* am: use mmio iface
* linters
* fixes
* fixes + cleanups
* mute
* mypy
* style
2025-04-24 00:27:04 +03:00
George Hotz
2ed3acd767
toposort is a function [pr] (#10004)
2025-04-23 16:25:03 +01:00
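As a rough illustration of the function form (a sketch, not the PR's code; it assumes nodes expose their inputs as `.src`, as tinygrad UOps do):

```python
def toposort(root) -> list:
    # iterative postorder DFS: a node is emitted only after all of its
    # sources, so the result is a valid topological order of the DAG
    visited, order, stack = set(), [], [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        elif id(node) not in visited:
            visited.add(id(node))
            stack.append((node, True))
            stack.extend((s, False) for s in node.src)
    return order
```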
George Hotz
71ecc7fa1a
use a pattern matcher for upcast [pr] (#10000)
2025-04-23 14:24:23 +01:00
George Hotz
cc1087d2ec
move simplify into views_to_indexed_uops (#9999)
* move simplify into views_to_indexed_uops
* cache that
2025-04-23 13:50:27 +01:00
George Hotz
d1f6701eb7
hotfix: lower amd threshold + improve block reorder test
2025-04-22 20:44:29 +01:00
qazal
1d90be2cff
match kernelize API in process replay (#9948)
2025-04-21 05:23:41 +08:00
chenyu
6c30948df6
hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
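Together with Kernel.apply_opts (#9917, below), this separates choosing optimizations from applying them. A minimal sketch of the call pattern from the commit message; the import paths are assumptions based on the heuristic.py move in #9844 below:

```python
# assumed import paths; illustrative only
from tinygrad.codegen.kernel import Kernel, Opt
from tinygrad.codegen.heuristic import hand_coded_optimizations

def optimize(k: Kernel) -> Kernel:
    # the heuristic is now a pure function: it inspects the kernel and
    # returns a list[Opt]; apply_opts is what actually mutates the kernel
    opts: list[Opt] = hand_coded_optimizations(k)
    k.apply_opts(opts)
    return k
```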
chenyu
720f20865b
remove required_optimizations (#9848)
2025-04-19 16:51:16 -04:00
qazal
b58decac0c
fix diamond assigns before mapping tensor UOps to assigns (#9855)
* keep tensor_map until diamond assign fixup
* ctx
2025-04-18 14:17:43 +03:00
George Hotz
aa98aff4cd
don't use ops name, just keep sink (#9922)
* don't use ops name, just keep sink
* fix test
* endif sink
2025-04-18 08:59:18 +01:00
chenyu
f5256e0020
Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]
updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations
* not you yet
2025-04-17 08:00:56 -04:00
geohotstan
4e8f25109a
Revert "ONNX add output shape validation ( #9720 )" ( #9904 )
...
This reverts commit ac713e04db.
2025-04-16 03:15:56 -04:00
nimlgen
83ae83d871
compare amd and am to cpu as well (#9896)
2025-04-15 13:32:18 +03:00
nimlgen
23a95dd84d
script to compare amd and am kerns (#9889)
* script to compare amd and am kerns
* tool
* is it used???
2025-04-15 00:11:22 +03:00
qazal
e201bc3e93
process replay kernel asts in toposort order [pr] (#9869)
* process replay kernel asts in toposort order [pr]
* use HEAD replay
2025-04-13 17:20:34 +08:00
Alexey Zaytsev
7dda6aae7d
Skip CLOUD in external_test_example (#9857)
Closes #9814
2025-04-12 10:17:44 +08:00
chenyu
8c6299bced
move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]
also folded all long lines
* make a copy and rename self -> k
* fix test
2025-04-10 23:40:16 -04:00
qazal
fbc6aa53d4
script for local process_replay + fix viz name [pr] (#9837)
2025-04-11 00:39:18 +08:00
qazal
16afe04f45
move process replay to grouper (#9830)
* simpler
* sched
2025-04-10 18:27:42 +08:00
chenyu
c462162db8
update benchmark bert scripts with BS and ACC_DTYPE (#9826)
BS=16, ACC_DTYPE=half for tinybox; BS=128, ACC_DTYPE=float for mi300x
2025-04-10 02:06:02 -04:00
George Hotz
fefee5d3ab
single kernel softmax (#9776)
* real single kernel softmax
* cleanup
* fix blockend insertion
* add to bert test
2025-04-08 12:35:48 +08:00
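For reference, softmax needs three dependent passes over each row: a max reduction for numerical stability, an exp-and-sum reduction, and a normalize; fusing them is what makes the single-kernel version interesting. The math in plain Python (not the tinygrad kernel):

```python
import math

def softmax(row: list[float]) -> list[float]:
    m = max(row)                              # pass 1: running max
    exps = [math.exp(x - m) for x in row]     # pass 2: shifted exponentials...
    total = sum(exps)                         # ...and their sum
    return [e / total for e in exps]          # pass 3: normalize
```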
George Hotz
db22094d35
hotfix: update softmax fusion test
2025-04-08 11:23:19 +08:00
Sieds Lykles
07d1aefaf4
fast idiv (#9755)
* fast idiv with tests and fuzzer
* Add todo comment
* Add env variable to toggle fast_idiv
* Move env check
* Add fuzz fast_idiv to ci
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-07 08:32:24 -04:00
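The classic fast-idiv trick (per Hacker's Delight; the commit doesn't spell out its exact variant, so treat this as a sketch) replaces `x // d` by a multiply and a shift with a precomputed magic constant, valid whenever the rounding error stays under one unit over the whole input range:

```python
def magic(d: int, nbits: int = 32) -> tuple[int, int]:
    """Find (m, s) with x // d == (x * m) >> s for all 0 <= x < 2**nbits."""
    assert d > 0
    hi = (1 << nbits) - 1                    # largest numerator we must handle
    for s in range(2 * nbits + 1):
        m = ((1 << s) + d - 1) // d          # ceil(2^s / d)
        # with e = m*d - 2^s, the shifted product is exact iff hi*e < 2^s
        if (m * d - (1 << s)) * hi < (1 << s):
            return m, s
    raise ValueError("no magic number found")

m, s = magic(3, nbits=4)                     # m=11, s=5 for 4-bit inputs
assert all((x * m) >> s == x // 3 for x in range(16))
```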
chenyu
b190d85ad7
benchmark script bert softmax (#9759)
2025-04-07 00:31:18 -04:00
chenyu
43e4565148
weighted linear in external_benchmark_bert_matmuls (#9757)
include the linear to get qkv, and permute so that stride matches the real run
2025-04-06 23:35:42 -04:00
chenyu
8a585dc5c1
benchmark script for matmuls in bert (#9752)
2 main matmuls in the bert layers. getting these to be fast makes bert fast
2025-04-06 19:34:25 +08:00
George Hotz
926b0bcc57
cache folded upcast [pr] (#9733)
2025-04-04 11:23:19 +08:00
geohotstan
ac713e04db
ONNX add output shape validation (#9720)
* add output shape validation and remove support for sequence_type
* nit better err msg
* add sequence_type back
* improve err msg
* Revert "improve err msg"
This reverts commit dc9eaea4bb .
* Revert "add sequence_type back"
This reverts commit 288170b2d9 .
* do explicit shape equality
* small nit
2025-04-03 05:44:53 -04:00
George Hotz
49dafe6d43
add gc tests [pr] (#9718)
* add gc tests [pr]
* del
* more gc tests
* add NullGraph
2025-04-03 14:08:32 +08:00
geohotstan
e1d7e47cca
fix ONNX IsInf unintended dtype promotion (#9711)
* add IsInf
* add corresponding test
* that float16 is kinda silly
2025-04-02 22:46:15 -04:00
qazal
bb94f13e58
add RECORD_TRACEBACKS=1 option to process replay (#9679)
* add RECORD_TRACEBACKS=1 option to process replay
* stack
2025-04-02 11:58:27 +08:00
chenyu
c672716b38
improve vmin/vmax for IDIV (#9678)
2025-04-01 23:16:01 -04:00
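Bounds for division tighten easily when the divisor is a positive constant, since truncating division is monotonic in the numerator. A hedged sketch of that idea (the commit's actual logic may cover more cases):

```python
def trunc_div(a: int, b: int) -> int:
    # C-style division rounding toward zero, which is what IDIV models;
    # Python's // floors toward -inf instead, so spell it out
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def idiv_bounds(vmin: int, vmax: int, d: int) -> tuple[int, int]:
    # for d > 0, trunc_div is monotonic nondecreasing in the numerator,
    # so the interval endpoints map straight through
    assert d > 0 and vmin <= vmax
    return trunc_div(vmin, d), trunc_div(vmax, d)

assert idiv_bounds(-7, 8, 3) == (-2, 2)
```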
geohotstan
d52e91db7b
ONNX ops cleanups (#9622)
* combine work from remove numpy and onnx ops tests
* clippy
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
geohotstan
a08b07b4da
Bump onnx==1.17.0 (#9618)
* bump
* remove resize tf_crop_and_resize
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44
am: rdna 4 support (#9621)
* hm
* fix
* return this
* fine
* g
* ruff
* fix
2025-03-29 23:16:27 +07:00
nimlgen
118bd1cbed
hotfix: amd imports (#9620)
2025-03-29 20:19:53 +07:00
George Hotz
9115ce8860
linearizer fixups from DSP branch (#9581)
2025-03-26 18:28:15 +08:00
George Hotz
74d98eafb8
add onnx frontend stub [pr] (#9558)
2025-03-24 12:24:34 +08:00
nimlgen
d5667419af
am: move out pte creation logic (#9548)
* am: move out pte creation logic
* emu
* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7
add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
2025-03-22 12:11:33 -04:00
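A usage sketch, assuming a torch-style pairing where `max_pool2d(..., return_indices=True)` supplies the argmax positions that `max_unpool2d` scatters back; the exact tinygrad signatures aren't shown in this log, so verify against the docs:

```python
from tinygrad import Tensor

x = Tensor.arange(16).reshape(1, 1, 4, 4).float()
# assumed flag: return_indices hands back the argmax positions
out, idx = x.max_pool2d(kernel_size=2, return_indices=True)
restored = out.max_unpool2d(idx, kernel_size=2)
print(restored.shape)  # (1, 1, 4, 4): maxima back in place, zeros elsewhere
```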
Francis Lata
1a1087e3a0
cleanups on losses and dataset tests (#9538)
2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc
RetinaNet losses (#9536)
* add sigmoid_focal_loss and l1_loss
* update ref implementation comment
2025-03-21 15:52:54 -04:00
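For context, sigmoid focal loss is the standard RetinaNet objective FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), computed per anchor from raw logits. A minimal sketch of that formulation (not necessarily the exact code this PR added):

```python
from tinygrad import Tensor

def sigmoid_focal_loss(logits: Tensor, targets: Tensor,
                       alpha: float = 0.25, gamma: float = 2.0) -> Tensor:
    p = logits.sigmoid()
    # elementwise binary cross-entropy, written out directly
    ce = -(targets * p.log() + (1 - targets) * (1 - p).log())
    p_t = targets * p + (1 - targets) * (1 - p)          # prob of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    # the (1 - p_t)**gamma factor down-weights easy, well-classified anchors
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```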
Francis Lata
e6389184c5
update comment for retinanet dataloader implementations (#9534)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
Francis Lata
eb95825eea
RetinaNet dataloader (#9442)
* retinanet dataloader
* remove batch_size from generate_anchors
* refactor kits19 dataset tests
* add tests for dataloader
* fix testing setup and cleanups
* remove unused import
2025-03-21 13:36:41 -04:00
geohotstan
1d64c12f2b
add Topk to tensor (#9343)
* terrible but somewhat working impl
* linux behaves differently than macos?
* slightly better impl
* small clean up; haven't figured this out yet
* better
* torch has different behavior on linux and macos for duplicated values
* add sum docs
* fix test
* add torch return_type test
* add an exception test
* wrap_fxn instead, and move op lower in order
* better repeated values test
* rerun ci
2025-03-09 20:01:42 -04:00
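A usage sketch, assuming the torch-style interface the tests above compare against: the k largest values plus their indices, where (as the commit notes) tie order among duplicated values can vary by platform:

```python
from tinygrad import Tensor

t = Tensor([1.0, 5.0, 3.0, 5.0, 2.0])
values, indices = t.topk(3)
print(values.tolist())   # [5.0, 5.0, 3.0]
print(indices.tolist())  # which of the tied 5.0s comes first may vary
```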
nimlgen
243078dda9
am: optimize tlb usage (#9049)
* am: optimize tlb usage
* fixes
* comments
* tiny
2025-03-07 19:37:29 +03:00
geohotstan
088d86691b
fix onnx gather and onnx auto_pad VALID mode (#9375)
* fix gather and auto_pad
* long -> int64
2025-03-07 10:27:23 -05:00
nimlgen
9bd13de44c
lower test_gemv_4096_16384 to 750 for red (#9367)
2025-03-05 22:44:48 +03:00
chenyu
2cb2fce8d9
lower test_gemm_8192 amd_tflops to 65 (#9364)
2025-03-05 14:06:11 -05:00
nimlgen
14c88abf27
add some options to allreduce bench (#9348)
2025-03-04 23:46:36 +03:00