Commit Graph

4433 Commits

George Hotz
1714fc3ba4 start work on speed [pr] (#9707)
* fix get_location

* fix get_location try 2

* clean up split_load_store [pr]

* SHR fixup [pr]
2025-04-03 10:39:01 +08:00
George Hotz
0f1ffc2050 hotfix: cat tests 2048 instead of 256 2025-04-03 10:37:56 +08:00
Ignacio Sica
2d6d8b7355 add bf16 mfma support (#9695)
* add bf16 mfma support

* skip tc if emulated_amd and dtype is bf16

* hotfix
2025-04-02 21:44:49 +08:00
chenyu
3b8d923692 remove skip LLVM in test_div_int (#9686) 2025-04-02 04:15:00 -04:00
George Hotz
e78e8722dc Revert "LDS noop and spec (#9669)" (#9691)
This reverts commit 870b545ace.

Co-authored-by: Ignacio Sica <mignacio.sica@gmail.com>
2025-04-02 15:31:32 +08:00
chenyu
c20f112e9f example test use z3 to verify valid simplification (#9684) 2025-04-02 01:05:52 -04:00
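
A sketch of the z3 approach in (#9684) — not the actual test; the `x % 4 -> x` rewrite and the `0 <= x < 4` range are hypothetical stand-ins. If the solver finds no counterexample under the valid predicate, the simplification is sound:

```python
from z3 import And, Implies, Int, Not, Solver, sat

x = Int("x")
valid = And(x >= 0, x < 4)           # range the indexing guarantees
original, simplified = x % 4, x      # candidate rewrite: x % 4 -> x

s = Solver()
s.add(Not(Implies(valid, original == simplified)))
assert s.check() != sat, "counterexample exists; simplification is invalid"
```
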
chenyu
bca0c85193 skip CI CPU test_data_parallel_resnet_train_step (#9685)
flaky
2025-04-02 01:04:54 -04:00
qazal
bb94f13e58 add RECORD_TRACEBACKS=1 option to process replay (#9679)
* add RECORD_TRACEBACKS=1 option to process replay

* stack
2025-04-02 11:58:27 +08:00
chenyu
c672716b38 improve vmin/vmax for IDIV (#9678) 2025-04-01 23:16:01 -04:00
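
The idea behind vmin/vmax bounds for IDIV (#9678), as a minimal sketch assuming floor division and a strictly positive divisor (illustrative, not tinygrad's actual implementation):

```python
def idiv_bounds(amin: int, amax: int, bmin: int, bmax: int) -> tuple[int, int]:
    # with b > 0, floor(a/b) is monotone in each operand, so the extremes
    # of the quotient occur at the corners of the operand intervals
    assert bmin > 0
    corners = [a // b for a in (amin, amax) for b in (bmin, bmax)]
    return min(corners), max(corners)

print(idiv_bounds(-7, 9, 2, 3))  # (-4, 4): -7//2 == -4 and 9//2 == 4
```
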
chenyu
8dd88ad476 don't div_and_mod_folding for negative numerator with remainder (#9674)
can be wrong in C div since it truncates towards zero
2025-04-01 16:26:23 -04:00
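
A minimal sketch of why the folding in (#9674) is unsafe (illustrative Python, not tinygrad code): Python's `//` floors, C's division truncates toward zero, and the two disagree exactly when a negative numerator leaves a remainder:

```python
def c_div(a: int, b: int) -> int:
    # C99 integer division truncates toward zero
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

for a in (7, -7, -8):
    print(a, "floor:", a // 2, "C:", c_div(a, 2))
# -7: floor gives -4 but C gives -3, so folding rules derived from floor
# semantics are wrong here; -8 leaves no remainder, so both agree at -4
```
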
chenyu
0e34f9082e helper functions for cstyle div mod [pr] (#9673) 2025-04-01 08:06:56 -04:00
Ignacio Sica
870b545ace LDS noop and spec (#9669)
* init lds noop and lds_0 spec

* refactor lds helper test

* fix typo

* test all lds at the same time

* change comment

* comment

* start test_lds_full

* test_lds_tc

* add tc spec
2025-04-01 18:44:55 +08:00
b1tg
d9af4cfc1b AMD_LLVM: tensor cores support (#9613)
* tensor cores support

* test tensor cores codegen

* use rewrite rules

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-04-01 09:56:27 +08:00
Ignacio Sica
1444069c09 Uppercase K for dimension and lowercase k for kernel in linearizer tc helper test (#9649) 2025-03-31 19:05:36 +08:00
Ignacio Sica
baa67fd124 Uppercase N and M (standalone syntax change) (#9647) 2025-03-31 18:45:30 +08:00
Yvon Manzi
6652003839 Add cumprod to Tensor (#9629)
* probably how cumprod should look

* update _cumalu to work with MUL

* shorter

* cumprod testing

* clean

* more cleanup

* add cumprod to torch backend.

* make it look like cumsum

* mypy fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
geohotstan
d52e91db7b ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests

* clippy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
geohotstan
a08b07b4da Bump onnx==1.17.0 (#9618)
* bump

* remove resize tf_crop_and_resize

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44 am: rdna 4 support (#9621)
* hm

* fix

* return this

* fine

* g

* ruff

* fix
2025-03-29 23:16:27 +07:00
nimlgen
118bd1cbed hotfix: amd imports (#9620) 2025-03-29 20:19:53 +07:00
uuuvn
dd9aae02c3 Refactor ops_amd.py (MI300X prereq) (#9428) 2025-03-29 00:17:20 +07:00
nimlgen
fa0ebbd237 jit: optimize before pickle (#9611)
* jit: optimize before pickle

* optimize weights

* fix

* mypy

* mypy2
2025-03-28 19:06:09 +07:00
Andrew Furey
50dee4a7b3 add test for checking const gradients (#9598) 2025-03-27 15:17:37 -04:00
chenyu
5358b0904b update uop_given_valid if a node becomes const (#9604)
* update uop_given_valid if a node becomes const

* cleanup
2025-03-27 14:57:46 -04:00
qazal
bf94924d5a fix viz with nested graph_rewrite (#9595) 2025-03-27 13:14:28 +08:00
qazal
e5ff7b23d7 refactor to @track_matches + add failing test_nested_rewrite (#9592)
* test_nested_rewrite

* refactor to track_matches

* positional arg
2025-03-27 11:11:56 +08:00
nimlgen
dc9da1d917 memplan into one buffer (#9526)
* new memplanner

* new should work

* fix

* VALIDATE_MEMORY_PLANNER

* hm?

* ugh

* fix alignment

* fix2

* rm

* tiny fixes

* test

* comments and fixes

* fix2

* linearizer

* t

* fix
2025-03-27 01:46:50 +07:00
nimlgen
e88a640ca5 fix _access_resources for offset buffers (#9580)
* fix _access_resources for offset buffers

* test
2025-03-26 18:42:43 +07:00
George Hotz
9115ce8860 linearizer fixups from DSP branch (#9581) 2025-03-26 18:28:15 +08:00
nimlgen
ccbcdca473 add memplanner tests (#9577) 2025-03-26 10:59:39 +07:00
chenyu
cddd750d68 add a failed test case for jit/nojit rand [pr] (#9574)
currently adding jit produces different rand values
2025-03-25 13:32:44 -04:00
qazal
52301fe68e move Buffer refcount increment out of schedule.py (#9564)
* move Buffer refcount increment out of schedule.py

* add TestGC.test_assign_refcount

* refcount refers to Ops.BUFFER UOps
2025-03-25 12:08:27 +08:00
chenyu
6427272bf6 minor update to rand [pr] (#9566) 2025-03-24 18:49:50 -04:00
qazal
d7c754ce49 failing test for UOp buffer ref count (#9563)
* failing test for UOp buffer ref count

* lint
2025-03-25 00:10:48 +08:00
b1tg
f90001e1a6 amd llvm render (no_comgr prereq) (#9543)
* amd llvm render

* skip test_div_rounding_mode

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
George Hotz
74d98eafb8 add onnx frontend stub [pr] (#9558) 2025-03-24 12:24:34 +08:00
chenyu
ba41076e94 update embedding test to not use dtypes.long [pr] (#9556) 2025-03-23 21:33:38 -04:00
nimlgen
d5667419af am: move out pte creation logic (#9548)
* am: move out pte creation logic

* emu

* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7 add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...

* slightly cleaner

* what happened to ruff

* need to think about this some more

* slightly faster now?

* clean up, 1 more failing edge case

* ok good

* working TINY_BACKEND

* nit doc wording

* retry CI
2025-03-22 12:11:33 -04:00
quortus
bdd44d4255 Fix DSP transcendentals (#9542) 2025-03-22 11:08:18 +08:00
chenyu
c33679c47b increase size in test_multinomial_counterexample (#9540)
should be less flaky
2025-03-21 17:46:52 -04:00
Francis Lata
1a1087e3a0 cleanups on losses and dataset tests (#9538) 2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc RetinaNet losses (#9536)
* add sigmoid_focal_loss and l1_loss

* update ref implementation comment
2025-03-21 15:52:54 -04:00
Francis Lata
e6389184c5 update comment for retinanet dataloader implementations (#9534)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
Francis Lata
eb95825eea RetinaNet dataloader (#9442)
* retinanet dataloader

* remove batch_size from generate_anchors

* refactor kits19 dataset tests

* add tests for dataloader

* fix testing setup and cleanups

* remove unused import
2025-03-21 13:36:41 -04:00
b1tg
58206fa8a9 add amd llvm compiler (#9519)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 23:13:27 +08:00
George Hotz
8e555c586c switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527)
* switch quantization to unsigned/unsigned + add Ops.REDUCE

* tests

* nhwc + replay pkl
2025-03-21 17:02:37 +08:00
George Hotz
3c5161b4cb add validation of the bounds of Ops.INDEX (#9503)
* add validation of the bounds of Ops.INDEX

* do mask properly

* more validation

* correct

* fix gated

* add CAST support to vmin/vmax

* fix ptx and image

* ptx no diff

* upat.index also stays

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-03-20 12:15:55 +08:00
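
A hedged sketch of the invariant (#9503) validates (the helper and its signature are hypothetical, not the PR's code): an INDEX must have a value range that provably fits the buffer, unless a gate masks out-of-range accesses:

```python
def index_in_bounds(vmin: int, vmax: int, size: int, gated: bool) -> bool:
    # a gated INDEX may compute out-of-range values; the mask keeps it safe
    return gated or (0 <= vmin and vmax < size)

assert index_in_bounds(0, 255, 256, gated=False)
assert not index_in_bounds(-1, 255, 256, gated=False)   # may underflow
assert index_in_bounds(-1, 300, 256, gated=True)        # masked, allowed
```
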
qazal
0b20f91ce7 remove move_mask from the devectorizer (#9511)
* remove move_mask from the devectorizer

* add (wrong) ptx

* reason

* enable index addition in PTX, we won't have the INDEX anyways

* space
2025-03-20 11:53:12 +08:00
qazal
1839e8c9b3 place masks in INDEX for TestGatedStoreRewrite [pr] (#9512) 2025-03-20 09:46:53 +08:00