Commit Graph

10417 Commits

Author SHA1 Message Date
deftdawg
32bbff942c amd: add nbio 7.2.0 for some rdna2 (#9964)
* Update of #9700, which fixes #9665, but for the Steam Deck, which was erroring on NBIO 7.2.0

* unrelated change

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-04-22 12:10:48 +03:00
Ignacio Sica
0e79aee706 use_tensor_cores bugfix (#9969) 2025-04-21 22:58:17 -03:00
chenyu
5294c32279 dev scripts for retinanet (#9968)
also BASE_DIR -> BASEDIR for consistency, and move wandb up a bit for more accurate timing
2025-04-21 17:54:56 -04:00
nimlgen
4340197132 am: download fw from web (#9956)
* am: download fw from web

* tested

* link works

* default to web

* this is default

* not used
2025-04-21 23:26:33 +03:00
nimlgen
7244ca863c am: fix double read of sdma fw (#9965) 2025-04-21 23:04:34 +03:00
uuuvn
b35f94b6ec Don't hardcode default CLOUDDEV (#9935) 2025-04-21 18:46:55 +01:00
Francis Lata
defa1e77f6 get the proper dataset count (#9962) 2025-04-21 12:11:37 -04:00
qazal
36ed3c3253 fix kernelize with VIEW children (#9961) 2025-04-21 23:38:46 +08:00
uuuvn
757533cbe6 Less verbose cloud multiprocessing start (#9960)
The set-name-before-starting part used to be required for #9935 when
CLOUDDEV was a global variable; now it is just a readability improvement
2025-04-21 16:19:54 +01:00
Francis Lata
d7e247f329 RetinaNet INITMLPERF support (#9950)
* fixes to make fake data work

* fix eval beam

* fix merge issue
2025-04-21 10:32:05 -04:00
kamilisjon
014f870733 rm (#9959)
Co-authored-by: KamilisJonkus <kamilis.jonkus@agmis.com>
2025-04-21 15:23:45 +01:00
chenyu
f68c7041c4 doc fix is_floating_point dtype.float -> dtypes.float (#9958) 2025-04-21 09:23:59 -04:00
akhuntsaria
2d423e6737 fix assertion message for supported device in export_model (#9957) 2025-04-21 09:23:44 -04:00
ttomsa
783a191925 rm mul from _masked_setitem (#9951) 2025-04-21 06:41:50 -04:00
nimlgen
46469f00a2 am: tiny changes in psp load (#9952) 2025-04-21 11:52:02 +03:00
qazal
0bee225a58 Tensor.kernelize docs (#9946)
* Tensor.kernelize docs

* syntax

* test_kernelize_bw

* Tensor.kernelize docstring

* pruning

* tiny details

* details 2

* becomes_map terminology

* more changes to becomes
2025-04-21 16:34:03 +08:00
Francis Lata
ea4cb2c715 small cleanups (#9947) 2025-04-20 20:33:20 -04:00
qazal
e8910540f6 Kernelize can be called multiple times on a Tensor (#9949)
* Kernelize can be called multiple times on a Tensor

* add (failing) test_kernelize_bw
2025-04-21 06:28:47 +08:00
qazal
1d90be2cff match kernelize API in process replay (#9948) 2025-04-21 05:23:41 +08:00
qazal
343a5eb588 dedup assigns in grouper VIZ name function [pr] (#9942) 2025-04-20 21:42:25 +08:00
qazal
e20ef7196a Tensor.kernelize (#9845)
* add kernelize

* remove that

* kernelize returns self

* update abstractions2.py

* kernelize in test_schedule

* temp: assert BUFFER_VIEW's existence

* ASSIGN must have a buffer or subbuffer target

* assert and shrink

* fix

* padded setitem

* var

* toposort once

* extra

* base_buffer

* end with BUFFER_VIEW

* setitem for disk

* test_setitem_becomes_subbuffer

* mul slice test

* torch backend fix 1

* non-deterministic

* keep subbuffer
2025-04-20 20:53:49 +08:00
qazal
dd16087f62 fold double ASSIGN to same target (#9941) 2025-04-20 19:06:38 +08:00
qazal
9a9aba4cd5 setitem tests (some failing) from kernelize (#9940) 2025-04-20 18:47:55 +08:00
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
chenyu
720f20865b remove required_optimizations (#9848) 2025-04-19 16:51:16 -04:00
qazal
218e01833d update scheduler section for abstractions2.py [pr] (#9927) 2025-04-19 12:09:14 +03:00
chenyu
3fdba48fc7 update bert green and README (#9934)
submission candidate
2025-04-18 21:21:28 -04:00
George Hotz
b359125ebf rewrite the linearizer (#9885)
* random speedups [pr]

* speeding up linearizer

* test_gemm passes

* progress

* test_gemm passes

* working

* simpler

* blockstart unneeded

* simpler

* bugfix

* work

* don't compare

* faster

* progress

* cleanups

* work

* cleanups

* working

* reorder

* name is dumb

* fix tests

* lin2 works

* clean ctx

* mostly bottom up

* passes

* same speed now

* new lin is faster

* dedup

* lines and tuples

* track that

* lin

* revert that

* tests should pass

* merge siblings

* cleaner expression

* only lin2

* finally, some speed

* simpler

* fix unmergables with blockends
2025-04-18 22:35:40 +01:00
Ignacio Sica
023b1c28a2 test_tensor_cores_padded refactor (#9724)
* set pad to 3 for amd padded tc test

* change pad for amd regardless of CI

* test tc padded uops and correctness separately

* add test_tensor_cores_padded_uops test to ci

* remove redundant check for amd device

* cleanup
2025-04-18 17:05:54 -03:00
Ignacio Sica
afff82ba0f fix ptx linearizer bug [pr] (#9926)
* fix ptx bug

* align 16

* revert align because it breaks pr

* smallest diff that fixes ptx bug
2025-04-18 13:48:43 -03:00
chenyu
617b45748f fuse embedding for bert on red (#9925)
also updated BEAM param and use AMD driver for actual run. 535ms step
2025-04-18 07:20:25 -04:00
qazal
b58decac0c fix diamond assigns before mapping tensors UOps to assigns (#9855)
* keep tensor_map until diamond assign fixup

* ctx
2025-04-18 14:17:43 +03:00
qazal
a37d921917 get name from SINK in process replay (#9924)
* get name from SINK in process replay

* space
2025-04-18 13:51:11 +03:00
George Hotz
aa98aff4cd don't use ops name, just keep sink (#9922)
* don't use ops name, just keep sink

* fix test

* endif sink
2025-04-18 08:59:18 +01:00
George Hotz
8919370c76 hotfix: fix test_save_all_dtypes on METAL 2025-04-18 08:42:31 +01:00
qazal
16dfe0a902 upstream remu (#9921) 2025-04-18 01:57:36 +03:00
qazal
d287afe3b1 remove shapeless const check in full_shape [pr] (#9911)
* remove shapeless const check in full_shape [pr]

* those can go too
2025-04-18 00:00:26 +03:00
chenyu
fe6a482f1d pin hypothesis version to 6.131.0 (#9920)
6.131.1 seems to cause timeout in CI
2025-04-17 16:34:10 -04:00
chenyu
f5256e0020 Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]

updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimization

* not you yet
2025-04-17 08:00:56 -04:00
chenyu
e2ed673c94 FUSE_ARANGE_UINT to not fuse uint (#9915)
hack to bypass rand, can FUSE_ARANGE on green for 6ms per step
2025-04-16 18:49:38 -04:00
qazal
497daa658a hotfix: edge-labels go above the overlay (#9910) 2025-04-16 23:38:12 +08:00
qazal
e8e43c6dad ensure edge labels are always on top (#9908) 2025-04-16 21:08:06 +08:00
qazal
5265f25088 add counter for incoming edges in viz (#9907) 2025-04-16 20:14:14 +08:00
Eitan Turok
2c7c205bc5 Fix dtype comparisons in vectorized transcendental + tests (#9794)
* init test

* cleanup

* init

* update

* fix

* fix python runtime for vectorized code

* awesome helper

* update

* update

* cleanup

* more cleaning

* cleanup more

* fix tests

* more cleaning

* cleanup more

* fix

* even cleaner

* failing tests is sad

* cleanup

* better name

* make tests pass

* remove vec from python runtime

* remove vec from eval_uop

* remove expected failures

* better name
2025-04-16 08:06:12 -04:00
qazal
929e5a9905 do not construct GrouperContext [pr] (#9906) 2025-04-16 18:26:31 +08:00
Xingyu
047c8fd70d Add amax support to Tensor operations in Torch Backend (#9905)
* Add amax support to Tensor operations
- Implemented amax function in backend.py for tensor max operations.
- Added unit tests for amax in test.py to ensure correct functionality.

* Fix formatting in amax output function
- Adjusted spacing in the amax output lambda function in backend.py
- Improved code readability for better maintenance
2025-04-16 10:35:50 +01:00
uuuvn
d7f623dac2 Use Buffer in cloud server instead of opaques (#9875)
Not-quite-required but makes cloud graph a *lot* cleaner because unlike
raw compiled programs `GraphRunner` takes `Buffer`s like other runners.

Otherwise either of: adding a new option to not free on `__del__`,
(ab)using `external_ptr` to prevent free, or making something like a
`FakeBuffer` is required.
2025-04-16 10:17:32 +01:00
qazal
05334e0f3f construct children from UOp.toposort [pr] (#9882)
* construct children from UOp.toposort [pr]

* only for bases
2025-04-16 16:55:59 +08:00
geohotstan
4e8f25109a Revert "ONNX add output shape validation (#9720)" (#9904)
This reverts commit ac713e04db.
2025-04-16 03:15:56 -04:00
chenyu
e8024c8281 faster bert global_norm (#9901)
tinyamd 2% faster. also updated beam params, which is 2-3% faster.

update mlperf doc and steps too
2025-04-15 18:24:44 -04:00