Commit Graph

10490 Commits

Author SHA1 Message Date
George Hotz
cc1087d2ec move simplify into views_to_indexed_uops (#9999)
* move simplify into views_to_indexed_uops

* cache that
2025-04-23 13:50:27 +01:00
chenyu
c39128133c retinanet green scripts (#9996)
also removed realize in data_get and used empty for fake data. slightly bigger lr. https://wandb.ai/chenyuxyz/MLPerf-RetinaNet/runs/8skid0e8?nw=nwuserchenyuxyz
2025-04-23 08:28:03 -04:00
George Hotz
a4a5f2d54a faster block order [pr] (#9998)
* faster block reorder [pr]

* ahh, that's even faster
2025-04-23 13:11:30 +01:00
chenyu
61bfd23881 update mlperf-logging version (#9995) 2025-04-22 19:32:39 -04:00
pkotzbach
dbbd755cba FP8s truncate (#9937)
* truncate fp8

* fix

* maybe like that?

* fix linters

* ruff

* move from extra and add ml_types to tests

* minor changes

* str to dtypes and nan support

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
2025-04-22 19:12:49 -04:00
qazal
58180caad3 schedule linearize small cleanups [pr] (#9994) 2025-04-23 05:42:29 +08:00
qazal
f4ec57baff new schedule linearizer enqueues KERNEL UOps [pr] (#9993)
* new schedule linearizer enqueues kernels [pr]

* no defaultdict

* diff

* minor
2025-04-23 05:17:58 +08:00
George Hotz
d1f6701eb7 hotfix: lower amd threshold + improve block reorder test 2025-04-22 20:44:29 +01:00
nimlgen
db51133537 rename HWInterface -> FileIOInterface (#9989)
* rename HWInterface -> FileIOInterface

* ugh
2025-04-22 22:18:57 +03:00
George Hotz
c1539b0319 putting add first orders loads as expected (#9991) 2025-04-22 20:12:05 +01:00
nimlgen
bd580d8ea4 hcq: use mmio interface in nv (#9986)
* hcq: start mmio interface

* allow double cast

* revert

* faster?

* simpler, not needed more now

* dd

* types

* fix
2025-04-22 21:58:12 +03:00
George Hotz
feee6986c9 faster block reorder (#9990)
* faster block reorder [pr]

* that shouldn't change order

* key just in sorted

* ind
2025-04-22 19:18:57 +01:00
qazal
6cb2d18c03 refactor schedule linearize to defaultdict [pr] (#9984)
* refactor schedule linearize to defaultdict [pr]

* skip that

* don't need .get
2025-04-23 00:00:23 +08:00
chenyu
9e5e371999 make DISABLE_COMPILER_CACHE a ContextVar [pr] (#9983) 2025-04-22 10:32:54 -04:00
qazal
bbc324f5dc remove CAST_AFTER_EXPAND (#9980) 2025-04-22 21:06:11 +08:00
George Hotz
c519b553db non recursive toposort is 2x+ faster (#9979)
* non recursive toposort is 2x+ faster

* don't change the order
2025-04-22 13:59:38 +01:00
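For readers unfamiliar with the technique named in the commit above, here is a minimal sketch of a non-recursive topological sort. It is not the code from #9979: the function name `toposort`, the `parents` mapping, and the graph representation are illustrative assumptions, and the input is assumed to be a DAG (a cycle would never terminate). It replaces the call stack with an explicit stack, so deep graphs do not hit Python's recursion limit; it makes no claim about the 2x+ speedup reported in the commit.

```python
from typing import Hashable, Mapping, Sequence


def toposort(root: Hashable, parents: Mapping[Hashable, Sequence[Hashable]]) -> list:
    """Return every node reachable from `root`, each one after all of its parents.

    Hypothetical helper: `parents` maps a node to the nodes it depends on.
    Assumes the graph is acyclic.
    """
    order: list = []
    visited: set = set()
    # Each stack entry is (node, expanded). A node is pushed once to expand its
    # parents, then popped a second time (expanded=True) to be emitted.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if node in visited:
            continue
        if expanded:
            visited.add(node)
            order.append(node)
        else:
            stack.append((node, True))
            # Reverse so parents are visited in their listed order.
            for p in reversed(parents.get(node, ())):
                if p not in visited:
                    stack.append((p, False))
    return order


# Diamond-shaped dependency graph: d depends on b and c, which both depend on a.
graph = {"d": ["b", "c"], "b": ["a"], "c": ["a"], "a": []}
result = toposort("d", graph)
```

Emitting a node only on its second pop is what preserves the post-order of the recursive version: all parents are on the stack above it and are therefore emitted first.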
qazal
0d9014d021 place create_ast last, type_verify in the end (once) [pr] (#9977) 2025-04-22 20:15:23 +08:00
chenyu
fb89d9a584 retinanet eval combine output on GPUS[0] (#9966)
eval 35 sec -> 20 sec. it was spending 13 seconds assembling output tensor on CPU backend. GPUS[0] seems to have enough memory, otherwise we can lower EVAL_BS
2025-04-22 07:43:51 -04:00
qazal
7b55846e08 prep STORE UOp creation for multi output [pr] (#9975)
* prep STORE UOp creation for multi output [pr]

* test_multioutput_ast
2025-04-22 19:34:52 +08:00
George Hotz
e358e0a0c6 move metadata set to tensor [pr] (#9976)
* move metadata set to tensor [pr]

* only track that in tensor.py
2025-04-22 12:30:35 +01:00
qazal
f6271515fe refactor UOp.st [pr] (#9973) 2025-04-22 18:46:56 +08:00
George Hotz
f5dc70c624 microbenchmarks + micro speed ups (#9972)
* microbenchmarks

* forgot the ubenchs

* clean up type verify
2025-04-22 11:30:46 +01:00
qazal
1cf4e24ca5 fix kernelize usage with pm_gradient (#9953)
* fix kernelize usage with pm_gradient

* remove that
2025-04-22 17:26:05 +08:00
deftdawg
32bbff942c amd: add nbio 7.2.0 for some rdna2 (#9964)
* Update of #9700, which fixes #9665, but for the Steam Deck, which was erroring on NBIO 7.2.0

* unrelated change

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-04-22 12:10:48 +03:00
Ignacio Sica
0e79aee706 use_tensor_cores bugfix (#9969) 2025-04-21 22:58:17 -03:00
chenyu
5294c32279 dev scripts for retinanet (#9968)
also BASE_DIR -> BASEDIR for consistency, and move wandb up a bit for more accurate timing
2025-04-21 17:54:56 -04:00
nimlgen
4340197132 am: download fw from web (#9956)
* am: download fw from web

* tested

* link works

* default to web

* this is default

* not used
2025-04-21 23:26:33 +03:00
nimlgen
7244ca863c am: fix double read of sdma fw (#9965) 2025-04-21 23:04:34 +03:00
uuuvn
b35f94b6ec Don't hardcode default CLOUDDEV (#9935) 2025-04-21 18:46:55 +01:00
Francis Lata
defa1e77f6 get the proper dataset count (#9962) 2025-04-21 12:11:37 -04:00
qazal
36ed3c3253 fix kernelize with VIEW children (#9961) 2025-04-21 23:38:46 +08:00
uuuvn
757533cbe6 Less verbose cloud multiprocessing start (#9960)
Setting the name before starting used to be required for #9935, when
CLOUDDEV was a global variable; now it is just a readability improvement
2025-04-21 16:19:54 +01:00
Francis Lata
d7e247f329 RetinaNet INITMLPERF support (#9950)
* fixes to make fake data work

* fix eval beam

* fix merge issue
2025-04-21 10:32:05 -04:00
kamilisjon
014f870733 rm (#9959)
Co-authored-by: KamilisJonkus <kamilis.jonkus@agmis.com>
2025-04-21 15:23:45 +01:00
chenyu
f68c7041c4 doc fix is_floating_point dtype.float -> dtypes.float (#9958) 2025-04-21 09:23:59 -04:00
akhuntsaria
2d423e6737 fix assertion message for supported device in export_model (#9957) 2025-04-21 09:23:44 -04:00
ttomsa
783a191925 rm mul from _masked_setitem (#9951) 2025-04-21 06:41:50 -04:00
nimlgen
46469f00a2 am: tiny changes in psp load (#9952) 2025-04-21 11:52:02 +03:00
qazal
0bee225a58 Tensor.kernelize docs (#9946)
* Tensor.kernelize docs

* syntax

* test_kernelize_bw

* Tensor.kernelize docstring

* pruning

* tiny details

* details 2

* becomes_map terminology

* more changes to becomes
2025-04-21 16:34:03 +08:00
Francis Lata
ea4cb2c715 small cleanups (#9947) 2025-04-20 20:33:20 -04:00
qazal
e8910540f6 Kernelize can be called multiple times on a Tensor (#9949)
* Kernelize can be called multiple times on a Tensor

* add (failing) test_kernelize_bw
2025-04-21 06:28:47 +08:00
qazal
1d90be2cff match kernelize API in process replay (#9948) 2025-04-21 05:23:41 +08:00
qazal
343a5eb588 dedup assigns in grouper VIZ name function [pr] (#9942) 2025-04-20 21:42:25 +08:00
qazal
e20ef7196a Tensor.kernelize (#9845)
* add kernelize

* remove that

* kernelize returns self

* update abstractions2.py

* kernelize in test_schedule

* temp: assert BUFFER_VIEW's existence

* ASSIGN must have a buffer or subbuffer target

* assert and shrink

* fix

* padded setitem

* var

* toposort once

* extra

* base_buffer

* end with BUFFER_VIEW

* setitem for disk

* test_setitem_becomes_subbuffer

* mul slice test

* torch backend fix 1

* non-deterministic

* keep subbuffer
2025-04-20 20:53:49 +08:00
qazal
dd16087f62 fold double ASSIGN to same target (#9941) 2025-04-20 19:06:38 +08:00
qazal
9a9aba4cd5 setitem tests (some failing) from kernelize (#9940) 2025-04-20 18:47:55 +08:00
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
chenyu
720f20865b remove required_optimizations (#9848) 2025-04-19 16:51:16 -04:00
qazal
218e01833d update scheduler section for abstractions2.py [pr] (#9927) 2025-04-19 12:09:14 +03:00
chenyu
3fdba48fc7 update bert green and README (#9934)
submission candidate
2025-04-18 21:21:28 -04:00