Commit Graph

9095 Commits

Author SHA1 Message Date
wozeparrot
0d86f8d375 fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
chenyu
e67642d430 update doc example for multinomial (#10657)
also added many `s` for consistency
2025-06-05 20:16:52 -04:00
Eitan Turok
61352b8aa2 Add some more docs (#10634)
* more docs

* Add multinomial to ops

* better doc
2025-06-05 19:40:37 -04:00
qazal
884b6cf288 remove gbarrier on const (#10656) 2025-06-06 02:36:52 +03:00
chenyu
ff1aad7b69 fix const float pow to int tensor (#10655)
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
George Hotz
6619f17e26 force store to be contiguous (#10652) 2025-06-05 15:42:54 -07:00
wozeparrot
37e1ef1be3 feat: cleanup old AM processes (#10653) 2025-06-05 15:41:00 -07:00
George Hotz
baba274a76 minimal mstack pr to fix allreduce (#10649)
* minimal mstack pr to fix allreduce

* fix webgpu
2025-06-05 15:14:53 -07:00
George Hotz
4c315f8e17 MSTACK little non-functional changes (#10648) 2025-06-05 13:20:22 -07:00
b1tg
79d04d1baf AMD_LLVM: support mfma for mi300x (#10625)
* amd llvm: support mfma for mi300x

* don't pass self

* refactor wmma render

* arch as lambda arg

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-05 15:55:44 -04:00
chenyu
46811d0d3c minor external_model_benchmark cleanup (#10644) 2025-06-05 14:13:28 -04:00
qazal
26afbc954f delete redundant tests from test_schedule [pr] (#10643) 2025-06-05 20:08:39 +03:00
chenyu
80ebce421d remove metal buffer limit in external_model_benchmark [pr] (#10642)
not needed anymore
2025-06-05 13:00:51 -04:00
qazal
28c4997236 check for matching shape order in fused reduce (#10641)
* failing test

* shapes match with ones removed
2025-06-05 19:37:22 +03:00
qazal
1190062812 prevent grouper can_chase while fusing arange [pr] (#10623) 2025-06-05 18:50:21 +03:00
uuuvn
69f7778985 refactor renderer launch bounds [pr] (#10617) 2025-06-05 08:38:04 -07:00
qazal
8c5ea00522 push permutes through fused reduces (#10628)
* fix pushing reshapes through reduceops

* reduceop_view_right should assert on ndims mismatch

* update that, view.reshape asserts it
2025-06-05 16:14:04 +03:00
qazal
8db0ba1161 simpler swizzle_reducop + comments [pr] (#10638) 2025-06-05 13:54:49 +03:00
qazal
ed37f29184 remove unused lib directory from viz setup [pr] (#10639) 2025-06-05 13:54:31 +03:00
chenyu
f6d7db25b7 simpler unbind_view [pr] (#10636) 2025-06-05 01:03:27 -04:00
chenyu
d0969f5a1f cleanup multi tests (#10635) 2025-06-05 00:28:44 -04:00
qazal
571c0296a9 linearizer failure from FUSE_ARANGE default diff (#10629)
* start with test_arange_sum

* test_arange_avgpool2d

* device.renderer.supports_float4
2025-06-04 19:11:52 +03:00
qazal
5056d21b29 add failing TestSchedule.test_arange_sum [pr] (#10627) 2025-06-04 17:23:59 +03:00
gill
9acaa6bc9a Fix button layout in viz UI for safari (#10621)
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-06-04 15:33:22 +03:00
Xingyu
7a1bfb668d Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend (#10612)
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend

* Add unit test for linalg.eigh function in TestTorchBackend

This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
2025-06-04 07:59:50 -04:00
qazal
7114b6ab31 viz browser tests (#10626)
* viz browser tests

* expect failure if js/ isn't included

* back green
2025-06-04 14:58:24 +03:00
Fang-Pen Lin
b0913295d2 Add missing js files in python package data for viz (#10624) 2025-06-04 10:49:43 +03:00
wozeparrot
4d1686f767 clean: becnhmark -> benchmark (#10620) 2025-06-03 19:28:18 -07:00
chenyu
18e9ec3ea1 add wino cifar to search benchmark (#10615)
* add wino cifar to search benchmark

* FUSE_OPTIM=1

* revert those
2025-06-03 20:38:43 -04:00
Bhavya Gada
bafd0c30d7 fix some minor typos and grammar (#10619) 2025-06-03 15:55:25 -07:00
nimlgen
4381b54543 am: disable page migration (#10608)
* am: disable page migration

* fixed

* enable

* fxi

* typ

* fix check
2025-06-03 18:51:28 +03:00
chenyu
1c1f578490 DISABLE_COMPILER_CACHE in sdxl search (#10614) 2025-06-03 09:22:25 -04:00
qazal
ce9f12dc13 reorder cast before masking constants (#10609)
* failing test from fuzzer

* .numpy() handles bfloat16 better

* const->view->cast becomes const->cast->view

* update TestMovedConstFolding.test_cast_padded
2025-06-03 15:44:03 +03:00
qazal
910cabb081 add kernel count to grouper process replay differ [pr] (#10611) 2025-06-03 15:21:27 +03:00
chenyu
26dee71bc1 hotfix don't overwrite acc dtype in scatter_reduce (#10606)
dtype is inferred by individual reduce
2025-06-02 21:17:01 -04:00
ihar
ba02a6331e removed unnecessary 'isinstance(data, UOp)' check (#10605) 2025-06-02 20:58:14 -04:00
nimlgen
07de095b27 am: more info on PFs (#10602)
* am: more info on PFs

* fix
2025-06-02 23:48:40 +03:00
qazal
b8fb2ba829 rename to finalize_gbarrier [pr] (#10596) 2025-06-02 12:55:31 +03:00
Ahmed Harmouche
650404a143 [webgpu] Proper shared mem size for packed types (#10585)
* Proper shared mem size in webgpu

* Add test

* Refactor test
2025-06-01 20:18:33 -04:00
qazal
00822603ec allow stacking of VIEW UOps [pr] (#10532)
* allow stacking of VIEW UOps [pr]

* merge_views is first

* simpler

* loc for pr, this needs a helper

* keep

* diff [pr]

* formatting
2025-06-01 23:27:04 +03:00
qazal
3cc73a0172 simpler process replay main loop [pr] (#10588)
* simpler process replay main loop [pr]

* use logging

* default to 1
2025-06-01 15:03:21 +03:00
qazal
dc882d3d7d merge process replay and viz captures [pr] (#10581)
* refactoring

* test script

* work

* more work

* diff

* repr splits lines correctly

* that

* add location

* add location

* also don't need name_override

* k.copy

* [pr]

* name_override 2

* err
2025-06-01 12:30:10 +03:00
qazal
1f8a8721e9 remove test_unaligns_idxs, UOps don't have order like this [pr] (#10587) 2025-06-01 12:16:14 +03:00
ihar
c45936c4fc replaced '.upper()' which is never needed with '.lower()' which were duplicated (#10586) 2025-05-31 20:58:42 -04:00
ihar
88f38d3fcc remove '_metaop' because it is an old wrapper around 'UOp.metaop' with no additional functionality anymore (#10583) 2025-05-31 14:06:39 -04:00
chenyu
77c7989fa0 remove a MUL rewrite rule for wgsl (#10582)
tests are fine without it
2025-05-31 14:05:49 -04:00
Ahmed Harmouche
35eb4d357a [webgpu] Fix atomic shared mem load inside loop (#10530)
* Disable shared mem atomics on webgpu

* allow_any_len in load pattern matcher to fix temp load inside loop
2025-05-31 09:29:02 -04:00
qazal
6af4b02374 use plain dict and list in grouper [pr] (#10580) 2025-05-31 13:09:59 +03:00
chenyu
4ab3391e6f set -o pipefail for mlperf run_and_time (#10577)
also run the 5.1 script in ci cron job
2025-05-30 16:36:44 -04:00
chenyu
baf482d314 copy mlperf stuff to 5.1 (#10576)
5.0 is finalized, new changes go to 5.1
2025-05-30 16:12:39 -04:00