10490 Commits

Author SHA1 Message Date
Nino Risteski
a1a146a499 adding enable_gqa in SDPA (#11097)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
nimlgen
b73e89110e nv: align allocations for perf (#11114) 2025-07-06 22:32:11 +03:00
chenyu
7468959f4b Tensor.argsort (#11112) 2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849 clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
qazal
a556f50668 viz: small ui fixes (#11110)
* share styling of ctx-list and metadata

* scrollbar-gutter: stable prevents layout shift when changing steps

* margin-left makes left side unaligned
2025-07-06 17:05:36 +03:00
chenyu
ba88ec3ad0 pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc Tensor.diag (#11108)
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
ttomsa
4905af4ae0 remove invalid int div test (#11106)
* rm test

* also rm this
2025-07-05 18:57:55 -04:00
qazal
a4aa769c0a fix: type checking for track_rewrites key [pr] (#11104)
* fix: type checking for track_rewrites key [pr]

* also for cpu_profile

* func.__name__ to start
2025-07-05 20:11:21 +03:00
qazal
81781dc12b viz: renames and spacing changes to tracing (#11102) 2025-07-05 18:40:39 +03:00
qazal
7619bf35e7 cleanup: remove disabled TestIndexingOrdering (#11101)
* cleanup: remove disabled TestIndexingOrdering

* don't import kernelize internals
2025-07-05 18:14:37 +03:00
qazal
4fcfaa0ef7 viz: switch to TracingKey (#11100)
* viz: switch to TracingKey

* tuple

* order is name, keys, fmt

* add test_tracing_key
2025-07-05 17:46:18 +03:00
qazal
458be950d9 viz: add TINY device (#11095)
* viz: add TINY device

* replace Any with a proper type

* reorder

* diff

* rename

* space

* from diff

* multiple keys
2025-07-05 16:54:55 +03:00
nimlgen
4dccb2ea49 am_smi: increase kill retries (#11099) 2025-07-05 16:23:50 +03:00
chenyu
39b4d72687 remove flatten and reshape in sparse_categorical_crossentropy [pr] (#11093)
not needed, directly operating on the classes dim is fine
2025-07-04 15:15:27 -04:00
nimlgen
577afc9f05 hcq: remove redundant syncs and fix typing (#11096)
Before this patch the code could issue redundant syncs because of
the typing issue. Current tests should cover all correctness checks.
2025-07-04 21:49:47 +03:00
qazal
41aa54eb5a viz: resolve all graph references in python (#11087)
* viz: resolve all graph references in python

* it just maps things to the index

* always map the name

* key on the uop

* diff

* close
2025-07-04 20:35:25 +03:00
qazal
3d8569f6d8 hotfix: infinite loop in tracking pattern matcher (#11094)
* failing test

* fix that

* given matchers
2025-07-04 19:55:26 +03:00
qazal
a783211fc7 viz: allow end_time=None in trace events (#11092) 2025-07-04 17:48:17 +03:00
0xSG
17119b0f23 hip_ioctl: platform.machine added (#11084) 2025-07-04 17:20:24 +03:00
nimlgen
6656aa162c nv: enable huge pages (#11091) 2025-07-04 17:17:24 +03:00
nimlgen
01f3c4f44d memory: simpler paddr allocation logic (#11090)
* memory: new paddr allocation logic

* am fix

* am refactors

* fix

* mypy

* use it

* am
2025-07-04 17:00:36 +03:00
qazal
f6d55d9272 viz: pickle UPat location (#11086) 2025-07-04 13:09:00 +03:00
qazal
2403f126ed move printable out of UPat [pr] (#11085)
* move printable out of UPat [pr]

* print_match_stats
2025-07-04 12:31:11 +03:00
qazal
988540f401 support capturing cpu_profile on error (#11078)
* support capturing cpu_profile on error

* spacing

* pylint complains
2025-07-04 11:53:12 +03:00
chenyu
a2f5a54458 move sparse_categorical_crossentropy to test_ops (#11083)
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
7c8ccb0267 sparse_categorical_crossentropy cleanup [pr] (#11082) 2025-07-03 18:32:52 -04:00
nimlgen
e02ee8ef1b nv: cleanups from 5090 (#11081) 2025-07-04 00:08:47 +03:00
George Hotz
e9a01dd04a Revert "Fix division by zero in add views (#11075)" (#11080)
This reverts commit 19f07e72f6.
2025-07-03 11:39:44 -07:00
Sieds Lykles
19f07e72f6 Fix division by zero in add views (#11075) 2025-07-03 11:37:59 -07:00
chenyu
678cabc6f2 use argfix in Tensor.stack (#11077)
works for multiple Tensor args or a single tuple/list of Tensors, but not a mix of the two
2025-07-03 12:15:11 -04:00
qazal
b695e8c4d6 viz: remove support for naming with self (#11076) 2025-07-03 17:29:14 +03:00
Sieds Lykles
53985297bd add test, fix rewrite rule and raise error on division by zero (#11073) 2025-07-03 08:25:06 -04:00
nimlgen
2d138c6cf1 am: factor out init_sw (#11070) 2025-07-03 11:01:17 +03:00
quortus
a937ac80dc Replace ASSIGN with STORE in UPat compiler (#11065) 2025-07-02 19:15:43 -07:00
George Hotz
d049639221 little setitem test (#11064)
* setitem has one less realize, why broken

* put realize back
2025-07-02 15:10:24 -07:00
quortus
17d85b9793 Refactor STORE implementation in ops_python (#11060) 2025-07-02 14:29:12 -07:00
George Hotz
3b85534df0 outerworld range test [pr] (#11059)
* outerworld range test [pr]

* bound range

* grad acc test

* more tests

* 5 steps is fine
2025-07-02 14:28:44 -07:00
chenyu
425d5f55c4 generate kernel dataset and upload artifact (#11063) 2025-07-02 17:21:25 -04:00
chenyu
09cc64eea7 remove const 0 clause in "UOp with size 0 is zero" [pr] (#11061) 2025-07-02 16:36:40 -04:00
chenyu
4d57437a67 add timeout to benchmark_search and mlperf action (#11058)
default timeout is 6 hours which is too long and occupies a box
2025-07-02 14:17:34 -04:00
nimlgen
6067568087 nv: remove hardcoded CTRL_CMD_VASPACE_COPY_SERVER_RESERVED_PDES (#11057) 2025-07-02 20:41:10 +03:00
qazal
ad155f5454 print inputs to get_program in process replay [pr] (#11051)
* print inputs to get_program in process replay [pr]

* colors

* keep dataclass default escapes

* Revert "keep dataclass default escapes"

This reverts commit c6db7e8a7a.

* note for ast_repr

* add that back
2025-07-02 20:20:01 +03:00
Ignacio Sica
a22aa77c82 cleanup opts_to_apply (#11055)
* fix kernelinfo init in fixup_ast

* opts_to_apply None
2025-07-02 20:03:19 +03:00
qazal
a919b8325b add test_kernel_info (#11054)
* add test_kernel_info

* reorder
2025-07-02 19:48:12 +03:00
kevvz
3b041d188f [bounty] Singular Value Decomposition (#10875)
* initial commit

* add qr + expand svd to full matrix

* add odd number support

* add linalg tests

* qr supports dims of arbitrary size

* add qr tests

* svd supports dims of arbitrary size

* small cleanup

* improvements over svd batch handling

* improve linalg tests

* make u_pad match q shape

* add nonfull matrix tests

* little less verbose nonfull svd test

* added dtypes on svd + return vt instead of vt

* lint

* more lint

* lint + set seed

* small fix

* small lint

* lint

* add int casting to indices and shapes

* remove int from shape tuple in svd

* small cleanup

* add return types

* reuse inverse_permute

* refactoring

* whitespace

* remove regularization term to prevent bad outputs on ill conditioned matrices

* remove seed

* refactor

* lint

* refactor

* spacing

* remove clone

* line reduction

* smarter heuristic for iterations_per_round

* add big test

* lint

* turns out no constant needed?

* wrap tests

* some small matrices need the constant

* remove realize

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 09:06:03 -07:00
Ignacio Sica
fc42c3063e use kernel info (#11049)
* use kernel info

* keep api

* revert change in comment
2025-07-02 08:42:32 -07:00
Ahmed Harmouche
e992ed10dc WebGPU on Windows (#10890)
* WebGPU on Windows

* Fix dawn-python install

* New test

* pydeps

* Minor fix

* Only install dawn-python on windows webgpu

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
nimlgen
e67a6d2310 nv: tiny cleanups (#11053) 2025-07-02 18:37:32 +03:00
chenyu
4626e9c172 is_numpy_ndarray helper [pr] (#11050) 2025-07-02 09:12:53 -04:00