Nino Risteski
a1a146a499
adding enable_gqa in SDPA (#11097)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
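For readers unfamiliar with the flag: with grouped-query attention (GQA), the query has more heads than the key/value tensors, and each key/value head is shared by a fixed group of query heads. The plain-Python sketch below illustrates that semantics on nested lists. It is not tinygrad's SDPA code, and the function names (`attend`, `sdpa_gqa`) are hypothetical.

```python
import math


def softmax(xs):
    # subtract the max for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, k, v):
    """Single-head scaled dot-product attention; q, k, v are lists of row vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))])
    return out


def sdpa_gqa(q_heads, k_heads, v_heads):
    """Hypothetical GQA sketch: Hq query heads share Hkv key/value heads (Hq % Hkv == 0)."""
    hq, hkv = len(q_heads), len(k_heads)
    if hkv == 0 or hq % hkv != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    group = hq // hkv
    # query head h reads key/value head h // group, which is the same as
    # repeating each KV head `group` times along the head axis
    return [attend(q_heads[h], k_heads[h // group], v_heads[h // group]) for h in range(hq)]
```

Keys and values are only stored for `Hkv` heads. The per-head repetition happens at lookup time, which is where GQA saves memory compared with full multi-head attention.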
nimlgen
b73e89110e
nv: align allocations for perf (#11114)
2025-07-06 22:32:11 +03:00
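The commit does not say which alignment the NV backend now uses. The sketch below only shows the usual power-of-two round-up arithmetic behind aligned allocation. The 4 KiB / 2 MiB policy and the names `SMALL_ALIGN`, `LARGE_ALIGN` and `aligned_alloc_size` are hypothetical illustrations, not values taken from tinygrad.

```python
def round_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # adding alignment - 1 and masking off the low bits rounds up
    return (size + alignment - 1) & ~(alignment - 1)


# Hypothetical policy, not taken from the commit: small buffers get 4 KiB
# alignment, buffers of at least 2 MiB get 2 MiB alignment.
SMALL_ALIGN = 4 << 10
LARGE_ALIGN = 2 << 20


def aligned_alloc_size(size: int) -> int:
    return round_up(size, LARGE_ALIGN if size >= LARGE_ALIGN else SMALL_ALIGN)
```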
chenyu
7468959f4b
Tensor.argsort (#11112)
2025-07-06 13:56:35 -04:00
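As a reminder of what argsort returns: it returns the indices that would sort the input, not the sorted values. The following plain-Python sketch covers the 1-D case and a row-wise (last-dimension) case. The log does not state whether tinygrad's `Tensor.argsort` is stable or what its parameters are, so the `descending` flag and the stable tie-breaking here are assumptions of the sketch.

```python
def argsort(xs, descending=False):
    """Indices that would sort xs; equal elements keep their original order."""
    # sorted() is stable, and remains stable with reverse=True
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=descending)


def argsort_last_dim(rows, descending=False):
    # apply argsort independently to every row, i.e. along dim=-1 of a 2-D input
    return [argsort(r, descending) for r in rows]
```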
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false
* add more shape asserts
2025-07-06 13:55:49 -04:00
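The `full_matrices` flag changes only the output shapes. The helper below encodes the shape convention documented for PyTorch's `torch.linalg.svd`, which is what the shape asserts in this commit would be checking against. For an `(..., m, n)` input with `k = min(m, n)`, the full form gives U `(..., m, m)` and Vh `(..., n, n)`, and the reduced form gives U `(..., m, k)` and Vh `(..., k, n)`. The helper name `svd_shapes` is hypothetical.

```python
def svd_shapes(shape, full_matrices=True):
    """Expected (U, S, Vh) shapes for an SVD of a (..., m, n) input, PyTorch convention."""
    *batch, m, n = shape
    k = min(m, n)
    u = (*batch, m, m if full_matrices else k)
    s = (*batch, k)
    vh = (*batch, n if full_matrices else k, n)
    return u, s, vh
```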
qazal
a556f50668
viz: small ui fixes (#11110)
* share styling of ctx-list and metadata
* scrollbar-gutter: stable prevents layout shift when changing steps
* margin-left makes left side unaligned
2025-07-06 17:05:36 +03:00
chenyu
ba88ec3ad0
pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc
Tensor.diag (#11108)
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
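The note that `Tensor.eye` now uses `Tensor.diag` relies on a simple identity: the identity matrix is the diagonal matrix built from an all-ones vector. Here is a plain-Python sketch of that relationship. It covers only the square case and a 1-D input. tinygrad's actual signatures (for example, whether `eye` accepts a second dimension) are not shown in the log.

```python
def diag(v):
    """Square matrix with the 1-D sequence v on its main diagonal (sketch)."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]


def eye(n):
    # identity as the diagonal matrix of an all-ones vector
    return diag([1] * n)
```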
ttomsa
4905af4ae0
remove invalid int div test (#11106)
* rm test
* also rm this
2025-07-05 18:57:55 -04:00
qazal
a4aa769c0a
fix: type checking for track_rewrites key [pr] (#11104)
* fix: type checking for track_rewrites key [pr]
* also for cpu_profile
* func.__name__ to start
2025-07-05 20:11:21 +03:00
qazal
81781dc12b
viz: renames and spacing changes to tracing (#11102)
2025-07-05 18:40:39 +03:00
qazal
7619bf35e7
cleanup: remove disabled TestIndexingOrdering (#11101)
* cleanup: remove disabled TestIndexingOrdering
* don't import kernelize internals
2025-07-05 18:14:37 +03:00
qazal
4fcfaa0ef7
viz: switch to TracingKey (#11100)
* viz: switch to TracingKey
* tuple
* order is name, keys, fmt
* add test_tracing_key
2025-07-05 17:46:18 +03:00
qazal
458be950d9
viz: add TINY device (#11095)
* viz: add TINY device
* replace Any with a proper type
* reorder
* diff
* rename
* space
* from diff
* multiple keys
2025-07-05 16:54:55 +03:00
nimlgen
4dccb2ea49
am_smi: increase kill retries (#11099)
2025-07-05 16:23:50 +03:00
chenyu
39b4d72687
remove flatten and reshape in sparse_categorical_crossentropy [pr] (#11093)
not needed, directly operating on the classes dim is fine
2025-07-04 15:15:27 -04:00
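The reason flatten/reshape can be dropped is that the loss only ever reduces over the class (last) axis. Every leading batch position contributes one term, `-log_softmax(logits)[target]`, and the terms are averaged. The plain-Python sketch below shows that math on nested lists of any batch depth. It is not tinygrad's tensor implementation, and tinygrad's extra options (if any) are not modeled.

```python
import math


def log_softmax(row):
    # log-sum-exp with max subtraction for numerical stability
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def _nll_terms(logits, targets):
    # the innermost list of logits is the class dim; targets has the same
    # nesting minus that last level, so recurse until targets is an int
    if isinstance(targets, int):
        return [-log_softmax(logits)[targets]]
    terms = []
    for lg, t in zip(logits, targets):
        terms.extend(_nll_terms(lg, t))
    return terms


def sparse_categorical_crossentropy(logits, targets):
    """Mean negative log-likelihood of integer targets (sketch, mean reduction)."""
    terms = _nll_terms(logits, targets)
    return sum(terms) / len(terms)
```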
nimlgen
577afc9f05
hcq: remove redundant syncs and fix typing (#11096)
Before this patch, the code could issue redundant syncs because of
the typing issue. Current tests should cover all correctness checks.
2025-07-04 21:49:47 +03:00
qazal
41aa54eb5a
viz: resolve all graph references in python (#11087)
* viz: resolve all graph references in python
* it just maps things to the index
* always map the name
* key on the uop
* diff
* close
2025-07-04 20:35:25 +03:00
qazal
3d8569f6d8
hotfix: infinite loop in tracking pattern matcher (#11094)
* failing test
* fix that
* given matchers
2025-07-04 19:55:26 +03:00
qazal
a783211fc7
viz: allow end_time=None in trace events (#11092)
2025-07-04 17:48:17 +03:00
0xSG
17119b0f23
hip_ioctl: platform.machine added (#11084)
2025-07-04 17:20:24 +03:00
nimlgen
6656aa162c
nv: enable huge pages (#11091)
2025-07-04 17:17:24 +03:00
nimlgen
01f3c4f44d
memory: simpler paddr allocation logic (#11090)
* memory: new paddr allocation logic
* am fix
* am refactors
* fix
* mypy
* use it
* am
2025-07-04 17:00:36 +03:00
qazal
f6d55d9272
viz: pickle UPat location (#11086)
2025-07-04 13:09:00 +03:00
qazal
2403f126ed
move printable out of UPat [pr] (#11085)
* move printable out of UPat [pr]
* print_match_stats
2025-07-04 12:31:11 +03:00
qazal
988540f401
support capturing cpu_profile on error (#11078)
* support capturing cpu_profile on error
* spacing
* pylint complains
2025-07-04 11:53:12 +03:00
chenyu
a2f5a54458
move sparse_categorical_crossentropy to test_ops (#11083)
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
7c8ccb0267
sparse_categorical_crossentropy cleanup [pr] (#11082)
2025-07-03 18:32:52 -04:00
nimlgen
e02ee8ef1b
nv: cleanups from 5090 (#11081)
2025-07-04 00:08:47 +03:00
George Hotz
e9a01dd04a
Revert "Fix division by zero in add views (#11075)" (#11080)
This reverts commit 19f07e72f6.
2025-07-03 11:39:44 -07:00
Sieds Lykles
19f07e72f6
Fix division by zero in add views (#11075)
2025-07-03 11:37:59 -07:00
chenyu
678cabc6f2
use argfix in Tensor.stack (#11077)
works for multiple Tensor args or a single tuple/list of Tensors, but not a mix of both
2025-07-03 12:15:11 -04:00
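The calling convention described above, which allows either `stack(a, b, c)` or `stack([a, b, c])` but not a mix of the two, can be captured by a small argument-normalizing helper. The sketch below is a hypothetical re-implementation named `argfix_sketch`. It illustrates the behavior described in the commit and is not tinygrad's `argfix` source. It rejects mixed calls explicitly instead of letting them fail later.

```python
def argfix_sketch(*args):
    """Hypothetical helper: accept f(a, b, c) or f((a, b, c)) / f([a, b, c])."""
    # a single tuple/list argument is unpacked into its elements
    if len(args) == 1 and isinstance(args[0], (tuple, list)):
        return tuple(args[0])
    # separate items mixed with a tuple/list are rejected
    if any(isinstance(a, (tuple, list)) for a in args):
        raise ValueError("pass either separate items or a single tuple/list, not a mix")
    return args
```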
qazal
b695e8c4d6
viz: remove support for naming with self (#11076)
2025-07-03 17:29:14 +03:00
Sieds Lykles
53985297bd
add test, fix rewrite rule and raise error on division by zero (#11073)
2025-07-03 08:25:06 -04:00
nimlgen
2d138c6cf1
am: factor out init_sw (#11070)
2025-07-03 11:01:17 +03:00
quortus
a937ac80dc
Replace ASSIGN with STORE in UPat compiler (#11065)
2025-07-02 19:15:43 -07:00
George Hotz
d049639221
little setitem test (#11064)
* setitem has one less realize, why broken
* put realize back
2025-07-02 15:10:24 -07:00
quortus
17d85b9793
Refactor STORE implementation in ops_python (#11060)
2025-07-02 14:29:12 -07:00
George Hotz
3b85534df0
outerworld range test [pr] (#11059)
* outerworld range test [pr]
* bound range
* grad acc test
* more tests
* 5 steps is fine
2025-07-02 14:28:44 -07:00
chenyu
425d5f55c4
generate kernel dataset and upload artifact (#11063)
2025-07-02 17:21:25 -04:00
chenyu
09cc64eea7
remove const 0 clause in "UOp with size 0 is zero" [pr] (#11061)
2025-07-02 16:36:40 -04:00
chenyu
4d57437a67
add timeout to benchmark_search and mlperf action (#11058)
default timeout is 6 hours, which is too long and occupies a box
2025-07-02 14:17:34 -04:00
nimlgen
6067568087
nv: remove hardcoded CTRL_CMD_VASPACE_COPY_SERVER_RESERVED_PDES (#11057)
2025-07-02 20:41:10 +03:00
qazal
ad155f5454
print inputs to get_program in process replay [pr] (#11051)
* print inputs to get_program in process replay [pr]
* colors
* keep dataclass default escapes
* Revert "keep dataclass default escapes"
This reverts commit c6db7e8a7a.
* note for ast_repr
* add that back
2025-07-02 20:20:01 +03:00
Ignacio Sica
a22aa77c82
cleanup opts_to_apply (#11055)
* fix kernelinfo init in fixup_ast
* opts_to_apply None
2025-07-02 20:03:19 +03:00
qazal
a919b8325b
add test_kernel_info (#11054)
* add test_kernel_info
* reorder
2025-07-02 19:48:12 +03:00
kevvz
3b041d188f
[bounty] Singular Value Decomposition (#10875)
* initial commit
* add qr + expand svd to full matrix
* add odd number support
* add linalg tests
* qr supports dims of arbitrary size
* add qr tests
* svd supports dims of arbitrary size
* small cleanup
* improvements over svd batch handling
* improve linalg tests
* make u_pad match q shape
* add nonfull matrix tests
* little less verbose nonfull svd test
* added dtypes on svd + return vt instead of vt
* lint
* more lint
* lint + set seed
* small fix
* small lint
* lint
* add int casting to indices and shapes
* remove int from shape tuple in svd
* small cleanup
* add return types
* reuse inverse_permute
* refactoring
* whitespace
* remove regularization term to prevent bad outputs on ill conditioned matrices
* remove seed
* refactor
* lint
* refactor
* spacing
* remove clone
* line reduction
* smarter heuristic for iterations_per_round
* add big test
* lint
* turns out no constant needed?
* wrap tests
* some small matrices need the constant
* remove realize
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 09:06:03 -07:00
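The bounty commit builds SVD alongside a QR decomposition. The log does not say which QR algorithm tinygrad uses, so the plain-Python sketch below swaps in modified Gram-Schmidt purely to illustrate what a reduced QR produces: Q with orthonormal columns and an upper-triangular R with A = QR. It assumes an m x n input with m >= n and full column rank. Batching and the "arbitrary size" handling mentioned in the commit are not modeled. The name `qr_mgs` is hypothetical.

```python
import math


def qr_mgs(a):
    """Reduced QR of an m x n matrix (m >= n, full column rank) by modified Gram-Schmidt.

    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular)."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # work column by column
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # normalize the current column to get the next orthonormal vector
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        qj = [x / r[j][j] for x in cols[j]]
        q_cols.append(qj)
        # immediately remove the qj component from every later column
        for k in range(j + 1, n):
            r[j][k] = sum(qi * ci for qi, ci in zip(qj, cols[k]))
            cols[k] = [ci - r[j][k] * qi for ci, qi in zip(cols[k], qj)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]  # back to row-major
    return q, r
```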
Ignacio Sica
fc42c3063e
use kernel info (#11049)
* use kernel info
* keep api
* revert change in comment
2025-07-02 08:42:32 -07:00
Ahmed Harmouche
e992ed10dc
WebGPU on Windows (#10890)
* WebGPU on Windows
* Fix dawn-python install
* New test
* pydeps
* Minor fix
* Only install dawn-python on windows webgpu
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
nimlgen
e67a6d2310
nv: tiny cleanups (#11053)
2025-07-02 18:37:32 +03:00
chenyu
4626e9c172
is_numpy_ndarray helper [pr] (#11050)
2025-07-02 09:12:53 -04:00
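The log names an `is_numpy_ndarray` helper but does not show its body. The following is one plausible way to write such a check without importing numpy: compare the object's type module and name. This is an assumption, not tinygrad's code. An exact-type check like this one deliberately does not match ndarray subclasses.

```python
def is_numpy_ndarray(x) -> bool:
    # sketch: identify numpy.ndarray by its type's module and name so that
    # numpy never has to be imported; subclasses of ndarray do not match
    t = type(x)
    return t.__module__ == "numpy" and t.__name__ == "ndarray"
```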