nimlgen
346b8542da
nv: fix inval from gpu_get_id_info_v2 (#10670)
2025-06-07 00:54:32 +03:00
chenyu
bdede4924e
fix odd number in get_test_global_size (#10671)
factor might not be an integer if input global_size has an odd number in it
2025-06-06 17:31:35 -04:00
George Hotz
bf4ffc054c
mstack replaces scheduler complexity (#10654)
* mstack replaces scheduler complexity
* leave that one
* contiguous
* work
* upd
* minimal failing test
* simpler
* attention is broken
* fix transformer
* failing tests
* real fix for llama
* kv cache test
* jit multi assign test
* better tests
* comment
* fix jit issue
* traverse after buf_uop
2025-06-06 11:31:41 -07:00
George Hotz
7f0f97aa76
new test_multitensor tests (#10667)
* new test_multitensor tests
* cleanup scheduler
2025-06-06 10:26:28 -07:00
qazal
5170f387b3
remove UOp.metaop [pr] (#10664)
* little simpler UOp.const_like [pr]
* remove UOp.metaop
* bind
* remove
* min diff
* that comment is fine
2025-06-06 16:21:48 +03:00
chenyu
4a6d84c4c3
hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1
fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`
* hotfix: multitensor transformer test tests kv cache
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
George Hotz
5eb6e1e65a
Revert "hotfix: multitensor transformer test tests kv cache"
This reverts commit ad9f88419a.
2025-06-05 21:15:34 -07:00
George Hotz
ad9f88419a
hotfix: multitensor transformer test tests kv cache
2025-06-05 21:08:57 -07:00
George Hotz
8325c4f192
tests for multi assign (#10658)
* tests for multi assign
* transformer tests
* add that assert
2025-06-05 20:56:40 -07:00
wozeparrot
0d86f8d375
fix failed threefry (#10646)
2025-06-05 17:17:42 -07:00
chenyu
e67642d430
update doc example for multinomial (#10657)
also added many `s` for consistency
2025-06-05 20:16:52 -04:00
Eitan Turok
61352b8aa2
Add some more docs (#10634)
* more docs
* Add multinomial to ops
* better doc
2025-06-05 19:40:37 -04:00
qazal
884b6cf288
remove gbarrier on const (#10656)
2025-06-06 02:36:52 +03:00
chenyu
ff1aad7b69
fix const float pow to int tensor (#10655)
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
George Hotz
6619f17e26
force store to be contiguous (#10652)
2025-06-05 15:42:54 -07:00
wozeparrot
37e1ef1be3
feat: cleanup old AM processes (#10653)
2025-06-05 15:41:00 -07:00
George Hotz
baba274a76
minimal mstack pr to fix allreduce (#10649)
* minimal mstack pr to fix allreduce
* fix webgpu
2025-06-05 15:14:53 -07:00
George Hotz
4c315f8e17
MSTACK little non-functional changes (#10648)
2025-06-05 13:20:22 -07:00
b1tg
79d04d1baf
AMD_LLVM: support mfma for mi300x (#10625)
* amd llvm: support mfma for mi300x
* don't pass self
* refactor wmma render
* arch as lambda arg
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-05 15:55:44 -04:00
chenyu
46811d0d3c
minor external_model_benchmark cleanup (#10644)
2025-06-05 14:13:28 -04:00
qazal
26afbc954f
delete redundant tests from test_schedule [pr] (#10643)
2025-06-05 20:08:39 +03:00
chenyu
80ebce421d
remove metal buffer limit in external_model_benchmark [pr] (#10642)
not needed anymore
2025-06-05 13:00:51 -04:00
qazal
28c4997236
check for matching shape order in fused reduce (#10641)
* failing test
* shapes match with ones removed
2025-06-05 19:37:22 +03:00
qazal
1190062812
prevent grouper can_chase while fusing arange [pr] (#10623)
2025-06-05 18:50:21 +03:00
uuuvn
69f7778985
refactor renderer launch bounds [pr] (#10617)
2025-06-05 08:38:04 -07:00
qazal
8c5ea00522
push permutes through fused reduces (#10628)
* fix pushing reshapes through reduceops
* reduceop_view_right should assert on ndims mismatch
* update that, view.reshape asserts it
2025-06-05 16:14:04 +03:00
qazal
8db0ba1161
simpler swizzle_reducop + comments [pr] (#10638)
2025-06-05 13:54:49 +03:00
qazal
ed37f29184
remove unused lib directory from viz setup [pr] (#10639)
2025-06-05 13:54:31 +03:00
chenyu
f6d7db25b7
simpler unbind_view [pr] (#10636)
2025-06-05 01:03:27 -04:00
chenyu
d0969f5a1f
cleanup multi tests (#10635)
2025-06-05 00:28:44 -04:00
qazal
571c0296a9
linearizer failure from FUSE_ARANGE default diff (#10629)
* start with test_arange_sum
* test_arange_avgpool2d
* device.renderer.supports_float4
2025-06-04 19:11:52 +03:00
qazal
5056d21b29
add failing TestSchedule.test_arange_sum [pr] (#10627)
2025-06-04 17:23:59 +03:00
gill
9acaa6bc9a
Fix button layout in viz UI for safari (#10621)
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-06-04 15:33:22 +03:00
Xingyu
7a1bfb668d
Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend (#10612)
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend
* Add unit test for linalg.eigh function in TestTorchBackend
This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
2025-06-04 07:59:50 -04:00
qazal
7114b6ab31
viz browser tests (#10626)
* viz browser tests
* expect failure if js/ isn't included
* back green
2025-06-04 14:58:24 +03:00
Fang-Pen Lin
b0913295d2
Add missing js files in python package data for viz (#10624)
2025-06-04 10:49:43 +03:00
wozeparrot
4d1686f767
clean: becnhmark -> benchmark (#10620)
2025-06-03 19:28:18 -07:00
chenyu
18e9ec3ea1
add wino cifar to search benchmark (#10615)
* add wino cifar to search benchmark
* FUSE_OPTIM=1
* revert those
2025-06-03 20:38:43 -04:00
Bhavya Gada
bafd0c30d7
fix some minor typos and grammar (#10619)
2025-06-03 15:55:25 -07:00
nimlgen
4381b54543
am: disable page migration (#10608)
* am: disable page migration
* fixed
* enable
* fix
* typo
* fix check
2025-06-03 18:51:28 +03:00
chenyu
1c1f578490
DISABLE_COMPILER_CACHE in sdxl search (#10614)
2025-06-03 09:22:25 -04:00
qazal
ce9f12dc13
reorder cast before masking constants (#10609)
* failing test from fuzzer
* .numpy() handles bfloat16 better
* const->view->cast becomes const->cast->view
* update TestMovedConstFolding.test_cast_padded
2025-06-03 15:44:03 +03:00
qazal
910cabb081
add kernel count to grouper process replay differ [pr] (#10611)
2025-06-03 15:21:27 +03:00
chenyu
26dee71bc1
hotfix don't overwrite acc dtype in scatter_reduce (#10606)
dtype is inferred by the individual reduce
2025-06-02 21:17:01 -04:00
ihar
ba02a6331e
removed unnecessary 'isinstance(data, UOp)' check (#10605)
2025-06-02 20:58:14 -04:00
nimlgen
07de095b27
am: more info on PFs (#10602)
* am: more info on PFs
* fix
2025-06-02 23:48:40 +03:00
qazal
b8fb2ba829
rename to finalize_gbarrier [pr] (#10596)
2025-06-02 12:55:31 +03:00
Ahmed Harmouche
650404a143
[webgpu] Proper shared mem size for packed types (#10585)
* Proper shared mem size in webgpu
* Add test
* Refactor test
2025-06-01 20:18:33 -04:00
qazal
00822603ec
allow stacking of VIEW UOps [pr] (#10532)
* allow stacking of VIEW UOps [pr]
* merge_views is first
* simpler
* loc for pr, this needs a helper
* keep
* diff [pr]
* formatting
2025-06-01 23:27:04 +03:00
qazal
3cc73a0172
simpler process replay main loop [pr] (#10588)
* simpler process replay main loop [pr]
* use logging
* default to 1
2025-06-01 15:03:21 +03:00