qazal
a773ff73e3
match image cast folding on the cast itself [pr] ( #9166 )
2025-02-19 09:31:34 +02:00
qazal
9a20063837
create subbuffer immediately before constructing ScheduleItem [pr] ( #9162 )
2025-02-18 21:07:52 +01:00
qazal
1c92534bff
hotfix: viz should show if there's a rewrite [pr] ( #9161 )
2025-02-18 19:11:03 +01:00
George Hotz
a330f3338c
save applied opts in ProgramSpec [pr] ( #9150 )
2025-02-19 00:40:03 +08:00
chenyu
ff05bff221
put bert data shard inside jit ( #9160 )
...
python time 45ms -> 9ms, it was spending time to schedule the shard
also init bert data on CLANG since it's from numpy, so we don't create the tensor on default device then shard into GPUS
2025-02-18 10:36:54 -05:00
qazal
679291e26a
assert only base maps to buffer [pr] ( #9159 )
2025-02-18 15:46:47 +01:00
qazal
4f592eeea6
hotfix: remove extra matcher for copy/buffer_view [pr] ( #9157 )
2025-02-18 13:21:24 +01:00
George Hotz
ff9b985d9f
hotfix: View Base AST
2025-02-18 18:48:34 +08:00
George Hotz
30f470eaa3
UNIQUE UOp for buffer instead of arg ( #9156 )
...
* UNIQUE UOp for buffer instead of arg
* factor out buffer spec
2025-02-18 16:59:59 +08:00
qazal
38f5ea2132
increment writable buffers refcount from the kernel graph [pr] ( #9153 )
2025-02-18 10:20:02 +02:00
George Hotz
ddddcc165b
colors back in DEBUG=2 [pr] ( #9155 )
2025-02-18 16:17:57 +08:00
George Hotz
6d62966bf7
add support for named rewrites [pr] ( #9152 )
2025-02-18 16:07:04 +08:00
George Hotz
caee42e8a6
Revert "name from uops [pr] ( #9151 )" ( #9154 )
...
This reverts commit 28897be9a2.
2025-02-18 16:06:44 +08:00
George Hotz
28897be9a2
name from uops [pr] ( #9151 )
2025-02-18 15:52:03 +08:00
George Hotz
a4dab3ec3f
add name uop ( #9149 )
...
* add name uop, TODO: refactor renderer to use
* renderer uses name uop
* fix tests
* render
* ptx
2025-02-18 15:26:58 +08:00
George Hotz
2db8b4046a
minor linearizer refactor to finalize in rewrite [pr] ( #9148 )
2025-02-18 12:42:22 +08:00
George Hotz
df3b320f46
rewriter -> devectorizer [pr] ( #9147 )
2025-02-18 12:42:08 +08:00
chenyu
5dc1257ce0
clean up bert fake data iterator [pr] ( #9145 )
...
reuse the same get_data_bert path in setup and real run
2025-02-17 20:03:38 -05:00
qazal
751c517b6c
cancel viz request after the kernel clicked away [pr] ( #9144 )
2025-02-17 20:19:09 +01:00
chenyu
465421b525
fix Tensor.isclose ( #9143 )
...
many corner cases around inf and nan
2025-02-17 12:03:12 -05:00
qazal
36741cbbc1
enable real_size assert for test_conv_2x2_backward_one_view [pr] ( #9142 )
2025-02-17 17:53:44 +01:00
qazal
e9ff4ef4f7
s/ScheduleContext/GrouperContext [pr] ( #9141 )
...
* refactor to kernel context [pr]
* s/ScheduleContext/GrouperContext [pr]
2025-02-17 17:14:17 +01:00
qazal
96cc9f59e0
refactor to kernel context [pr] ( #9140 )
2025-02-17 16:57:14 +01:00
qazal
df6781332e
remove var_vals from the scheduler context [pr] ( #9139 )
...
* remove var_vals from the scheduler context [pr]
* maps to int
2025-02-17 16:43:50 +01:00
Ali Ladjevardi
35e9c4657b
Use proper units when printing beam time ( #9103 )
...
* use proper units when printing beam time
* refactor DEBUG=2
2025-02-17 23:41:38 +08:00
Clément Verrier
a7f91224eb
add Tensor.isclose() ( #8844 )
...
* add `Tensor.isclose()`
* support `equal_nan`
so as to match PyTorch's behavior
* update unit tests
* remove some tests temporarily
* re-enable one test
* re-enable other test
* try to fix failing tests during CI
* save one line of code
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 10:11:40 -05:00
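For readers unfamiliar with the `isclose` semantics referenced in #8844 and #9143, the following is a minimal scalar sketch of the PyTorch-style rule (`|a - b| <= atol + rtol * |b|`, with special handling for NaN and infinities). It is not tinygrad's implementation; the function name `isclose_scalar` and its structure are assumptions made for illustration, and only the default tolerances and the `equal_nan` flag follow PyTorch's documented behavior.

```python
import math

def isclose_scalar(a: float, b: float, rtol: float = 1e-05, atol: float = 1e-08,
                   equal_nan: bool = False) -> bool:
    """Hypothetical scalar version of the isclose rule (illustration only)."""
    # NaN never compares close unless equal_nan is set and both sides are NaN.
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    # Infinities are close only to an infinity of the same sign.
    if math.isinf(a) or math.isinf(b):
        return a == b
    # Finite case: asymmetric tolerance, scaled by the second argument.
    return abs(a - b) <= atol + rtol * abs(b)
```

The NaN and infinity branches are the corner cases that a naive `abs(a - b) <= tol` check gets wrong, since `inf - inf` is NaN and every comparison against NaN is false.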
qazal
2b787c3b17
hotfix: lower ul.disabled opacity for viz [pr] ( #9138 )
2025-02-17 15:16:48 +01:00
qazal
660c034da6
KERNEL op try 3 ( #9061 )
...
* work
* tolerate shape, maybe this is ASSIGN(RESHAPE(BUF), KERNEL)
* err, it's not ASSIGN(BUF, KERNEL), it's ASSIGN(VIEW(BUF), KERNEL)
* burn the boats
* assign slightly works
* assign works
* cleanup + var_vals can exist
* fine image + fix metadata
* metadata, without making everything 30% slower
* diff pruning
* faster assign schedule
* add_buffer_ops stage
* add kernel_spec back
* add viz display
* more strict kernel_spec
2025-02-17 14:47:54 +01:00
qazal
ec80df5115
add PROGRAM renderer to viz [pr] ( #9137 )
2025-02-17 14:46:08 +01:00
qazal
7b09a72682
don't display void dtype in viz nodes [pr] ( #9136 )
...
* don't display void dtype in viz nodes [pr]
* extra
2025-02-17 13:49:36 +01:00
George Hotz
4dd10d03b7
move is_increasing to ops [pr] ( #9134 )
2025-02-17 19:27:48 +08:00
qazal
22c571d3cb
add kernel axis colors to viz [pr] ( #9129 )
...
* add kernel axis colors to viz [pr]
* slightly blending with white makes this nicer
* space
2025-02-17 12:21:35 +01:00
George Hotz
1bf66d62cf
symbolic gets its own file [pr] ( #9132 )
2025-02-17 18:55:21 +08:00
George Hotz
bd694faf6c
factor out the expander logic [pr] ( #9131 )
2025-02-17 18:09:48 +08:00
quortus
5bdf0c7951
Bitcast constant folding 2.0 ( #9089 )
...
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
* Bitcast constant folding
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 18:08:20 +08:00
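The "Do not use list as a default parameter" item above refers to a general Python pitfall: a default argument is evaluated once, at function definition time, so a mutable default is shared across calls. The sketch below shows the pitfall and the usual fix; `append_shared` and `append_fresh` are hypothetical names, not functions from the repository.

```python
from typing import Optional

def append_shared(x: int, acc: list = []) -> list:
    # Pitfall: the same list object is reused on every call that omits acc.
    acc.append(x)
    return acc

def append_fresh(x: int, acc: Optional[list] = None) -> list:
    # Fix: use None as the sentinel and create a new list per call.
    if acc is None:
        acc = []
    acc.append(x)
    return acc
```

Calling `append_shared` twice without `acc` accumulates both values in one list, while `append_fresh` returns a new single-element list each time.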
quortus
2be4529f14
Test broken const folding wraparound behavior ( #9080 )
...
* Test broken const folding wraparound behavior
* Add repro for test_payne_hanek_reduction const folding bug
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 17:44:56 +08:00
George Hotz
7eea9b639d
hotfix: add replay_pkl debugging env
2025-02-17 17:34:58 +08:00
George Hotz
af9d8d39d2
dsp matchers + bump line count to 11300 ( #9130 )
2025-02-17 17:31:54 +08:00
quortus
638d925e4e
Prevent const folding in test_payne_hanek_reduction ( #9088 )
...
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
2025-02-17 17:31:10 +08:00
George Hotz
9289425170
add ast to ProgramSpec + pre matcher [pr] ( #9128 )
...
* add ast to ProgramSpec + pre matcher [pr]
* cleaner cast + test fix
2025-02-17 16:39:14 +08:00
qazal
fe260ac4d7
viz/server cleanups [pr] ( #9127 )
...
* viz/server cleanups [pr]
* space
2025-02-17 09:59:41 +02:00
George Hotz
a38b47e026
hotfix: DSP doesn't use that path
2025-02-17 10:45:29 +08:00
quortus
edf7213f34
Make bitcast to the same dtype noop ( #9121 )
2025-02-16 20:28:44 -05:00
Ahmed Harmouche
59fe45f947
Solve get_grouped_dims does not split issue ( #9085 )
...
* Solve dims too large errors on webgpu
* Simplify divisor find
* Test square root divisor
* Fix lint
* Refactor into group_dims and split_dims
* Refactor
* Fix lint
* Add back max check in _group_dims
* Prefer grouping over split
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
Ahmed Harmouche
84dc331dd1
Refactor async ( #9126 )
2025-02-16 17:47:15 -05:00
qazal
6a9e5598f9
small viz touchups [pr] ( #9123 )
2025-02-16 20:07:40 +01:00
qazal
b3127f38e6
faster viz data fetching with streaming [pr] ( #9122 )
...
* refactor to generator
* yield
* switch to SSE
* start client side + end events
* start javascript work
* need to redo this whole part
* more correct
* diff
* works
* diff cleanup
* more diff cleanup
2025-02-16 19:31:11 +01:00
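The steps listed in #9122 (refactor to a generator, yield, switch to SSE, add end events) suggest a streaming pattern along the following lines. This is only a sketch of the general Server-Sent Events framing, under the assumption that each item is sent as one JSON `data:` message followed by a final named event; the function `sse_events`, the `end` event name, and the payload shape are hypothetical and not taken from the viz server.

```python
import json
from typing import Iterable, Iterator

def sse_events(items: Iterable[dict]) -> Iterator[bytes]:
    """Yield each item as one Server-Sent Events message (illustration only)."""
    for item in items:
        # An SSE message is one or more "field: value" lines ended by a blank line.
        yield f"data: {json.dumps(item)}\n\n".encode()
    # A named final event lets the client know it can close the stream.
    yield b"event: end\ndata: done\n\n"
```

Because the function is a generator, a server can write each chunk to the response as soon as it is produced instead of building the full JSON payload first.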
uuuvn
8926bac00a
am: profiling working ( #9119 )
...
ops_amd.py registers device finalization via atexit.register after
finalize_profile is registered in device.py, so the AM device is
closed before the profile is finalized, which leads to a hang.
(atexit.register is LIFO: https://docs.python.org/3.12/library/atexit.html#atexit.register)
This PR moves the registration of device finalization into device.py, before
the registration of profile finalization.
2025-02-16 18:51:08 +03:00
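The LIFO behavior that #9119 relies on can be shown with a small standalone sketch. The handler names `close_device` and `finalize_profile` here are hypothetical stand-ins, not the actual tinygrad functions; only `atexit.register` and its last-in, first-out ordering come from the Python documentation.

```python
import atexit

order = []  # records the order in which the exit handlers actually run

def close_device():
    # Hypothetical stand-in for device finalization.
    order.append("close_device")

def finalize_profile():
    # Hypothetical stand-in for profile finalization, which needs the device open.
    order.append("finalize_profile")

# Registered first, so it runs last at interpreter exit.
atexit.register(close_device)
# Registered last, so it runs first, while the device is still open.
atexit.register(finalize_profile)
```

With the registrations in the opposite order, `close_device` would run before `finalize_profile`, which is the situation the commit message describes as causing the hang.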
qazal
97cb9cb1ed
always viz the first graph + non blocking matches fetch [pr] ( #9117 )
...
* always display the first graph in viz [pr]
* simpler
* progress indicator is the matches list style
* remove extra
* back
* res.json is still slow
2025-02-16 13:39:51 +01:00
chenyu
1fda98d14f
fix import time_linearizer [pr] ( #9118 )
...
only test that used it was skipped in CI due to being slow
2025-02-15 21:33:28 -05:00