George Hotz
df3b320f46
rewriter -> devectorizer [pr] ( #9147 )
2025-02-18 12:42:08 +08:00
chenyu
5dc1257ce0
clean up bert fake data iterator [pr] ( #9145 )
...
reuse the same get_data_bert path in setup and real run
2025-02-17 20:03:38 -05:00
qazal
751c517b6c
cancel viz request after the kernel clicked away [pr] ( #9144 )
2025-02-17 20:19:09 +01:00
chenyu
465421b525
fix Tensor.isclose ( #9143 )
...
many corner cases around inf and nan
2025-02-17 12:03:12 -05:00
qazal
36741cbbc1
enable real_size assert for test_conv_2x2_backward_one_view [pr] ( #9142 )
2025-02-17 17:53:44 +01:00
qazal
e9ff4ef4f7
s/ScheduleContext/GrouperContext [pr] ( #9141 )
...
* refactor to kernel context [pr]
* s/ScheduleContext/GrouperContext [pr]
2025-02-17 17:14:17 +01:00
qazal
96cc9f59e0
refactor to kernel context [pr] ( #9140 )
2025-02-17 16:57:14 +01:00
qazal
df6781332e
remove var_vals from the scheduler context [pr] ( #9139 )
...
* remove var_vals from the scheduler context [pr]
* maps to int
2025-02-17 16:43:50 +01:00
Ali Ladjevardi
35e9c4657b
Use proper units when printing beam time ( #9103 )
...
* use proper units when printing beam time
* refactor DEBUG=2
2025-02-17 23:41:38 +08:00
Clément Verrier
a7f91224eb
add Tensor.isclose() ( #8844 )
...
* add `Tensor.isclose()`
* support `equal_nan`
so as to match PyTorch's behavior
* update unit tests
* remove some tests temporarily
* re-enable one test
* re-enable other test
* try to fix failing tests during CI
* save one line of code
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 10:11:40 -05:00
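For reference, the semantics this commit aims to match can be sketched in scalar Python. This is an illustration of PyTorch's documented `isclose` rule (`|a - b| <= atol + rtol * |b|`, with `equal_nan` controlling NaN handling), not the tinygrad implementation; the function below is a hypothetical standalone helper.

```python
import math

def isclose(a: float, b: float, rtol: float = 1e-05, atol: float = 1e-08,
            equal_nan: bool = False) -> bool:
    # NaN is close to nothing, unless equal_nan is set and both sides are NaN.
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    # An infinity is only close to an infinity of the same sign.
    if math.isinf(a) or math.isinf(b):
        return a == b
    # Finite case: the tolerance is scaled by the second argument only,
    # so the comparison is not symmetric in a and b.
    return abs(a - b) <= atol + rtol * abs(b)
```

The inf and NaN branches are the corner cases that the later "fix Tensor.isclose" commit (#9143) refers to; how tinygrad handles them internally is not shown in this log.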
qazal
2b787c3b17
hotfix: lower ul.disabled opacity for viz [pr] ( #9138 )
2025-02-17 15:16:48 +01:00
qazal
660c034da6
KERNEL op try 3 ( #9061 )
...
* work
* tolerate shape, maybe this is ASSIGN(RESHAPE(BUF), KERNEL)
* err, it's not ASSIGN(BUF, KERNEL), it's ASSIGN(VIEW(BUF), KERNEL)
* burn the boats
* assign slightly works
* assign works
* cleanup + var_vals can exist
* fine image + fix metadata
* metadata, without making everything 30% slower
* diff pruning
* faster assign schedule
* add_buffer_ops stage
* add kernel_spec back
* add viz display
* more strict kernel_spec
2025-02-17 14:47:54 +01:00
qazal
ec80df5115
add PROGRAM renderer to viz [pr] ( #9137 )
2025-02-17 14:46:08 +01:00
qazal
7b09a72682
don't display void dtype in viz nodes [pr] ( #9136 )
...
* don't display void dtype in viz nodes [pr]
* extra
2025-02-17 13:49:36 +01:00
George Hotz
4dd10d03b7
move is_increasing to ops [pr] ( #9134 )
2025-02-17 19:27:48 +08:00
qazal
22c571d3cb
add kernel axis colors to viz [pr] ( #9129 )
...
* add kernel axis colors to viz [pr]
* slightly blending with white makes this nicer
* space
2025-02-17 12:21:35 +01:00
George Hotz
1bf66d62cf
symbolic gets its own file [pr] ( #9132 )
2025-02-17 18:55:21 +08:00
George Hotz
bd694faf6c
factor out the expander logic [pr] ( #9131 )
2025-02-17 18:09:48 +08:00
quortus
5bdf0c7951
Bitcast constant folding 2.0 ( #9089 )
...
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
* Bitcast constant folding
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 18:08:20 +08:00
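As a rough illustration of what folding a bitcast of a constant means at the value level: the bit pattern is reinterpreted, not converted. The sketch below uses the standard `struct` module and hypothetical helper names; it does not reflect how the PR implements folding inside tinygrad.

```python
import struct

def bitcast_f32_to_u32(x: float) -> int:
    # Pack as a 32-bit float, then reread the same 4 bytes as an unsigned int.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bitcast_u32_to_f32(x: int) -> float:
    # Inverse direction: reread the integer's 4 bytes as a 32-bit float.
    return struct.unpack("<f", struct.pack("<I", x))[0]
```

For example, `bitcast_f32_to_u32(1.0)` yields `0x3F800000`, the IEEE 754 single-precision encoding of 1.0.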
quortus
2be4529f14
Test broken const folding wraparound behavior ( #9080 )
...
* Test broken const folding wraparound behavior
* Add repro for test_payne_hanek_reduction const folding bug
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 17:44:56 +08:00
George Hotz
7eea9b639d
hotfix: add replay_pkl debugging env
2025-02-17 17:34:58 +08:00
George Hotz
af9d8d39d2
dsp matchers + bump line count to 11300 ( #9130 )
2025-02-17 17:31:54 +08:00
quortus
638d925e4e
Prevent const folding in test_payne_hanek_reduction ( #9088 )
...
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
2025-02-17 17:31:10 +08:00
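The "Do not use list as a default parameter" item refers to a general Python pitfall: a default value is evaluated once, at function definition time, so a mutable default is shared across calls. A minimal sketch with hypothetical function names (the actual function changed in the PR is not shown in this log):

```python
def append_bad(item, acc=[]):
    # The same list object is reused on every call that omits acc.
    acc.append(item)
    return acc

def append_good(item, acc=None):
    # Use None as the sentinel and create a fresh list per call.
    if acc is None:
        acc = []
    acc.append(item)
    return acc
```

Calling `append_bad` twice without `acc` accumulates both items in one list, while `append_good` returns a new single-item list each time.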
George Hotz
9289425170
add ast to ProgramSpec + pre matcher [pr] ( #9128 )
...
* add ast to ProgramSpec + pre matcher [pr]
* cleaner cast + test fix
2025-02-17 16:39:14 +08:00
qazal
fe260ac4d7
viz/server cleanups [pr] ( #9127 )
...
* viz/server cleanups [pr]
* space
2025-02-17 09:59:41 +02:00
George Hotz
a38b47e026
hotfix: DSP doesn't use that path
2025-02-17 10:45:29 +08:00
quortus
edf7213f34
Make bitcast to the same dtype noop ( #9121 )
2025-02-16 20:28:44 -05:00
Ahmed Harmouche
59fe45f947
Solve get_grouped_dims does not split issue ( #9085 )
...
* Solve dims too large errors on webgpu
* Simplify divisor find
* Test square root divisor
* Fix lint
* Refactor into group_dims and split_dims
* Refactor
* Fix lint
* Add back max check in _group_dims
* Prefer grouping over split
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
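The "square root divisor" idea mentioned above can be sketched as follows: when a launch dimension exceeds the device limit, look for the divisor closest to its square root so that both factors are as small as possible. This is an assumption about the general approach, written as a hypothetical `split_dim` helper; it is not the `_group_dims` / split logic from the PR.

```python
import math
from typing import Optional, Tuple

def split_dim(dim: int, max_size: int) -> Optional[Tuple[int, int]]:
    # Nothing to do if the dimension already fits.
    if dim <= max_size:
        return (dim, 1)
    # Walk down from floor(sqrt(dim)); the first divisor found gives the
    # most balanced factor pair, so if it does not fit, no pair does.
    for d in range(math.isqrt(dim), 0, -1):
        if dim % d == 0:
            other = dim // d
            return (other, d) if other <= max_size else None
    return None
```

With a limit of 65535, `split_dim(131072, 65535)` gives `(512, 256)`, while a prime such as 65537 cannot be split and returns `None`.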
Ahmed Harmouche
84dc331dd1
Refactor async ( #9126 )
2025-02-16 17:47:15 -05:00
qazal
6a9e5598f9
small viz touchups [pr] ( #9123 )
2025-02-16 20:07:40 +01:00
qazal
b3127f38e6
faster viz data fetching with streaming [pr] ( #9122 )
...
* refactor to generator
* yield
* switch to SSE
* start client side + end events
* start javascript work
* need to redo this whole part
* more correct
* diff
* works
* diff cleanup
* more diff cleanup
2025-02-16 19:31:11 +01:00
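The "refactor to generator" and "switch to SSE" steps can be pictured with a small sketch: a generator yields one server-sent event per item, so the client can render results as they arrive instead of waiting for a single large JSON response. The function name and the `end` event name are assumptions for illustration; the viz server's actual code is not shown in this log.

```python
import json
from typing import Iterable, Iterator

def sse_stream(items: Iterable[dict]) -> Iterator[bytes]:
    # Each SSE message is one or more "field: value" lines ended by a blank line.
    for item in items:
        yield f"data: {json.dumps(item)}\n\n".encode()
    # A named event (hypothetical name "end") tells the client to stop listening.
    yield b"event: end\ndata: {}\n\n"
```

A handler would write each yielded chunk to a response with `Content-Type: text/event-stream`, flushing after every write.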
uuuvn
8926bac00a
am: profiling working ( #9119 )
...
ops_amd.py registers device finalization via atexit.register after
finalize_profile is registered in device.py, so the AM device is
closed before the profile is finalized, which leads to a hang.
(atexit.register is LIFO: https://docs.python.org/3.12/library/atexit.html#atexit.register)
This PR moves device finalization registration into device.py, before
profile finalization is registered.
2025-02-16 18:51:08 +03:00
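The ordering problem described in this commit can be reproduced in isolation. The sketch below runs a short script in a child interpreter and captures what its exit handlers print; the handler bodies are placeholders, not tinygrad code.

```python
import subprocess
import sys

SNIPPET = """
import atexit
atexit.register(lambda: print("finalize profile"))  # registered first, runs last
atexit.register(lambda: print("close device"))      # registered last, runs first
"""

def exit_order() -> list:
    # Run the snippet in a fresh interpreter so its atexit handlers actually fire.
    result = subprocess.run([sys.executable, "-c", SNIPPET],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Because handlers run in reverse registration order, the device is closed before the profile is finalized, which is the situation the commit fixes by registering device finalization first.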
qazal
97cb9cb1ed
always viz the first graph + non blocking matches fetch [pr] ( #9117 )
...
* always display the first graph in viz [pr]
* simpler
* progress indicator is the matches list style
* remove extra
* back
* res.json is still slow
2025-02-16 13:39:51 +01:00
chenyu
1fda98d14f
fix import time_linearizer [pr] ( #9118 )
...
the only test that used it was skipped in CI because it is slow
2025-02-15 21:33:28 -05:00
chenyu
c1dfe5c00d
compact get_late_rewrite_patterns [pr] ( #9116 )
2025-02-15 20:33:09 -05:00
qazal
2e97022e5e
remove extra block in viz [pr] ( #9115 )
2025-02-16 02:38:09 +02:00
chenyu
fd95543ff1
use scatter_reduce in scatter [pr] ( #9114 )
2025-02-15 18:21:01 -05:00
chenyu
c954419bc8
minor tweak to transcendental pow ( #9112 )
...
also added more pow with const test cases
2025-02-15 18:03:25 -05:00
chenyu
8dfa0024f0
raise in scatter if self and src have different dtype [pr] ( #9109 )
...
raise a RuntimeError that matches torch instead of implicitly casting
2025-02-15 11:21:34 -05:00
chenyu
d129ccda4c
add RAWAST back to DEBUG=3 [pr] ( #9107 )
2025-02-15 09:12:51 -05:00
qazal
2e19976d03
assert views in tensor uops [pr] ( #9106 )
2025-02-15 13:27:55 +02:00
George Hotz
81f5a7af7d
improve DEBUG=3 [pr] ( #9105 )
2025-02-15 18:44:56 +08:00
qazal
41d143d27c
new order to prepare for becomes_map = tensor_map [pr] ( #9104 )
2025-02-15 10:37:36 +01:00
George Hotz
4672d9af73
actual tests for the dsp backend [pr] ( #9102 )
...
* actual tests for the dsp backend [pr]
* fix name
2025-02-15 15:17:56 +08:00
George Hotz
7e09057afa
fixup clang devectorize ( #9099 )
...
* fixup clang devectorize
* __builtin_convertvector is some casts
* dsp fixups
2025-02-15 09:29:47 +08:00
Marcello Fuschi
8824f7e9df
Make logcumsumexp numerically stable ( #9050 )
...
* Make logcumsumexp numerically stable
* Refactor
* Refactor for special case ndim=0
* Refactor
* Use the correct device for mask
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-14 19:25:17 -05:00
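One common way to make `logcumsumexp` numerically stable is to subtract a running maximum before exponentiating, so no intermediate `exp` overflows. The scalar sketch below illustrates that general technique only; the log does not say which formulation the PR uses, and the tensor version would be vectorized.

```python
import math

def logcumsumexp(xs: list) -> list:
    out = []
    running = -math.inf  # log of an empty sum
    for x in xs:
        m = max(running, x)
        if m == -math.inf:
            # Both terms are log(0); avoid computing -inf - (-inf).
            running = -math.inf
        else:
            # log(exp(running) + exp(x)), with the max factored out.
            running = m + math.log(math.exp(running - m) + math.exp(x - m))
        out.append(running)
    return out
```

With inputs around 1000, a naive `log(cumsum(exp(x)))` overflows in `exp`, whereas this form only ever exponentiates non-positive numbers. Inputs containing `+inf` are not handled by this sketch.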
chenyu
81597ddd96
increase lr for bert ( #9098 )
...
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
b1tg
3ad39b247b
refactor LLVMRenderer ( #9090 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 08:00:31 +08:00
b1tg
1f1362fd27
add truncate_bf16 ( #9078 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
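For context, bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of a float32. A plain truncation can therefore be sketched by zeroing the low 16 bits of the float32 encoding. This is an assumption about what "truncate" means here; the helper name is hypothetical, and the PR's `truncate_bf16` may round or handle out-of-range values differently.

```python
import struct

def truncate_to_bf16(x: float) -> float:
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    # Note: struct.pack("<f", ...) raises OverflowError for values outside float32 range.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign, exponent, and the top 7 mantissa bits; drop the rest.
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]
```

Values that are already representable in bfloat16, such as 1.0, pass through unchanged.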
Ahmed Harmouche
2dc8f1867c
Synchronize webgpu ( #9093 )
2025-02-15 00:52:10 +03:00