qazal
ec80df5115
add PROGRAM renderer to viz [pr] ( #9137 )
2025-02-17 14:46:08 +01:00
qazal
7b09a72682
don't display void dtype in viz nodes [pr] ( #9136 )
...
* don't display void dtype in viz nodes [pr]
* extra
2025-02-17 13:49:36 +01:00
George Hotz
4dd10d03b7
move is_increasing to ops [pr] ( #9134 )
2025-02-17 19:27:48 +08:00
qazal
22c571d3cb
add kernel axis colors to viz [pr] ( #9129 )
...
* add kernel axis colors to viz [pr]
* slightly blending with white makes this nicer
* space
2025-02-17 12:21:35 +01:00
George Hotz
1bf66d62cf
symbolic gets its own file [pr] ( #9132 )
2025-02-17 18:55:21 +08:00
George Hotz
bd694faf6c
factor out the expander logic [pr] ( #9131 )
2025-02-17 18:09:48 +08:00
quortus
5bdf0c7951
Bitcast constant folding 2.0 ( #9089 )
...
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
* Bitcast constant folding
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 18:08:20 +08:00
quortus
2be4529f14
Test broken const folding wraparound behavior ( #9080 )
...
* Test broken const folding wraparound behavior
* Add repro for test_payne_hanek_reduction const folding bug
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 17:44:56 +08:00
George Hotz
7eea9b639d
hotfix: add replay_pkl debugging env
2025-02-17 17:34:58 +08:00
George Hotz
af9d8d39d2
dsp matchers + bump line count to 11300 ( #9130 )
2025-02-17 17:31:54 +08:00
quortus
638d925e4e
Prevent const folding in test_payne_hanek_reduction ( #9088 )
...
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
2025-02-17 17:31:10 +08:00
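The "Do not use list as a default parameter" bullet above refers to a classic Python pitfall: a mutable default is created once at function definition and shared across calls. A minimal illustration (hypothetical functions, not the actual tinygrad code):

```python
def append_bad(x, acc=[]):
    # the default list is created ONCE and shared by every call
    acc.append(x)
    return acc

def append_good(x, acc=None):
    # the idiomatic fix: use None as the sentinel and allocate per call
    if acc is None:
        acc = []
    acc.append(x)
    return acc

print(append_bad(1), append_bad(2))    # [1, 2] [1, 2] -- shared state leaks across calls
print(append_good(1), append_good(2))  # [1] [2]
```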
George Hotz
9289425170
add ast to ProgramSpec + pre matcher [pr] ( #9128 )
...
* add ast to ProgramSpec + pre matcher [pr]
* cleaner cast + test fix
2025-02-17 16:39:14 +08:00
qazal
fe260ac4d7
viz/server cleanups [pr] ( #9127 )
...
* viz/server cleanups [pr]
* space
2025-02-17 09:59:41 +02:00
George Hotz
a38b47e026
hotfix: DSP doesn't use that path
2025-02-17 10:45:29 +08:00
quortus
edf7213f34
Make bitcast to the same dtype noop ( #9121 )
2025-02-16 20:28:44 -05:00
Ahmed Harmouche
59fe45f947
Solve get_grouped_dims does not split issue ( #9085 )
...
* Solve dims too large errors on webgpu
* Simplify divisor find
* Test square root divisor
* Fix lint
* Refactor into group_dims and split_dims
* Refactor
* Fix lint
* Add back max check in _group_dims
* Prefer grouping over split
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
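The bullets above describe splitting a launch dimension that exceeds the device limit into two in-range factors, searching for a divisor near the square root. A rough sketch of that idea (illustrative only, not tinygrad's actual get_grouped_dims/split_dims):

```python
import math

def split_dim(n, limit):
    # Split dimension n into factors (a, b) with a*b == n and both <= limit.
    # Searching downward from isqrt(n) finds the most "square" split first.
    if n <= limit:
        return (n,)
    for a in range(math.isqrt(n), 1, -1):
        if n % a == 0 and a <= limit and n // a <= limit:
            return (a, n // a)
    raise ValueError(f"cannot split {n} under limit {limit}")

print(split_dim(1_000_000, 65535))  # (1000, 1000)
```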
Ahmed Harmouche
84dc331dd1
Refactor async ( #9126 )
2025-02-16 17:47:15 -05:00
qazal
6a9e5598f9
small viz touchups [pr] ( #9123 )
2025-02-16 20:07:40 +01:00
qazal
b3127f38e6
faster viz data fetching with streaming [pr] ( #9122 )
...
* refactor to generator
* yield
* switch to SSE
* start client side + end events
* start javascript work
* need to redo this whole part
* more correct
* diff
* works
* diff cleanup
* more diff cleanup
2025-02-16 19:31:11 +01:00
uuuvn
8926bac00a
am: profiling working ( #9119 )
...
ops_amd.py registers device finalization via atexit.register after
finalize_profile is registered in device.py, so the AM device is
closed before the profile is finalized, which causes a hang.
(atexit.register is LIFO: https://docs.python.org/3.12/library/atexit.html#atexit.register)
This PR moves device finalization registration into device.py, before
profile finalization is registered.
2025-02-16 18:51:08 +03:00
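The LIFO ordering described above can be sketched in plain Python; the handler names mirror the commit message but are illustrative, not tinygrad's actual functions:

```python
calls = []
handlers = []

def register(fn):
    # stand-in for atexit.register; atexit fires handlers last-in-first-out
    handlers.append(fn)

register(lambda: calls.append("finalize_profile"))  # registered first, in device.py
register(lambda: calls.append("close_device"))      # registered later, in ops_amd.py

# at interpreter exit, handlers run in reverse registration order (LIFO):
for fn in reversed(handlers):
    fn()

print(calls)  # ['close_device', 'finalize_profile'] -- device closed before profile finalized
```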
qazal
97cb9cb1ed
always viz the first graph + non blocking matches fetch [pr] ( #9117 )
...
* always display the first graph in viz [pr]
* simpler
* progress indicator is the matches list style
* remove extra
* back
* res.json is still slow
2025-02-16 13:39:51 +01:00
chenyu
1fda98d14f
fix import time_linearizer [pr] ( #9118 )
...
the only test that used it was skipped in CI due to being slow
2025-02-15 21:33:28 -05:00
chenyu
c1dfe5c00d
compact get_late_rewrite_patterns [pr] ( #9116 )
2025-02-15 20:33:09 -05:00
qazal
2e97022e5e
remove extra block in viz [pr] ( #9115 )
2025-02-16 02:38:09 +02:00
chenyu
fd95543ff1
use scatter_reduce in scatter [pr] ( #9114 )
2025-02-15 18:21:01 -05:00
chenyu
c954419bc8
minor tweak to transcendental pow ( #9112 )
...
also added more pow with const test cases
2025-02-15 18:03:25 -05:00
chenyu
8dfa0024f0
raise in scatter if self and src have different dtype [pr] ( #9109 )
...
raise a RuntimeError that matches torch instead of implicitly casting
2025-02-15 11:21:34 -05:00
chenyu
d129ccda4c
add RAWAST back to DEBUG=3 [pr] ( #9107 )
2025-02-15 09:12:51 -05:00
qazal
2e19976d03
assert views in tensor uops [pr] ( #9106 )
2025-02-15 13:27:55 +02:00
George Hotz
81f5a7af7d
improve DEBUG=3 [pr] ( #9105 )
2025-02-15 18:44:56 +08:00
qazal
41d143d27c
new order to prepare for becomes_map = tensor_map [pr] ( #9104 )
2025-02-15 10:37:36 +01:00
George Hotz
4672d9af73
actual tests for the dsp backend [pr] ( #9102 )
...
* actual tests for the dsp backend [pr]
* fix name
2025-02-15 15:17:56 +08:00
George Hotz
7e09057afa
fixup clang devectorize ( #9099 )
...
* fixup clang devectorize
* __builtin_convertvector is some casts
* dsp fixups
2025-02-15 09:29:47 +08:00
Marcello Fuschi
8824f7e9df
Make logcumsumexp numerically stable ( #9050 )
...
* Make logcumsumexp numerically stable
* Refactor
* Refactor for special case ndim=0
* Refactor
* Use the correct device for mask
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-14 19:25:17 -05:00
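The usual way to make logcumsumexp numerically stable is to subtract a running maximum before exponentiating, so exp never overflows. A plain-Python sketch of that general technique (this is an assumption about the approach; tinygrad's tensor implementation differs in detail):

```python
import math

def logcumsumexp(xs):
    # out[i] = log(sum(exp(xs[:i+1]))), computed with a running max m so
    # that only exp(x - m) <= 1 is ever evaluated.
    out, total, m = [], 0.0, -math.inf
    for x in xs:
        if x > m:
            # rescale the accumulated sum to the new maximum
            total = total * math.exp(m - x) if m != -math.inf else 0.0
            m = x
        total += math.exp(x - m)
        out.append(m + math.log(total))
    return out

# naive log(cumsum(exp(x))) would overflow here; the stable form does not
print(logcumsumexp([1000.0, 1000.0]))  # [1000.0, 1000.6931471805599453]
```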
chenyu
81597ddd96
increase lr for bert ( #9098 )
...
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
b1tg
3ad39b247b
refactor LLVMRenderer ( #9090 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 08:00:31 +08:00
b1tg
1f1362fd27
add truncate_bf16 ( #9078 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
Ahmed Harmouche
2dc8f1867c
Synchronize webgpu ( #9093 )
2025-02-15 00:52:10 +03:00
chenyu
b58e7b1898
zero out the weight in bert init run ( #9076 )
...
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer OOMs. I think the buffer of randomly initialized weights caused the OOM.
2025-02-14 08:40:41 -05:00
qazal
82ad0d2e65
keep CONST/BUFFER uops in tensor_map [pr] ( #9083 )
2025-02-14 14:50:08 +02:00
qazal
65297066c2
move buffer refcount increment to the toposort [pr] ( #9081 )
2025-02-14 12:54:22 +01:00
chenyu
73af42aeab
fix pow backward when base is 0 ( #9075 )
2025-02-13 21:06:01 -05:00
qazal
2d04a75a40
start tracking bottom_up_rewrite in viz [pr] ( #9071 )
...
* start tracking bottom_up_rewrite in viz [pr]
* use the tracking matcher in test_viz
2025-02-14 00:28:10 +01:00
chenyu
5ef48bbe0a
swap order in rsqrt ( #9069 )
...
fixed backward for 0
2025-02-13 16:51:21 -05:00
Ahmed Harmouche
e83905696e
Show install instructions when dawn library is missing ( #9059 )
...
* Show install instructions when dawn library is missing
* Handle missing dawn in ops_webgpu
* Simplify
* Solve f-string backlash error
2025-02-14 00:30:20 +03:00
chenyu
9e91898941
bert eval at the end of training ( #9070 )
...
always eval at the last epoch
2025-02-13 16:29:44 -05:00
chenyu
e02e3b94c3
remove SQRT hack in llvm ( #9067 )
...
replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward
2025-02-13 15:42:34 -05:00
chenyu
947c97e6ff
add test_sqrt to test_speed_v_torch ( #9066 )
...
working on getting rid of llvm sqrt hack
2025-02-13 15:25:54 -05:00
chenyu
49abc09f77
remove the reshapes in test_arange_2_reduce [pr] ( #9063 )
2025-02-13 12:33:25 -05:00
chenyu
2573d0621a
Tensor.scatter_reduce touchup [pr] ( #9060 )
2025-02-13 10:01:14 -05:00