Commit Graph

7955 Commits

Author SHA1 Message Date
qazal
ec80df5115 add PROGRAM renderer to viz [pr] (#9137) 2025-02-17 14:46:08 +01:00
qazal
7b09a72682 don't display void dtype in viz nodes [pr] (#9136)
* don't display void dtype in viz nodes [pr]

* extra
2025-02-17 13:49:36 +01:00
George Hotz
4dd10d03b7 move is_increasing to ops [pr] (#9134) 2025-02-17 19:27:48 +08:00
qazal
22c571d3cb add kernel axis colors to viz [pr] (#9129)
* add kernel axis colors to viz [pr]

* slightly blending with white makes this nicer

* space
2025-02-17 12:21:35 +01:00
George Hotz
1bf66d62cf symbolic gets its own file [pr] (#9132) 2025-02-17 18:55:21 +08:00
George Hotz
bd694faf6c factor out the expander logic [pr] (#9131) 2025-02-17 18:09:48 +08:00
quortus
5bdf0c7951 Bitcast constant folding 2.0 (#9089)
* Prevent const folding in test_payne_hanek_reduction

* Do not use list as a default parameter

* Bitcast constant folding

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 18:08:20 +08:00
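For context on what these bitcast-folding commits pin down: a bitcast reinterprets the raw bit pattern of a value as another dtype, so folding it at compile time must be bit-exact, including wraparound into negative integers. A minimal stdlib sketch of the float32→int32 case (the helper name is illustrative, not tinygrad's API):

```python
import struct

def bitcast_f32_to_i32(x: float) -> int:
    # Reinterpret the 32-bit pattern of a float as a signed int32;
    # const folding a bitcast must reproduce this value exactly.
    return struct.unpack('<i', struct.pack('<f', x))[0]

print(bitcast_f32_to_i32(1.0))   # 1065353216 (0x3f800000)
print(bitcast_f32_to_i32(-0.0))  # -2147483648 (sign bit wraps to INT32_MIN)
```

Note the second case: the float sign bit lands in the integer sign bit, which is exactly the wraparound behavior the tests above exercise.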
quortus
2be4529f14 Test broken const folding wraparound behavior (#9080)
* Test broken const folding wraparound behavior

* Add repro for test_payne_hanek_reduction const folding bug

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 17:44:56 +08:00
George Hotz
7eea9b639d hotfix: add replay_pkl debugging env 2025-02-17 17:34:58 +08:00
George Hotz
af9d8d39d2 dsp matchers + bump line count to 11300 (#9130) 2025-02-17 17:31:54 +08:00
quortus
638d925e4e Prevent const folding in test_payne_hanek_reduction (#9088)
* Prevent const folding in test_payne_hanek_reduction

* Do not use list as a default parameter
2025-02-17 17:31:10 +08:00
George Hotz
9289425170 add ast to ProgramSpec + pre matcher [pr] (#9128)
* add ast to ProgramSpec + pre matcher [pr]

* cleaner cast + test fix
2025-02-17 16:39:14 +08:00
qazal
fe260ac4d7 viz/server cleanups [pr] (#9127)
* viz/server cleanups [pr]

* space
2025-02-17 09:59:41 +02:00
George Hotz
a38b47e026 hotfix: DSP doesn't use that path 2025-02-17 10:45:29 +08:00
quortus
edf7213f34 Make bitcast to the same dtype noop (#9121) 2025-02-16 20:28:44 -05:00
Ahmed Harmouche
59fe45f947 Solve get_grouped_dims does not split issue (#9085)
* Solve dims too large errors on webgpu

* Simplify divisor find

* Test square root divisor

* Fix lint

* Refactor into group_dims and split_dims

* Refactor

* Fix lint

* Add back max check in _group_dims

* Prefer grouping over split

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
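A rough sketch of the idea behind this fix (the function name is illustrative, not the actual `group_dims`/`split_dims` API): when a launch dimension exceeds the backend's per-dimension limit, search for a divisor near its square root and split the dimension into two factors that both fit:

```python
def split_dim(d: int, limit: int):
    # Split a too-large dimension d into (a, b) with a*b == d and both
    # factors <= limit, preferring a divisor near sqrt(d).
    # Returns the dimension unchanged if it already fits, None if no
    # suitable divisor exists (e.g. d is a large prime).
    if d <= limit:
        return (d,)
    for f in range(int(d ** 0.5), 1, -1):
        if d % f == 0 and d // f <= limit:
            return (d // f, f)
    return None
```

For example, a dimension of 1024 under a limit of 256 splits into (32, 32). Primes above the limit have no valid split, which is why grouping with neighboring dims is preferred when possible.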
Ahmed Harmouche
84dc331dd1 Refactor async (#9126) 2025-02-16 17:47:15 -05:00
qazal
6a9e5598f9 small viz touchups [pr] (#9123) 2025-02-16 20:07:40 +01:00
qazal
b3127f38e6 faster viz data fetching with streaming [pr] (#9122)
* refactor to generator

* yield

* switch to SSE

* start client side + end events

* start javascript work

* need to redo this whole part

* more correct

* diff

* works

* diff cleanup

* more diff cleanup
2025-02-16 19:31:11 +01:00
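For reference, the Server-Sent Events framing these commits switch to is just newline-delimited `data:` chunks on a long-lived HTTP response, which is what lets the client render results as they stream instead of waiting for one big JSON payload. A minimal generator-side sketch (not tinygrad's actual server code):

```python
def sse_events(payloads):
    # Frame each payload as an SSE message; the blank line terminates
    # the event so the browser's EventSource can parse it incrementally.
    for p in payloads:
        yield f"data: {p}\n\n"

print("".join(sse_events(["step1", "step2"])))
```

Served with `Content-Type: text/event-stream`, each yielded chunk can be flushed to the socket as soon as it is produced.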
uuuvn
8926bac00a am: profiling working (#9119)
ops_amd.py registers device finalization via atexit.register after
finalize_profile is registered in device.py, so the AM device is
closed before the profile is finalized, which causes a hang.
(atexit.register is LIFO: https://docs.python.org/3.12/library/atexit.html#atexit.register)

This PR moves device-finalization registration to device.py, before
profile finalization is registered.
2025-02-16 18:51:08 +03:00
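The LIFO behavior this commit relies on is easy to demonstrate in isolation; in this sketch (handler names are illustrative, not tinygrad's), the handler registered last runs first at interpreter exit:

```python
import subprocess, sys, textwrap

# atexit is LIFO: registered first -> runs LAST; registered second -> runs FIRST.
child = textwrap.dedent("""
    import atexit
    atexit.register(lambda: print("close device"))      # device finalization
    atexit.register(lambda: print("finalize profile"))  # profile finalization
""")
out = subprocess.run([sys.executable, "-c", child],
                     capture_output=True, text=True).stdout
print(out)  # "finalize profile" then "close device"
```

So registering device finalization first (as the PR does) guarantees the profile is finalized before the device is closed.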
qazal
97cb9cb1ed always viz the first graph + non blocking matches fetch [pr] (#9117)
* always display the first graph in viz [pr]

* simpler

* progress indicator is the matches list style

* remove extra

* back

* res.json is still slow
2025-02-16 13:39:51 +01:00
chenyu
1fda98d14f fix import time_linearizer [pr] (#9118)
the only test that used it was skipped in CI due to being slow
2025-02-15 21:33:28 -05:00
chenyu
c1dfe5c00d compact get_late_rewrite_patterns [pr] (#9116) 2025-02-15 20:33:09 -05:00
qazal
2e97022e5e remove extra block in viz [pr] (#9115) 2025-02-16 02:38:09 +02:00
chenyu
fd95543ff1 use scatter_reduce in scatter [pr] (#9114) 2025-02-15 18:21:01 -05:00
chenyu
c954419bc8 minor tweak to transcendental pow (#9112)
also added more pow with const test cases
2025-02-15 18:03:25 -05:00
chenyu
8dfa0024f0 raise in scatter if self and src have different dtype [pr] (#9109)
raise a RuntimeError that matches torch instead of implicitly casting
2025-02-15 11:21:34 -05:00
chenyu
d129ccda4c add RAWAST back to DEBUG=3 [pr] (#9107) 2025-02-15 09:12:51 -05:00
qazal
2e19976d03 assert views in tensor uops [pr] (#9106) 2025-02-15 13:27:55 +02:00
George Hotz
81f5a7af7d improve DEBUG=3 [pr] (#9105) 2025-02-15 18:44:56 +08:00
qazal
41d143d27c new order to prepare for becomes_map = tensor_map [pr] (#9104) 2025-02-15 10:37:36 +01:00
George Hotz
4672d9af73 actual tests for the dsp backend [pr] (#9102)
* actual tests for the dsp backend [pr]

* fix name
2025-02-15 15:17:56 +08:00
George Hotz
7e09057afa fixup clang devectorize (#9099)
* fixup clang devectorize

* __builtin_convertvector is some casts

* dsp fixups
2025-02-15 09:29:47 +08:00
Marcello Fuschi
8824f7e9df Make logcumsumexp numerically stable (#9050)
* Make logcumsumexp numerically stable

* Refactor

* Refactor for special case ndim=0

* Refactor

* Use the correct device for mask

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-14 19:25:17 -05:00
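The standard stabilization trick behind a change like this: keep a running log-sum-exp and shift by the max before exponentiating, so `exp()` is only ever called on non-positive arguments and never overflows. A plain-Python sketch (not the tinygrad implementation):

```python
import math

def logcumsumexp(xs):
    # out[i] = log(sum_{j<=i} exp(xs[j])), computed stably:
    # shifting by the running max keeps every exp() argument <= 0.
    out, running = [], None
    for x in xs:
        if running is None:
            running = x
        else:
            m = max(running, x)
            running = m + math.log(math.exp(running - m) + math.exp(x - m))
        out.append(running)
    return out
```

For inputs like `[1000.0, 1000.0]`, the naive formula overflows in `exp(1000)`, while this version returns `1000 + log(2)` exactly as expected.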
chenyu
81597ddd96 increase lr for bert (#9098)
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
b1tg
3ad39b247b refactor LLVMRenderer (#9090)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 08:00:31 +08:00
b1tg
1f1362fd27 add truncate_bf16 (#9078)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
Ahmed Harmouche
2dc8f1867c Synchronize webgpu (#9093) 2025-02-15 00:52:10 +03:00
chenyu
b58e7b1898 zero out the weight in bert init run (#9076)
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer OOMs. I think the buffer of random init weights caused the OOM.
2025-02-14 08:40:41 -05:00
qazal
82ad0d2e65 keep CONST/BUFFER uops in tensor_map [pr] (#9083) 2025-02-14 14:50:08 +02:00
qazal
65297066c2 move buffer refcount increment to the toposort [pr] (#9081) 2025-02-14 12:54:22 +01:00
chenyu
73af42aeab fix pow backward when base is 0 (#9075) 2025-02-13 21:06:01 -05:00
qazal
2d04a75a40 start tracking bottom_up_rewrite in viz [pr] (#9071)
* start tracking bottom_up_rewrite in viz [pr]

* use the tracking matcher in test_viz
2025-02-14 00:28:10 +01:00
chenyu
5ef48bbe0a swap order in rsqrt (#9069)
fixed backward for 0
2025-02-13 16:51:21 -05:00
Ahmed Harmouche
e83905696e Show install instructions when dawn library is missing (#9059)
* Show install instructions when dawn library is missing

* Handle missing dawn in ops_webgpu

* Simplify

* Solve f-string backslash error
2025-02-14 00:30:20 +03:00
chenyu
9e91898941 bert eval at the end of training (#9070)
always eval at the last epoch
2025-02-13 16:29:44 -05:00
chenyu
e02e3b94c3 remove SQRT hack in llvm (#9067)
replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward
2025-02-13 15:42:34 -05:00
chenyu
947c97e6ff add test_sqrt to test_speed_v_torch (#9066)
working on getting rid of llvm sqrt hack
2025-02-13 15:25:54 -05:00
chenyu
49abc09f77 remove the reshapes in test_arange_2_reduce [pr] (#9063) 2025-02-13 12:33:25 -05:00
chenyu
2573d0621a Tensor.scatter_reduce touchup [pr] (#9060) 2025-02-13 10:01:14 -05:00