Commit Graph

7917 Commits

Author SHA1 Message Date
George Hotz
df3b320f46 rewriter -> devectorizer [pr] (#9147) 2025-02-18 12:42:08 +08:00
chenyu
5dc1257ce0 clean up bert fake data iterator [pr] (#9145)
reuse the same get_data_bert path in setup and real run
2025-02-17 20:03:38 -05:00
qazal
751c517b6c cancel viz request after the kernel clicked away [pr] (#9144) 2025-02-17 20:19:09 +01:00
chenyu
465421b525 fix Tensor.isclose (#9143)
many corner cases around inf and nan
2025-02-17 12:03:12 -05:00
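For reference, torch-style isclose treats two values as close when |a - b| <= atol + rtol * |b|, with identical infinities counting as close and NaN only close to NaN when equal_nan is set. The standalone helper below is a hedged sketch of those semantics, not tinygrad's implementation (see also #8844 further down, which added Tensor.isclose with equal_nan):

```python
import math

def isclose(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8, equal_nan: bool = False) -> bool:
    # NaN is never close to anything unless both sides are NaN and equal_nan=True.
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    # Infinities are only close to an identical infinity; the tolerance formula would give nan here.
    if math.isinf(a) or math.isinf(b):
        return a == b
    return abs(a - b) <= atol + rtol * abs(b)

print(isclose(float("inf"), float("inf")))                  # True
print(isclose(float("nan"), float("nan"), equal_nan=True))  # True
print(isclose(1.0, 1.0 + 1e-9))                             # True
```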
qazal
36741cbbc1 enable real_size assert for test_conv_2x2_backward_one_view [pr] (#9142) 2025-02-17 17:53:44 +01:00
qazal
e9ff4ef4f7 s/ScheduleContext/GrouperContext [pr] (#9141)
* refactor to kernel context [pr]

* s/ScheduleContext/GrouperContext [pr]
2025-02-17 17:14:17 +01:00
qazal
96cc9f59e0 refactor to kernel context [pr] (#9140) 2025-02-17 16:57:14 +01:00
qazal
df6781332e remove var_vals from the scheduler context [pr] (#9139)
* remove var_vals from the scheduler context [pr]

* maps to int
2025-02-17 16:43:50 +01:00
Ali Ladjevardi
35e9c4657b Use proper units when printing beam time (#9103)
* use proper units when printing beam time

* refactor DEBUG=2
2025-02-17 23:41:38 +08:00
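One hypothetical illustration of what "proper units" can mean for beam timings: pick the largest unit (s, ms, us) that keeps the printed number at least 1. The helper is an assumption for illustration, not the code from #9103:

```python
def fmt_time(seconds: float) -> str:
    # Choose the largest unit that keeps the value >= 1 for readability.
    if seconds >= 1.0:  return f"{seconds:7.2f} s"
    if seconds >= 1e-3: return f"{seconds*1e3:7.2f} ms"
    return f"{seconds*1e6:7.2f} us"

print(fmt_time(0.00042))  # e.g. " 420.00 us"
```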
Clément Verrier
a7f91224eb add Tensor.isclose() (#8844)
* add `Tensor.isclose()`

* support `equal_nan`

so as to match PyTorch's behavior

* update unit tests

* remove some tests temporarily

* re-enable one test

* re-enable other test

* try to fix failing tests during CI

* save one line of code

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 10:11:40 -05:00
qazal
2b787c3b17 hotfix: lower ul.disabled opacity for viz [pr] (#9138) 2025-02-17 15:16:48 +01:00
qazal
660c034da6 KERNEL op try 3 (#9061)
* work

* tolerate shape, maybe this is ASSIGN(RESHAPE(BUF), KERNEL)

* err, it's not ASSIGN(BUF, KERNEL), it's ASSIGN(VIEW(BUF), KERNEL)

* burn the boats

* assign slightly works

* assign works

* cleanup + var_vals can exist

* fine image + fix metadata

* metadata, without making everything 30% slower

* diff pruning

* faster assign schedule

* add_buffer_ops stage

* add kernel_spec back

* add viz display

* more strict kernel_spec
2025-02-17 14:47:54 +01:00
qazal
ec80df5115 add PROGRAM renderer to viz [pr] (#9137) 2025-02-17 14:46:08 +01:00
qazal
7b09a72682 don't display void dtype in viz nodes [pr] (#9136)
* don't display void dtype in viz nodes [pr]

* extra
2025-02-17 13:49:36 +01:00
George Hotz
4dd10d03b7 move is_increasing to ops [pr] (#9134) 2025-02-17 19:27:48 +08:00
qazal
22c571d3cb add kernel axis colors to viz [pr] (#9129)
* add kernel axis colors to viz [pr]

* slightly blending with white makes this nicer

* space
2025-02-17 12:21:35 +01:00
George Hotz
1bf66d62cf symbolic gets its own file [pr] (#9132) 2025-02-17 18:55:21 +08:00
George Hotz
bd694faf6c factor out the expander logic [pr] (#9131) 2025-02-17 18:09:48 +08:00
quortus
5bdf0c7951 Bitcast constant folding 2.0 (#9089)
* Prevent const folding in test_payne_hanek_reduction

* Do not use list as a default parameter

* Bitcast constant folding

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 18:08:20 +08:00
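As a minimal sketch of what folding a bitcast of a constant amounts to: reinterpret the bit pattern of a known value at rewrite time instead of emitting a runtime cast. The float32-to-uint32 helper below is illustrative only, not tinygrad's actual rewrite rule:

```python
import struct

def bitcast_const_f32_to_u32(x: float) -> int:
    # Reinterpret the 4-byte little-endian float32 pattern as a uint32 at "compile time".
    return struct.unpack("<I", struct.pack("<f", x))[0]

print(hex(bitcast_const_f32_to_u32(1.0)))   # 0x3f800000
print(hex(bitcast_const_f32_to_u32(-0.0)))  # 0x80000000
```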
quortus
2be4529f14 Test broken const folding wraparound behavior (#9080)
* Test broken const folding wraparound behavior

* Add repro for test_payne_hanek_reduction const folding bug

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 17:44:56 +08:00
George Hotz
7eea9b639d hotfix: add replay_pkl debugging env 2025-02-17 17:34:58 +08:00
George Hotz
af9d8d39d2 dsp matchers + bump line count to 11300 (#9130) 2025-02-17 17:31:54 +08:00
quortus
638d925e4e Prevent const folding in test_payne_hanek_reduction (#9088)
* Prevent const folding in test_payne_hanek_reduction

* Do not use list as a default parameter
2025-02-17 17:31:10 +08:00
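The "do not use list as a default parameter" bullet refers to the standard Python pitfall: default argument values are evaluated once at function definition, so a mutable default is shared across calls. A small self-contained illustration (hypothetical names):

```python
def collect_bad(item, acc=[]):      # the default list is created once and shared across calls
    acc.append(item)
    return acc

def collect_good(item, acc=None):   # use None as the sentinel and build a fresh list per call
    if acc is None: acc = []
    acc.append(item)
    return acc

print(collect_bad(1), collect_bad(2))    # [1, 2] [1, 2]  -- state leaks between calls
print(collect_good(1), collect_good(2))  # [1] [2]
```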
George Hotz
9289425170 add ast to ProgramSpec + pre matcher [pr] (#9128)
* add ast to ProgramSpec + pre matcher [pr]

* cleaner cast + test fix
2025-02-17 16:39:14 +08:00
qazal
fe260ac4d7 viz/server cleanups [pr] (#9127)
* viz/server cleanups [pr]

* space
2025-02-17 09:59:41 +02:00
George Hotz
a38b47e026 hotfix: DSP doesn't use that path 2025-02-17 10:45:29 +08:00
quortus
edf7213f34 Make bitcast to the same dtype noop (#9121) 2025-02-16 20:28:44 -05:00
Ahmed Harmouche
59fe45f947 Solve the issue of get_grouped_dims not splitting (#9085)

* Solve dims too large errors on webgpu

* Simplify divisor find

* Test square root divisor

* Fix lint

* Refactor into group_dims and split_dims

* Refactor

* Fix lint

* Add back max check in _group_dims

* Prefer grouping over split

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
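Backends such as WebGPU cap each dispatch dimension (commonly 65535), so a global dimension above the cap has to be split into two factors that both fit, e.g. by searching for a divisor near its square root as the commit's bullets suggest. The helper below is a hypothetical illustration, not the actual split_dims:

```python
import math

def split_dim(n: int, limit: int = 65535) -> tuple[int, int]:
    """Split n into (a, b) with a * b == n and both factors <= limit, preferring a near sqrt(n)."""
    if n <= limit: return (n, 1)
    for a in range(math.isqrt(n), 1, -1):  # search downward from the square root
        if n % a == 0 and a <= limit and n // a <= limit:
            return (a, n // a)
    raise ValueError(f"cannot split {n} into factors <= {limit}")

print(split_dim(1_048_576))  # (1024, 1024)
```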
Ahmed Harmouche
84dc331dd1 Refactor async (#9126) 2025-02-16 17:47:15 -05:00
qazal
6a9e5598f9 small viz touchups [pr] (#9123) 2025-02-16 20:07:40 +01:00
qazal
b3127f38e6 faster viz data fetching with streaming [pr] (#9122)
* refactor to generator

* yield

* switch to SSE

* start client side + end events

* start javascript work

* need to redo this whole part

* more correct

* diff

* works

* diff cleanup

* more diff cleanup
2025-02-16 19:31:11 +01:00
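For context, server-sent events (SSE) frame each message as a "data: ..." line followed by a blank line, which lets the server stream results incrementally instead of returning one large JSON response. The generator below is a generic hedged sketch of that framing (including a hypothetical end event), not the viz server's code:

```python
import json

def sse_events(matches):
    # SSE framing: each message is "data: <payload>" terminated by a blank line.
    # Yielding chunks lets the client render rewrite matches as they arrive.
    for m in matches:
        yield f"data: {json.dumps(m)}\n\n".encode()
    yield b"event: end\ndata: {}\n\n"  # hypothetical terminator so the client can close the stream

for chunk in sse_events([{"match": 0}, {"match": 1}]):
    print(chunk)
```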
uuuvn
8926bac00a am: profiling working (#9119)
ops_amd.py registers device finalization via atexit.register after
finalize_profile is registered in device.py, so the AM device gets
closed before the profile is finalized, leading to a hang.
(atexit.register is LIFO: https://docs.python.org/3.12/library/atexit.html#atexit.register)

This PR moves registering device finalization to device.py, before
profile finalization is registered.
2025-02-16 18:51:08 +03:00
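A minimal demonstration of the LIFO behaviour the commit relies on: the callback registered last runs first at interpreter exit, so registering device finalization before profile finalization lets the profile flush while the device is still open.

```python
import atexit

# atexit runs callbacks in LIFO order: the last one registered fires first.
atexit.register(lambda: print("close device"))      # registered first -> runs last
atexit.register(lambda: print("finalize profile"))  # registered last  -> runs first

# Output at interpreter exit:
#   finalize profile
#   close device
```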
qazal
97cb9cb1ed always viz the first graph + non blocking matches fetch [pr] (#9117)
* always display the first graph in viz [pr]

* simpler

* progress indicator is the matches list style

* remove extra

* back

* res.json is still slow
2025-02-16 13:39:51 +01:00
chenyu
1fda98d14f fix import time_linearizer [pr] (#9118)
the only test that used it was skipped in CI because it is slow
2025-02-15 21:33:28 -05:00
chenyu
c1dfe5c00d compact get_late_rewrite_patterns [pr] (#9116) 2025-02-15 20:33:09 -05:00
qazal
2e97022e5e remove extra block in viz [pr] (#9115) 2025-02-16 02:38:09 +02:00
chenyu
fd95543ff1 use scatter_reduce in scatter [pr] (#9114) 2025-02-15 18:21:01 -05:00
chenyu
c954419bc8 minor tweak to transcendental pow (#9112)
also added more pow with const test cases
2025-02-15 18:03:25 -05:00
chenyu
8dfa0024f0 raise in scatter if self and src have different dtype [pr] (#9109)
raise a RuntimeError that matches torch instead of implicitly casting
2025-02-15 11:21:34 -05:00
chenyu
d129ccda4c add RAWAST back to DEBUG=3 [pr] (#9107) 2025-02-15 09:12:51 -05:00
qazal
2e19976d03 assert views in tensor uops [pr] (#9106) 2025-02-15 13:27:55 +02:00
George Hotz
81f5a7af7d improve DEBUG=3 [pr] (#9105) 2025-02-15 18:44:56 +08:00
qazal
41d143d27c new order to prepare for becomes_map = tensor_map [pr] (#9104) 2025-02-15 10:37:36 +01:00
George Hotz
4672d9af73 actual tests for the dsp backend [pr] (#9102)
* actual tests for the dsp backend [pr]

* fix name
2025-02-15 15:17:56 +08:00
George Hotz
7e09057afa fixup clang devectorize (#9099)
* fixup clang devectorize

* __builtin_convertvector is some casts

* dsp fixups
2025-02-15 09:29:47 +08:00
Marcello Fuschi
8824f7e9df Make logcumsumexp numerically stable (#9050)
* Make logcumsumexp numerically stable

* Refactor

* Refactor for special case ndim=0

* Refactor

* Use the correct device for mask

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-14 19:25:17 -05:00
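The standard trick for a numerically stable logcumsumexp is to keep a running maximum and rescale the running sum of exponentials to it, so exp never sees a large argument. The pure-Python helper below is a hedged sketch of the technique, not the Tensor implementation from #9050:

```python
import math

def logcumsumexp(xs):
    """Numerically stable log-cumulative-sum-exp: out[i] = log(sum_{j<=i} exp(xs[j]))."""
    out, running_max, running_sum = [], float("-inf"), 0.0
    for x in xs:
        new_max = max(running_max, x)
        if new_max == float("-inf"):       # every term so far is exp(-inf) == 0
            out.append(float("-inf"))
            continue
        # rescale the existing sum to the new maximum, then add the new term
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
        out.append(running_max + math.log(running_sum))
    return out

print(logcumsumexp([1000.0, 1000.0]))  # [1000.0, ~1000.693]; naive exp(1000.0) would overflow
```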
chenyu
81597ddd96 increase lr for bert (#9098)
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
b1tg
3ad39b247b refactor LLVMRenderer (#9090)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 08:00:31 +08:00
b1tg
1f1362fd27 add truncate_bf16 (#9078)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
Ahmed Harmouche
2dc8f1867c Synchronize webgpu (#9093) 2025-02-15 00:52:10 +03:00