Commit Graph

8255 Commits

Author SHA1 Message Date
chenyu
5358b0904b update uop_given_valid if a node becomes const (#9604)
* update uop_given_valid if a node becomes const

* cleanup
2025-03-27 14:57:46 -04:00
chenyu
a187dfd3df bert BEAM_UOPS_MAX 3000->4000 (#9603)
more stable for the final step time

green 410ms (master) -> 397ms (BEAM=4) -> 392ms (this)
red 561ms (master) -> 550ms (this)
2025-03-27 11:58:47 -04:00
qazal
088a677e25 rescale to fit viz graph [pr] (#9599)
* zoom to fit the graph in viz [pr]

* always on screen fit graph

* space key recenters
2025-03-27 23:33:51 +08:00
nimlgen
3737821b9e prepare for clang graph (#9600)
* prepare for clang graph

* emu

* ops

* ops2

* better type

* fix
2025-03-27 20:09:37 +07:00
qazal
bf94924d5a fix viz with nested graph_rewrite (#9595) 2025-03-27 13:14:28 +08:00
qazal
c011751b41 statically define viz arrow heads (#9594) 2025-03-27 12:22:04 +08:00
qazal
0877497bad hotfix: use captured uops in viz render [pr] (#9593)
* hotfix: use captured uops in viz render [pr]

* better error
2025-03-27 11:52:12 +08:00
qazal
e5ff7b23d7 refactor to @track_matches + add failing test_nested_rewrite (#9592)
* test_nested_rewrite

* refactor to track_matches

* positional arg
2025-03-27 11:11:56 +08:00
chenyu
62888614f6 lower bert eval bs to 24 (#9590)
oom during eval
2025-03-26 21:25:23 -04:00
nimlgen
dc9da1d917 memplan into one buffer (#9526)
* new memplanner

* new should work

* fix

* VALIDATE_MEMORY_PLANNER

* hm?

* ugh

* fix alignment

* fix2

* rm

* tiny fixes

* test

* comments and fixes

* fix2

* liiiinetr

* t

* fix
2025-03-27 01:46:50 +07:00
qazal
8b717c345c cache viz worker at launch (#9589) 2025-03-27 01:10:02 +08:00
George Hotz
d62ced8981 symbolic -> symbolic_flat (#9588) 2025-03-26 23:34:43 +08:00
George Hotz
8aaa5e1ec5 generate the individual indexes (#9587) 2025-03-26 22:32:06 +08:00
George Hotz
5c6cd884e3 multiple simplifies is faster [pr] (#9586)
* multiple simplifies is faster [pr]

* cleanup

* cleanup
2025-03-26 21:42:52 +08:00
George Hotz
1e6e75e39a little changes from dsp branch (#9582)
* little changes from dsp branch

* not that one

* need the where

* Revert "need the where"

This reverts commit 140f89c878.
2025-03-26 20:01:21 +08:00
nimlgen
e88a640ca5 fix _access_resources for offset buffers (#9580)
* fix _access_resources for offset buffers

* test
2025-03-26 18:42:43 +07:00
Andrey
7b865ed03d use tuple in isinstance for type checking (#9583) 2025-03-26 19:36:48 +08:00
George Hotz
9115ce8860 linearizer fixups from DSP branch (#9581) 2025-03-26 18:28:15 +08:00
qazal
e799df537e prep viz UI cleanup for grid scales (#9579)
* less ways to make a button

* move collapse out

* work

* do not create extra resizers

* better

* ul

* safari
2025-03-26 17:48:15 +08:00
nimlgen
ccbcdca473 add memplanner tests (#9577) 2025-03-26 10:59:39 +07:00
qazal
c03dadfcb9 add TORCHVIZ=1 to beautiful_mnist_torch (#9576) 2025-03-26 11:17:08 +08:00
qazal
93bcb974c5 select torch device in examples/beautiful_mnist_torch.py (#9575) 2025-03-26 11:01:25 +08:00
uuuvn
2c32126fc8 am: AMRegister refactor (#9572) 2025-03-26 00:52:40 +07:00
chenyu
cddd750d68 add a failed test case for jit/nojit rand [pr] (#9574)
currently adding jit produced different rand values
2025-03-25 13:32:44 -04:00
nimlgen
4cf2b68ca8 am_smi: fix init for newer versions (#9559) 2025-03-25 23:48:05 +07:00
qazal
a6a5c0aec5 add NULL=1 backend (#9573)
* add NULL=1 backend

* NullAllocator

* line

* metadata should still work

* it shouldn't have memory usage

* Revert "it shouldn't have memory usage"

This reverts commit a9080fdd43.

* back

* null flops
2025-03-25 22:20:52 +08:00
qazal
b60d9976b4 better yaxis formatting in viz memory graph (#9570)
* better bytes format

* pluralize

* 1 less line
2025-03-25 16:50:22 +08:00
qazal
faf3b5b245 display kernel metadata in memory viz (#9569)
* display kernel metadata in memory viz

* fix that
2025-03-25 13:14:54 +08:00
qazal
52301fe68e move Buffer refcount increment out of schedule.py (#9564)
* move Buffer refcount increment out of schedule.py

* add TestGC.test_assign_refcount

* refcount refers to Ops.BUFFER UOps
2025-03-25 12:08:27 +08:00
qazal
262f5a2bd3 hotfix: replace link in viz/readme (#9568) 2025-03-25 10:24:49 +08:00
chenyu
6427272bf6 minor update to rand [pr] (#9566) 2025-03-24 18:49:50 -04:00
chenyu
b0e070e737 remove MOCKGPU workaround in rand (#9565)
also `requires_grad_` to save a line
2025-03-24 17:49:45 -04:00
qazal
d7c754ce49 failing test for UOp buffer ref count (#9563)
* failing test for UOp buffer ref count

* lint
2025-03-25 00:10:48 +08:00
b1tg
f90001e1a6 amd llvm render (no_comgr prereq) (#9543)
* amd llvm render

* skip test_div_rounding_mode

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
Priyank Patel
4f5e03bd60 better fix inplace detach (#9557) 2025-03-24 22:50:28 +08:00
qazal
1c40873962 show buffer info in memory viz (#9562) 2025-03-24 22:12:30 +08:00
qazal
efaee75656 start viz of memory usage (#9561)
* start viz of memory usage

* polygons/bars + use d3
2025-03-24 19:05:35 +08:00
qazal
1cfe6d02fe refactor uop_to_json to return a dict [pr] (#9560) 2025-03-24 16:38:17 +08:00
nimlgen
edf9e1bf8d am: move out soc21 to a sep module (#9551)
* am: soc module is not part of am

* am: soc module is not part of am
2025-03-24 14:17:42 +07:00
George Hotz
74d98eafb8 add onnx frontend stub [pr] (#9558) 2025-03-24 12:24:34 +08:00
George Hotz
de7d6cec3a hotfix: DEBUG 5 prints the ast 2025-03-24 11:43:11 +08:00
chenyu
ba41076e94 update embedding test to not use dtypes.long [pr] (#9556) 2025-03-23 21:33:38 -04:00
chenyu
c965f4c20b update bert config (#9555)
BEAM 4->5 for green, 2% faster
use AMD driver instead of AM for red, 5% faster
2025-03-23 16:14:41 -04:00
chenyu
d734e24c01 minor WEBGPU_PATH cleanup [pr] (#9552)
also mypy recognizes `sys.platform == 'win32'` but does not recognize it if wrapped inside a helper...
2025-03-23 09:10:02 -04:00
Ahmed Harmouche
7ce7fe0574 Refactor webgpu_dawn lib finding (#9547)
* Refactor webgpu_dawn lib finding

* Fix ruff
2025-03-23 08:23:29 -04:00
uuuvn
c631c72f22 HCQ: Increment timeline signal before submitting (#9550)
`AMDComputeQueue.__del__` frees `hw_page` which is safe because
`AMDAllocator._free` does `self.dev.synchronize()` which is supposed
to wait for execution of IB to finish, however that doesn't happen if
AMDComputeQueue is dropped right after submit before timeline signal is
incremented, which it is in most places leading to a race if .bind() is
also used (required for multi-xcc because bug in mec fw treats all
PACKET3_PRED_EXECs outside IBs as if they had EXEC_COUNT of zero).
2025-03-23 18:30:38 +07:00
nimlgen
d5667419af am: move out pte creation logic (#9548)
* am: move out pte creation logic

* emu

* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7 add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...

* slightly cleaner

* what happened to ruff

* need to think about this some more

* slightly faster now?

* clean up, 1 more failing edge case

* ok good

* working TINY_BACKEND

* nit doc wording

* retry CI
2025-03-22 12:11:33 -04:00
quortus
bdd44d4255 Fix DSP transcendentals (#9542) 2025-03-22 11:08:18 +08:00
Ignacio Sica
eddafb84e5 Bugfix for TC=3 (#9464)
* wrong but uses less shared

* for size 8 tc1 with devectorize in 0 loads into local before wmma and works

* improvements over tc1 devectorize

* fix tc=3

* works for handcoded tc opts

* clean bugfix tc=3

* fix

* revert changes
2025-03-21 16:43:42 -07:00