9453 Commits

Author SHA1 Message Date
George Hotz
770a558585 lil cleanups from uop branch [pr] (#11197) 2025-07-12 09:46:28 -07:00
George Hotz
5625e1904b axis types in KernelInfo (#11196)
* axis types in KernelInfo [pr]

* simpler lowerer

* fix tests
2025-07-12 09:36:20 -07:00
nimlgen
ea7f2f779c hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
qazal
6a9f059b21 viz: early convert to cpu time (#11192) 2025-07-12 17:19:41 +03:00
chenyu
12b04efd69 remove a TODO prod(k.full_shape[k.first_upcast:]) (#11191)
IMAGE=2 test/test_ops.py works now
2025-07-12 10:16:56 -04:00
nimlgen
6f5250d158 nv: fix typing in rpc_rm_control (#11189) 2025-07-12 16:09:42 +03:00
qazal
c0a5490c72 viz: minor profiler cleanup (#11190) 2025-07-12 14:18:24 +03:00
chenyu
fdcc25e392 some noop hand_coded_optimizations cleanup [pr] (#11188) 2025-07-12 00:09:23 -04:00
chenyu
1ad852a892 break up Kernel.reshape_and_permute [pr] (#11187) 2025-07-11 18:08:08 -04:00
uuuvn
d11b20129d DMARef infra (#10753)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-11 14:09:47 -07:00
chenyu
b072be0e2d hotfix whisper main script (#11184) 2025-07-11 12:34:00 -04:00
qazal
0b7e9b5db7 viz: bugfix for multiple rewrites with the same name (#11182) 2025-07-11 18:26:12 +03:00
nimlgen
f9e4c4e57a nv: nvpci blackwell support (#11127)
* nv: start 5090

* gsp init 5090

* mmu

* works

* after merge

* clenaer

* rwk

* x

* fx

* finish?

* fix

* unrelated

* fix

* commenbt
2025-07-11 17:02:09 +03:00
qazal
1d85323572 viz: absolute scaling of memory graph (#11181) 2025-07-11 16:39:11 +03:00
nimlgen
c7f6b617b4 nv: do not hardcode lv0 pd size (#11180) 2025-07-11 16:26:18 +03:00
nimlgen
27922c986a nv: generic mmu impl (#11179) 2025-07-11 16:26:09 +03:00
qazal
d3ec63a5c3 viz: add base class for unittests (#11178) 2025-07-11 13:58:03 +03:00
qazal
b791ea117d viz: enable scrolling in profiler (#11169)
* viz: add scrollbar to profiler

* using margin fixes the layout bug

* s/profiler.clientHeight/profiler.scrollHeight, it's important

* closer

* scrolling on the device list also works
2025-07-11 11:30:13 +03:00
chenyu
b219e47bef remove Kernel.upcasted_axis [pr] (#11175) 2025-07-10 23:19:21 -04:00
George Hotz
ccd382bc6f use axis_types more [pr] (#11172)
* use axis_types more

* fix local shape

* simpler clause

* fix local shape
2025-07-10 15:05:13 -07:00
nimlgen
fb278c6a02 do not recreate Compiled.profile_events in helper_collect_profile (#11171) 2025-07-10 23:55:12 +03:00
George Hotz
5c5eb92ed4 tc unroll after upcast [pr] (#11170) 2025-07-10 13:43:50 -07:00
George Hotz
05613c8cac use shape str for tensor cores upcast/reduce [pr] (#11168)
* use shape str for tensor cores upcast/reduce [pr]

* reduce axis count isn't fixed
2025-07-10 13:10:58 -07:00
nimlgen
cc6ed30f4f nv: relative lv addressing in NVPageTableEntry (#11164) 2025-07-10 22:35:50 +03:00
chenyu
439d033af9 update the README matmul example (#11167)
don't call rand and numpy to show that it's indeed one kernel
2025-07-10 14:47:29 -04:00
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
George Hotz
8ce3d5906b use shape_str for tensor cores (#11165) 2025-07-10 09:10:36 -07:00
nimlgen
581397110f nv: use classes in GSP_IP (#11163) 2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6 nv: parse sizes of ctx buffers (#11161) 2025-07-10 17:46:48 +03:00
qazal
dcc9704b6b viz: profile RewriteSteps in TINY device (#11125)
* viz: profile RewriteSteps in TINY device

* use TracingKey with category

* split by whitespace

* add tracing.py

* work

* tracing_key

* TRACK_MATCH_STATS=3, can this be in defaults?

* fallback name

* work

* javascript

* measure text is slow

* checkout

* profile graph_rewrite/graph_rewrite_map

* change that

* no as

* finally

* work

* linking works
2025-07-10 17:45:57 +03:00
Pyry Kovanen
32117402dd metal: fix incorrect _free on interpreter exit (#11158) 2025-07-10 14:01:30 +03:00
qazal
3d610f6d2b viz: small ui cleanup (#11157)
* viz: small ui cleanup

* 2
2025-07-10 11:43:36 +03:00
chenyu
7db07e5f2c don't narrow range of CAST on bool/unsigned (#11156) 2025-07-09 22:20:09 -04:00
George Hotz
e154a66f43 unroll axis 0 in tensor core (#11155)
* unroll is 0 in tc [pr]

* flip order of upcast/reduce in tensor core

* Revert "flip order of upcast/reduce in tensor core"

This reverts commit e564e38bcd.
2025-07-09 17:28:23 -07:00
George Hotz
b7742ad9e4 migrate to string swizzle [pr] (#11154) 2025-07-09 16:57:53 -07:00
George Hotz
4156baee93 break swizzle into three chunks [pr] (#11153)
* break swizzle into three chunks [pr]

* test failed
2025-07-09 15:30:34 -07:00
George Hotz
ca2dc95433 swizzle in tc can't be none [pr] (#11152) 2025-07-09 14:44:23 -07:00
George Hotz
53ae153404 tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d initial gfx950 kfd support (#11151)
* feat: initial gfx950 support

* fix: lint
2025-07-09 13:45:16 -07:00
George Hotz
262054be52 gfx950 tc support (#11150) 2025-07-09 13:30:42 -07:00
nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41 viz: add Graph stream (#11144)
* viz: stack an event for the entire batch

* multi

* whitespace

* work

* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
22305260e0 move tc to tc.py [pr] (#11147) 2025-07-09 10:55:56 -07:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
George Hotz
b11ca104e9 axis cleanups [pr] (#11142) 2025-07-08 17:07:26 -07:00
chenyu
7ce9e45474 mypy onnx_parser (#11141) 2025-07-08 19:50:28 -04:00
George Hotz
a1b8f3e64f delete info from kernel [pr] (#11139)
* delete info from kernel [pr]

* update kernel info

* delete info
2025-07-08 15:53:13 -07:00
George Hotz
359bed74f8 axis type tracking [pr] (#11137)
* axis type tracking [pr]

* keep update_info

* keep legacy colors

* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3 skip some new onnx tests (#11135)
these fails on master with latest onnx
2025-07-08 16:12:48 -04:00
chenyu
ffcc557986 lint onnx and onnx_parser (#11134) 2025-07-08 15:28:35 -04:00