Commit Graph

10490 Commits

Author SHA1 Message Date
qazal
1d85323572 viz: absolute scaling of memory graph (#11181) 2025-07-11 16:39:11 +03:00
nimlgen
c7f6b617b4 nv: do not hardcode lv0 pd size (#11180) 2025-07-11 16:26:18 +03:00
nimlgen
27922c986a nv: generic mmu impl (#11179) 2025-07-11 16:26:09 +03:00
qazal
d3ec63a5c3 viz: add base class for unittests (#11178) 2025-07-11 13:58:03 +03:00
qazal
b791ea117d viz: enable scrolling in profiler (#11169)
* viz: add scrollbar to profiler

* using margin fixes the layout bug

* s/profiler.clientHeight/profiler.scrollHeight, it's important

* closer

* scrolling on the device list also works
2025-07-11 11:30:13 +03:00
chenyu
b219e47bef remove Kernel.upcasted_axis [pr] (#11175) 2025-07-10 23:19:21 -04:00
George Hotz
ccd382bc6f use axis_types more [pr] (#11172)
* use axis_types more

* fix local shape

* simpler clause

* fix local shape
2025-07-10 15:05:13 -07:00
nimlgen
fb278c6a02 do not recreate Compiled.profile_events in helper_collect_profile (#11171) 2025-07-10 23:55:12 +03:00
George Hotz
5c5eb92ed4 tc unroll after upcast [pr] (#11170) 2025-07-10 13:43:50 -07:00
George Hotz
05613c8cac use shape str for tensor cores upcast/reduce [pr] (#11168)
* use shape str for tensor cores upcast/reduce [pr]

* reduce axis count isn't fixed
2025-07-10 13:10:58 -07:00
nimlgen
cc6ed30f4f nv: relative lv addressing in NVPageTableEntry (#11164) 2025-07-10 22:35:50 +03:00
chenyu
439d033af9 update the README matmul example (#11167)
don't call rand and numpy to show that it's indeed one kernel
2025-07-10 14:47:29 -04:00
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
George Hotz
8ce3d5906b use shape_str for tensor cores (#11165) 2025-07-10 09:10:36 -07:00
nimlgen
581397110f nv: use classes in GSP_IP (#11163) 2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6 nv: parse sizes of ctx buffers (#11161) 2025-07-10 17:46:48 +03:00
qazal
dcc9704b6b viz: profile RewriteSteps in TINY device (#11125)
* viz: profile RewriteSteps in TINY device

* use TracingKey with category

* split by whitespace

* add tracing.py

* work

* tracing_key

* TRACK_MATCH_STATS=3, can this be in defaults?

* fallback name

* work

* javascript

* measure text is slow

* checkout

* profile graph_rewrite/graph_rewrite_map

* change that

* no as

* finally

* work

* linking works
2025-07-10 17:45:57 +03:00
Pyry Kovanen
32117402dd metal: fix incorrect _free on interpreter exit (#11158) 2025-07-10 14:01:30 +03:00
qazal
3d610f6d2b viz: small ui cleanup (#11157)
* viz: small ui cleanup

* 2
2025-07-10 11:43:36 +03:00
chenyu
7db07e5f2c don't narrow range of CAST on bool/unsigned (#11156) 2025-07-09 22:20:09 -04:00
George Hotz
e154a66f43 unroll axis 0 in tensor core (#11155)
* unroll is 0 in tc [pr]

* flip order of upcast/reduce in tensor core

* Revert "flip order of upcast/reduce in tensor core"

This reverts commit e564e38bcd.
2025-07-09 17:28:23 -07:00
George Hotz
b7742ad9e4 migrate to string swizzle [pr] (#11154) 2025-07-09 16:57:53 -07:00
George Hotz
4156baee93 break swizzle into three chunks [pr] (#11153)
* break swizzle into three chunks [pr]

* test failed
2025-07-09 15:30:34 -07:00
George Hotz
ca2dc95433 swizzle in tc can't be none [pr] (#11152) 2025-07-09 14:44:23 -07:00
George Hotz
53ae153404 tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d initial gfx950 kfd support (#11151)
* feat: initial gfx950 support

* fix: lint
2025-07-09 13:45:16 -07:00
George Hotz
262054be52 gfx950 tc support (#11150) 2025-07-09 13:30:42 -07:00
nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41 viz: add Graph stream (#11144)
* viz: stack an event for the entire batch

* multi

* whitespace

* work

* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
22305260e0 move tc to tc.py [pr] (#11147) 2025-07-09 10:55:56 -07:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
George Hotz
b11ca104e9 axis cleanups [pr] (#11142) 2025-07-08 17:07:26 -07:00
chenyu
7ce9e45474 mypy onnx_parser (#11141) 2025-07-08 19:50:28 -04:00
George Hotz
a1b8f3e64f delete info from kernel [pr] (#11139)
* delete info from kernel [pr]

* update kernel info

* delete info
2025-07-08 15:53:13 -07:00
George Hotz
359bed74f8 axis type tracking [pr] (#11137)
* axis type tracking [pr]

* keep update_info

* keep legacy colors

* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3 skip some new onnx tests (#11135)
these fail on master with the latest onnx
2025-07-08 16:12:48 -04:00
chenyu
ffcc557986 lint onnx and onnx_parser (#11134) 2025-07-08 15:28:35 -04:00
George Hotz
3238d21cd1 add finalized to kernel [pr] (#11132)
* add finalized to kernel [pr]

* add copy
2025-07-08 11:06:17 -07:00
George Hotz
289a411f5f hotfix: remove unused GBARRIER, CONTIGUOUS color is GBARRIER 2025-07-08 10:31:06 -07:00
nimlgen
43650169f4 nv: switch headers to 570.144 to match gsp (#11131) 2025-07-08 20:29:06 +03:00
quortus
790b05ab12 [pr] Unify CONTIGUOUS and GBARRIER (#11121)
* Unify CONTIGUOUS and GBARRIER

* Simplify rules
2025-07-08 10:27:23 -07:00
nimlgen
b516fe71b4 nv: return real struct in _alloc_boot_struct (#11130) 2025-07-08 20:04:43 +03:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
397826f0b4 add a test for 1B llm (#11124)
* add a test for 1B llm

* fix mbs

* add apps to release
2025-07-07 18:47:25 -07:00
George Hotz
f7d4638e05 start LLM app, tons of clean up required. target is 200 line ollama (#11068)
* start LLM app, tons of clean up required. target is 200 line ollama

* kind of works

* simpler

* add k/v cache

* with SYM=1, it loops

* no rope cache

* simpler

* more cleanups

* cleanups

* works

* argparse and comments

* from gguf

* generate is a function

* no copy from cpu

* fix max context pass in

* test

* improve test

* ai2_arc

* fix 8B, use less ram

* 136 lines
2025-07-07 17:09:46 -07:00
chenyu
341a686799 Tensor.diagonal (#11122)
only implemented main diagonal for 2-D tensors. with diagonal and qr, we can get determinant
2025-07-07 16:21:26 -04:00
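Note on the Tensor.diagonal commit above: the message says a determinant can be obtained from a diagonal plus a QR factorization. A minimal sketch of that idea follows, in plain Python rather than the tinygrad API. It assumes a Householder-based QR, where each reflection has determinant -1, so det(A) = (-1)^k * prod(diag(R)) for k reflections. The helper names `householder_r`, `main_diagonal`, and `det_via_qr` are hypothetical and are not taken from the repository.

```python
import math

def householder_r(a):
    """Reduce a square matrix (list of lists) to upper-triangular R.

    Uses Householder reflections and returns (R, k), where k is the
    number of reflections applied. Hypothetical helper for illustration.
    """
    n = len(a)
    r = [[float(x) for x in row] for row in a]
    k = 0
    for j in range(n - 1):
        # Column j from the diagonal downwards.
        x = [r[i][j] for i in range(j, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate, no reflection applied
        # Pick the sign that avoids cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        vv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to rows j..n-1.
        for c in range(j, n):
            dot = sum(v[i] * r[j + i][c] for i in range(len(v)))
            f = 2.0 * dot / vv
            for i in range(len(v)):
                r[j + i][c] -= f * v[i]
        k += 1
    return r, k

def main_diagonal(m):
    """Main diagonal of a 2-D square matrix."""
    return [m[i][i] for i in range(len(m))]

def det_via_qr(a):
    """Determinant as (-1)^k times the product of diag(R)."""
    r, k = householder_r(a)
    prod = 1.0
    for d in main_diagonal(r):
        prod *= d
    return -prod if k % 2 else prod

if __name__ == "__main__":
    print(det_via_qr([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))
```

If the QR routine does not expose the number of reflections, only the magnitude |det(A)| = |prod(diag(R))| is recoverable this way; the sign then has to come from det(Q) by other means.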
Sieds Lykles
584fd6af5a Fix division by zero and mask bug in add views (#11088)
* merge view infinite loop test

* adjust condition in `x//d -> x//(-d)*-1`

* Fix division by zero in add views

* adjust offset end

* fix typo in comment

* add target to test_merge_views_variable

* fix view incorrectly being masked

* ssimplify strides and offset of the new view to canonicalize

* remove print in test

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
nimlgen
71377cd233 nv: parse falcon app descs (#11118) 2025-07-07 18:14:14 +03:00
nimlgen
9a573a1d99 nv: finalize nvdev (#11117)
* nv: finalize nvdev

* typo
2025-07-07 16:31:59 +03:00
nimlgen
fa59c05282 nv: import flags from system (#11115)
* nv: import flags from system

* not used
2025-07-07 14:46:49 +03:00