Commit Graph

10633 Commits

Author | SHA1 | Message | Date
nimlgen
fb278c6a02 do not recreate Compiled.profile_events in helper_collect_profile (#11171) 2025-07-10 23:55:12 +03:00
George Hotz
5c5eb92ed4 tc unroll after upcast [pr] (#11170) 2025-07-10 13:43:50 -07:00
George Hotz
05613c8cac use shape str for tensor cores upcast/reduce [pr] (#11168)
* use shape str for tensor cores upcast/reduce [pr]

* reduce axis count isn't fixed
2025-07-10 13:10:58 -07:00
nimlgen
cc6ed30f4f nv: relative lv addressing in NVPageTableEntry (#11164) 2025-07-10 22:35:50 +03:00
chenyu
439d033af9 update the README matmul example (#11167)
don't call rand and numpy to show that it's indeed one kernel
2025-07-10 14:47:29 -04:00
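
A minimal sketch of the single-kernel matmul the README example is about, assuming the Tensor.empty form (the exact README snippet may differ):

```python
# Hedged sketch: Tensor.empty avoids a rand kernel and a numpy copy,
# so DEBUG=2 should show a single generated matmul kernel.
from tinygrad import Tensor

N = 1024
a, b = Tensor.empty(N, N), Tensor.empty(N, N)
c = (a @ b).realize()  # run with DEBUG=2 to inspect the one kernel
```
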
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
George Hotz
8ce3d5906b use shape_str for tensor cores (#11165) 2025-07-10 09:10:36 -07:00
nimlgen
581397110f nv: use classes in GSP_IP (#11163) 2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6 nv: parse sizes of ctx buffers (#11161) 2025-07-10 17:46:48 +03:00
qazal
dcc9704b6b viz: profile RewriteSteps in TINY device (#11125)
* viz: profile RewriteSteps in TINY device

* use TracingKey with category

* split by whitespace

* add tracing.py

* work

* tracing_key

* TRACK_MATCH_STATS=3, can this be in defaults?

* fallback name

* work

* javascript

* measure text is slow

* checkout

* profile graph_rewrite/graph_rewrite_map

* change that

* no as

* finally

* work

* linking works
2025-07-10 17:45:57 +03:00
Pyry Kovanen
32117402dd metal: fix incorrect _free on interpreter exit (#11158) 2025-07-10 14:01:30 +03:00
qazal
3d610f6d2b viz: small ui cleanup (#11157)
* viz: small ui cleanup

* 2
2025-07-10 11:43:36 +03:00
chenyu
7db07e5f2c don't narrow range of CAST on bool/unsigned (#11156) 2025-07-09 22:20:09 -04:00
George Hotz
e154a66f43 unroll axis 0 in tensor core (#11155)
* unroll is 0 in tc [pr]

* flip order of upcast/reduce in tensor core

* Revert "flip order of upcast/reduce in tensor core"

This reverts commit e564e38bcd.
2025-07-09 17:28:23 -07:00
George Hotz
b7742ad9e4 migrate to string swizzle [pr] (#11154) 2025-07-09 16:57:53 -07:00
George Hotz
4156baee93 break swizzle into three chunks [pr] (#11153)
* break swizzle into three chunks [pr]

* test failed
2025-07-09 15:30:34 -07:00
George Hotz
ca2dc95433 swizzle in tc can't be none [pr] (#11152) 2025-07-09 14:44:23 -07:00
George Hotz
53ae153404 tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d initial gfx950 kfd support (#11151)
* feat: initial gfx950 support

* fix: lint
2025-07-09 13:45:16 -07:00
George Hotz
262054be52 gfx950 tc support (#11150) 2025-07-09 13:30:42 -07:00
nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41 viz: add Graph stream (#11144)
* viz: stack an event for the entire batch

* multi

* whitespace

* work

* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
22305260e0 move tc to tc.py [pr] (#11147) 2025-07-09 10:55:56 -07:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
George Hotz
b11ca104e9 axis cleanups [pr] (#11142) 2025-07-08 17:07:26 -07:00
chenyu
7ce9e45474 mypy onnx_parser (#11141) 2025-07-08 19:50:28 -04:00
George Hotz
a1b8f3e64f delete info from kernel [pr] (#11139)
* delete info from kernel [pr]

* update kernel info

* delete info
2025-07-08 15:53:13 -07:00
George Hotz
359bed74f8 axis type tracking [pr] (#11137)
* axis type tracking [pr]

* keep update_info

* keep legacy colors

* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3 skip some new onnx tests (#11135)
these fail on master with latest onnx
2025-07-08 16:12:48 -04:00
chenyu
ffcc557986 lint onnx and onnx_parser (#11134) 2025-07-08 15:28:35 -04:00
George Hotz
3238d21cd1 add finalized to kernel [pr] (#11132)
* add finalized to kernel [pr]

* add copy
2025-07-08 11:06:17 -07:00
George Hotz
289a411f5f hotfix: remove unused GBARRIER, CONTIGUOUS color is GBARRIER 2025-07-08 10:31:06 -07:00
nimlgen
43650169f4 nv: switch headers to 570.144 to match gsp (#11131) 2025-07-08 20:29:06 +03:00
quortus
790b05ab12 [pr] Unify CONTIGUOUS and GBARRIER (#11121)
* Unify CONTIGUOUS and GBARRIER

* Simplify rules
2025-07-08 10:27:23 -07:00
nimlgen
b516fe71b4 nv: return real struct in _alloc_boot_struct (#11130) 2025-07-08 20:04:43 +03:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
397826f0b4 add a test for 1B llm (#11124)
* add a test for 1B llm

* fix mbs

* add apps to release
2025-07-07 18:47:25 -07:00
George Hotz
f7d4638e05 start LLM app, tons of clean up required. target is 200 line ollama (#11068)
* start LLM app, tons of clean up required. target is 200 line ollama

* kind of works

* simpler

* add k/v cache

* with SYM=1, it loops

* no rope cache

* simpler

* more cleanups

* cleanups

* works

* argparse and comments

* from gguf

* generate is a function

* no copy from cpu

* fix max context pass in

* test

* improve test

* ai2_arc

* fix 8B, use less ram

* 136 lines
2025-07-07 17:09:46 -07:00
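
A hedged sketch of the "generate is a function" plus k/v-cache idea described in this PR; `model` here is a hypothetical callable (model(tokens, start_pos) -> logits for the last position), not the real app interface:

```python
# Sketch only: the prompt is fed once, then each step feeds a single new token,
# because the k/v cache keeps the keys/values for everything before start_pos.
from tinygrad import Tensor

def generate(model, prompt: list[int], max_new: int) -> list[int]:
  tokens, start_pos = list(prompt), 0
  for _ in range(max_new):
    logits = model(Tensor([tokens[start_pos:]]), start_pos)  # hypothetical model call
    start_pos = len(tokens)
    tokens.append(int(logits.argmax(axis=-1).item()))        # greedy sampling for brevity
  return tokens
```
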
chenyu
341a686799 Tensor.diagonal (#11122)
Only the main diagonal of 2-D tensors is implemented. With diagonal and qr, we can get the determinant.
2025-07-07 16:21:26 -04:00
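
A hedged sketch of the determinant idea from this message, assuming a qr() method as referenced by the nearby qr/svd commits; only the absolute value is shown, since det(Q) is +/-1 and recovering its sign takes extra bookkeeping:

```python
from tinygrad import Tensor

A = Tensor.rand(4, 4)
Q, R = A.qr()                        # assumed API, per the qr/svd work above
abs_det = R.diagonal().prod().abs()  # |det(A)| = |det(R)|, the product of R's diagonal
print(abs_det.item())
```
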
Sieds Lykles
584fd6af5a Fix division by zero and mask bug in add views (#11088)
* merge view infinite loop test

* adjust condition in `x//d -> x//(-d)*-1`

* Fix division by zero in add views

* adjust offset end

* fix typo in comment

* add target to test_merge_views_variable

* fix view incorrectly being masked

* ssimplify strides and offset of the new view to canonicalize

* remove print in test

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
nimlgen
71377cd233 nv: parse falcon app descs (#11118) 2025-07-07 18:14:14 +03:00
nimlgen
9a573a1d99 nv: finalize nvdev (#11117)
* nv: finalize nvdev

* typo
2025-07-07 16:31:59 +03:00
nimlgen
fa59c05282 nv: import flags from system (#11115)
* nv: import flags from system

* not used
2025-07-07 14:46:49 +03:00
Nino Risteski
a1a146a499 adding enable_gqa in SDPA (#11097)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
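
A hedged sketch of grouped-query attention via the new enable_gqa flag, assuming the torch-style convention where the query has more heads than key/value; exact tinygrad keyword names may differ:

```python
from tinygrad import Tensor

B, Hq, Hkv, T, D = 1, 8, 2, 16, 64
q = Tensor.rand(B, Hq, T, D)
k = Tensor.rand(B, Hkv, T, D)
v = Tensor.rand(B, Hkv, T, D)
# with enable_gqa the 2 kv heads are shared across the 8 query heads
out = q.scaled_dot_product_attention(k, v, is_causal=True, enable_gqa=True)
print(out.shape)  # (1, 8, 16, 64)
```
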
nimlgen
b73e89110e nv: align allocations for perf (#11114) 2025-07-06 22:32:11 +03:00
chenyu
7468959f4b Tensor.argsort (#11112) 2025-07-06 13:56:35 -04:00
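
A short usage sketch of Tensor.argsort, assuming the numpy/torch convention of ascending indices along the last axis:

```python
from tinygrad import Tensor

t = Tensor([3.0, 1.0, 2.0])
idx = t.argsort()
print(idx.tolist())     # [1, 2, 0]
print(t[idx].tolist())  # [1.0, 2.0, 3.0]
```
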
kevvz
b7af9cf849 clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
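
For the full_matrices flag mentioned above, a minimal sketch of the reduced-SVD shape convention in the torch API the backend pipes to:

```python
import torch

A = torch.randn(5, 3)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
# reduced factors: U is (5, 3) rather than (5, 5), S has min(m, n) = 3 values
print(U.shape, S.shape, Vh.shape)
```
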
qazal
a556f50668 viz: small ui fixes (#11110)
* share styling of ctx-list and metadata

* scrollbar-gutter: stable prevents layout shift when changing steps

* margin-left makes left side unaligned
2025-07-06 17:05:36 +03:00
chenyu
ba88ec3ad0 pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc Tensor.diag (#11108)
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
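
A hedged sketch of Tensor.diag for a 1-D input, assuming the numpy-style behavior (vector to square matrix on the main diagonal) implied by Tensor.eye reusing it:

```python
from tinygrad import Tensor

print(Tensor([1, 2, 3]).diag().tolist())  # [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
print(Tensor.ones(3).diag().tolist())     # same layout as Tensor.eye(3)
```
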