nimlgen
fb278c6a02
do not recreate Compiled.profile_events in helper_collect_profile ( #11171 )
2025-07-10 23:55:12 +03:00
George Hotz
5c5eb92ed4
tc unroll after upcast [pr] ( #11170 )
2025-07-10 13:43:50 -07:00
George Hotz
05613c8cac
use shape str for tensor cores upcast/reduce [pr] ( #11168 )
...
* use shape str for tensor cores upcast/reduce [pr]
* reduce axis count isn't fixed
2025-07-10 13:10:58 -07:00
nimlgen
cc6ed30f4f
nv: relative lv addressing in NVPageTableEntry ( #11164 )
2025-07-10 22:35:50 +03:00
chenyu
439d033af9
update the README matmul example ( #11167 )
...
don't call rand and numpy to show that it's indeed one kernel
2025-07-10 14:47:29 -04:00
qazal
bde80c0cdf
record GraphEvents in metal graph ( #11145 )
...
* record GraphEvents in metal graph
* add TestProfiler.test_graph, revert old stuff
* move profile capture to MetalGraph
* comment
* don't double record graph command buffers
* wait_check
* explicit delete
2025-07-10 21:32:06 +03:00
George Hotz
8ce3d5906b
use shape_str for tensor cores ( #11165 )
2025-07-10 09:10:36 -07:00
nimlgen
581397110f
nv: use classes in GSP_IP ( #11163 )
2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6
nv: parse sizes of ctx buffers ( #11161 )
2025-07-10 17:46:48 +03:00
qazal
dcc9704b6b
viz: profile RewriteSteps in TINY device ( #11125 )
...
* viz: profile RewriteSteps in TINY device
* use TracingKey with category
* split by whitespace
* add tracing.py
* work
* tracing_key
* TRACK_MATCH_STATS=3, can this be in defaults?
* fallback name
* work
* javascript
* measure text is slow
* checkout
* profile graph_rewrite/graph_rewrite_map
* change that
* no as
* finally
* work
* linking works
2025-07-10 17:45:57 +03:00
Pyry Kovanen
32117402dd
metal: fix incorrect _free on interpreter exit ( #11158 )
2025-07-10 14:01:30 +03:00
qazal
3d610f6d2b
viz: small ui cleanup ( #11157 )
...
* viz: small ui cleanup
* 2
2025-07-10 11:43:36 +03:00
chenyu
7db07e5f2c
don't narrow range of CAST on bool/unsigned ( #11156 )
2025-07-09 22:20:09 -04:00
George Hotz
e154a66f43
unroll axis 0 in tensor core ( #11155 )
...
* unroll is 0 in tc [pr]
* flip order of upcast/reduce in tensor core
* Revert "flip order of upcast/reduce in tensor core"
This reverts commit e564e38bcd.
2025-07-09 17:28:23 -07:00
George Hotz
b7742ad9e4
migrate to string swizzle [pr] ( #11154 )
2025-07-09 16:57:53 -07:00
George Hotz
4156baee93
break swizzle into three chunks [pr] ( #11153 )
...
* break swizzle into three chunks [pr]
* test failed
2025-07-09 15:30:34 -07:00
George Hotz
ca2dc95433
swizzle in tc can't be none [pr] ( #11152 )
2025-07-09 14:44:23 -07:00
George Hotz
53ae153404
tc should be in opt ( #11148 )
...
* tc should be in opt [pr]
* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d
initial gfx950 kfd support ( #11151 )
...
* feat: initial gfx950 support
* fix: lint
2025-07-09 13:45:16 -07:00
George Hotz
262054be52
gfx950 tc support ( #11150 )
2025-07-09 13:30:42 -07:00
nimlgen
b6981404ed
memory: use page shifts in memory manager ( #11149 )
...
* memory: use page shifts in memory manager
* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41
viz: add Graph stream ( #11144 )
...
* viz: stack an event for the entire batch
* multi
* whitespace
* work
* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
22305260e0
move tc to tc.py [pr] ( #11147 )
2025-07-09 10:55:56 -07:00
George Hotz
2893feb9f6
cleanups for kernel.py ( #11143 )
...
* cleanups for kernel.py
* fixups
2025-07-08 18:10:25 -07:00
George Hotz
b11ca104e9
axis cleanups [pr] ( #11142 )
2025-07-08 17:07:26 -07:00
chenyu
7ce9e45474
mypy onnx_parser ( #11141 )
2025-07-08 19:50:28 -04:00
George Hotz
a1b8f3e64f
delete info from kernel [pr] ( #11139 )
...
* delete info from kernel [pr]
* update kernel info
* delete info
2025-07-08 15:53:13 -07:00
George Hotz
359bed74f8
axis type tracking [pr] ( #11137 )
...
* axis type tracking [pr]
* keep update_info
* keep legacy colors
* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3
skip some new onnx tests ( #11135 )
...
these fail on master with the latest onnx
2025-07-08 16:12:48 -04:00
chenyu
ffcc557986
lint onnx and onnx_parser ( #11134 )
2025-07-08 15:28:35 -04:00
George Hotz
3238d21cd1
add finalized to kernel [pr] ( #11132 )
...
* add finalized to kernel [pr]
* add copy
2025-07-08 11:06:17 -07:00
George Hotz
289a411f5f
hotfix: remove unused GBARRIER, CONTIGUOUS color is GBARRIER
2025-07-08 10:31:06 -07:00
nimlgen
43650169f4
nv: switch headers to 570.144 to match gsp ( #11131 )
2025-07-08 20:29:06 +03:00
quortus
790b05ab12
[pr] Unify CONTIGUOUS and GBARRIER ( #11121 )
...
* Unify CONTIGUOUS and GBARRIER
* Simplify rules
2025-07-08 10:27:23 -07:00
nimlgen
b516fe71b4
nv: return real struct in _alloc_boot_struct ( #11130 )
2025-07-08 20:04:43 +03:00
qazal
3dfc0ff887
move cpu_profile and shared ProfileEvents from device.py to helpers [pr] ( #11126 )
...
* move cpu_profile and shared ProfileEvents to helpers [pr]
* TestProfiler.test_cpu_profile
* update test_viz.py
* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
397826f0b4
add a test for 1B llm ( #11124 )
...
* add a test for 1B llm
* fix mbs
* add apps to release
2025-07-07 18:47:25 -07:00
George Hotz
f7d4638e05
start LLM app, tons of clean up required. target is 200 line ollama ( #11068 )
...
* start LLM app, tons of clean up required. target is 200 line ollama
* kind of works
* simpler
* add k/v cache
* with SYM=1, it loops
* no rope cache
* simpler
* more cleanups
* cleanups
* works
* argparse and comments
* from gguf
* generate is a function
* no copy from cpu
* fix max context pass in
* test
* improve test
* ai2_arc
* fix 8B, use less ram
* 136 lines
2025-07-07 17:09:46 -07:00
chenyu
341a686799
Tensor.diagonal ( #11122 )
...
only the main diagonal is implemented, for 2-D tensors. with diagonal and qr, we can get the determinant
2025-07-07 16:21:26 -04:00
Sieds Lykles
584fd6af5a
Fix division by zero and mask bug in add views ( #11088 )
...
* merge view infinite loop test
* adjust condition in `x//d -> x//(-d)*-1`
* Fix division by zero in add views
* adjust offset end
* fix typo in comment
* add target to test_merge_views_variable
* fix view incorrectly being masked
* ssimplify strides and offset of the new view to canonicalize
* remove print in test
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
nimlgen
71377cd233
nv: parse falcon app descs ( #11118 )
2025-07-07 18:14:14 +03:00
nimlgen
9a573a1d99
nv: finalize nvdev ( #11117 )
...
* nv: finalize nvdev
* typo
2025-07-07 16:31:59 +03:00
nimlgen
fa59c05282
nv: import flags from system ( #11115 )
...
* nv: import flags from system
* not used
2025-07-07 14:46:49 +03:00
Nino Risteski
a1a146a499
adding enable_gqa in SDPA ( #11097 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
nimlgen
b73e89110e
nv: align allocations for perf ( #11114 )
2025-07-06 22:32:11 +03:00
chenyu
7468959f4b
Tensor.argsort ( #11112 )
2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend ( #11113 )
...
* clean tests, set full_matrices false
* add more shape asserts
2025-07-06 13:55:49 -04:00
qazal
a556f50668
viz: small ui fixes ( #11110 )
...
* share styling of ctx-list and metadata
* scrollbar-gutter: stable prevents layout shift when changing steps
* margin-left makes left side unaligned
2025-07-06 17:05:36 +03:00
chenyu
ba88ec3ad0
pipe linalg svd to torch ( #11109 )
...
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc
Tensor.diag ( #11108 )
...
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00