Commit Graph

10417 Commits

Author SHA1 Message Date
George Hotz
ca2dc95433 swizzle in tc can't be none [pr] (#11152) 2025-07-09 14:44:23 -07:00
George Hotz
53ae153404 tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d initial gfx950 kfd support (#11151)
* feat: initial gfx950 support

* fix: lint
2025-07-09 13:45:16 -07:00
George Hotz
262054be52 gfx950 tc support (#11150) 2025-07-09 13:30:42 -07:00
nimlgen
b6981404ed memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00
qazal
5c1d215b41 viz: add Graph stream (#11144)
* viz: stack an event for the entire batch

* multi

* whitespace

* work

* multi graph, Graph gets its own row
2025-07-09 20:56:46 +03:00
George Hotz
22305260e0 move tc to tc.py [pr] (#11147) 2025-07-09 10:55:56 -07:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
George Hotz
b11ca104e9 axis cleanups [pr] (#11142) 2025-07-08 17:07:26 -07:00
chenyu
7ce9e45474 mypy onnx_parser (#11141) 2025-07-08 19:50:28 -04:00
George Hotz
a1b8f3e64f delete info from kernel [pr] (#11139)
* delete info from kernel [pr]

* update kernel info

* delete info
2025-07-08 15:53:13 -07:00
George Hotz
359bed74f8 axis type tracking [pr] (#11137)
* axis type tracking [pr]

* keep update_info

* keep legacy colors

* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3 skip some new onnx tests (#11135)
these fail on master with the latest onnx
2025-07-08 16:12:48 -04:00
chenyu
ffcc557986 lint onnx and onnx_parser (#11134) 2025-07-08 15:28:35 -04:00
George Hotz
3238d21cd1 add finalized to kernel [pr] (#11132)
* add finalized to kernel [pr]

* add copy
2025-07-08 11:06:17 -07:00
George Hotz
289a411f5f hotfix: remove unused GBARRIER, CONTIGUOUS color is GBARRIER 2025-07-08 10:31:06 -07:00
nimlgen
43650169f4 nv: switch headers to 570.144 to match gsp (#11131) 2025-07-08 20:29:06 +03:00
quortus
790b05ab12 [pr] Unify CONTIGUOUS and GBARRIER (#11121)
* Unify CONTIGUOUS and GBARRIER

* Simplify rules
2025-07-08 10:27:23 -07:00
nimlgen
b516fe71b4 nv: return real struct in _alloc_boot_struct (#11130) 2025-07-08 20:04:43 +03:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
397826f0b4 add a test for 1B llm (#11124)
* add a test for 1B llm

* fix mbs

* add apps to release
2025-07-07 18:47:25 -07:00
George Hotz
f7d4638e05 start LLM app, tons of clean up required. target is 200 line ollama (#11068)
* start LLM app, tons of clean up required. target is 200 line ollama

* kind of works

* simpler

* add k/v cache

* with SYM=1, it loops

* no rope cache

* simpler

* more cleanups

* cleanups

* works

* argparse and comments

* from gguf

* generate is a function

* no copy from cpu

* fix max context pass in

* test

* improve test

* ai2_arc

* fix 8B, use less ram

* 136 lines
2025-07-07 17:09:46 -07:00
chenyu
341a686799 Tensor.diagonal (#11122)
only implemented main diagonal for 2-D tensors. with diagonal and qr, we can get determinant
2025-07-07 16:21:26 -04:00
Sieds Lykles
584fd6af5a Fix division by zero and mask bug in add views (#11088)
* merge view infinite loop test

* adjust condition in `x//d -> x//(-d)*-1`

* Fix division by zero in add views

* adjust offset end

* fix typo in comment

* add target to test_merge_views_variable

* fix view incorrectly being masked

* ssimplify strides and offset of the new view to canonicalize

* remove print in test

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
nimlgen
71377cd233 nv: parse falcon app descs (#11118) 2025-07-07 18:14:14 +03:00
nimlgen
9a573a1d99 nv: finalize nvdev (#11117)
* nv: finalize nvdev

* typo
2025-07-07 16:31:59 +03:00
nimlgen
fa59c05282 nv: import flags from system (#11115)
* nv: import flags from system

* not used
2025-07-07 14:46:49 +03:00
Nino Risteski
a1a146a499 adding enable_gqa in SDPA (#11097)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
nimlgen
b73e89110e nv: align allocations for perf (#11114) 2025-07-06 22:32:11 +03:00
chenyu
7468959f4b Tensor.argsort (#11112) 2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849 clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
qazal
a556f50668 viz: small ui fixes (#11110)
* share styling of ctx-list and metadata

* scrollbar-gutter: stable prevents layout shift when changing steps

* margin-left makes left side unaligned
2025-07-06 17:05:36 +03:00
chenyu
ba88ec3ad0 pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc Tensor.diag (#11108)
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
ttomsa
4905af4ae0 remove invalid int div test (#11106)
* rm test

* also rm this
2025-07-05 18:57:55 -04:00
qazal
a4aa769c0a fix: type checking for track_rewrites key [pr] (#11104)
* fix: type checking for track_rewrites key [pr]

* also for cpu_profile

* func.__name__ to start
2025-07-05 20:11:21 +03:00
qazal
81781dc12b viz: renames and spacing changes to tracing (#11102) 2025-07-05 18:40:39 +03:00
qazal
7619bf35e7 cleanup: remove disabled TestIndexingOrdering (#11101)
* cleanup: remove disabled TestIndexingOrdering

* don't import kernelize internals
2025-07-05 18:14:37 +03:00
qazal
4fcfaa0ef7 viz: switch to TracingKey (#11100)
* viz: switch to TracingKey

* tuple

* order is name, keys, fmt

* add test_tracing_key
2025-07-05 17:46:18 +03:00
qazal
458be950d9 viz: add TINY device (#11095)
* viz: add TINY device

* replace Any with a proper type

* reorder

* diff

* rename

* space

* from diff

* multiple keys
2025-07-05 16:54:55 +03:00
nimlgen
4dccb2ea49 am_smi: increase kill retries (#11099) 2025-07-05 16:23:50 +03:00
chenyu
39b4d72687 remove flatten and reshape in sparse_categorical_crossentropy [pr] (#11093)
not needed, directly operating on the classes dim is fine
2025-07-04 15:15:27 -04:00
nimlgen
577afc9f05 hcq: remove redunt syncs and fix typing (#11096)
Before this patch the code could issue redundant syncs because of
the typing issue. Current tests should cover all correctness checks.
2025-07-04 21:49:47 +03:00
qazal
41aa54eb5a viz: resolve all graph references in python (#11087)
* viz: resolve all graph references in python

* it just maps things to the index

* always map the name

* key on the uop

* diff

* close
2025-07-04 20:35:25 +03:00
qazal
3d8569f6d8 hotfix: infinite loop in tracking pattern matcher (#11094)
* failing test

* fix that

* given matchers
2025-07-04 19:55:26 +03:00
qazal
a783211fc7 viz: allow end_time=None in trace events (#11092) 2025-07-04 17:48:17 +03:00
0xSG
17119b0f23 hip_ioctl: platform.machine added (#11084) 2025-07-04 17:20:24 +03:00
nimlgen
6656aa162c nv: enable huge pages (#11091) 2025-07-04 17:17:24 +03:00
nimlgen
01f3c4f44d memory: simpler paddr allocation logic (#11090)
* memory: new paddr allocation logic

* am fix

* am refactors

* fix

* mypy

* use it

* am
2025-07-04 17:00:36 +03:00
qazal
f6d55d9272 viz: pickle UPat location (#11086) 2025-07-04 13:09:00 +03:00