George Hotz
a1b8f3e64f
delete info from kernel [pr] ( #11139 )
...
* delete info from kernel [pr]
* update kernel info
* delete info
2025-07-08 15:53:13 -07:00
George Hotz
359bed74f8
axis type tracking [pr] ( #11137 )
...
* axis type tracking [pr]
* keep update_info
* keep legacy colors
* update tests to apply_opt
2025-07-08 14:16:25 -07:00
chenyu
dada3f5bf3
skip some new onnx tests ( #11135 )
...
these fail on master with the latest onnx
2025-07-08 16:12:48 -04:00
chenyu
ffcc557986
lint onnx and onnx_parser ( #11134 )
2025-07-08 15:28:35 -04:00
George Hotz
3238d21cd1
add finalized to kernel [pr] ( #11132 )
...
* add finalized to kernel [pr]
* add copy
2025-07-08 11:06:17 -07:00
George Hotz
289a411f5f
hotfix: remove unused GBARRIER, CONTIGUOUS color is GBARRIER
2025-07-08 10:31:06 -07:00
nimlgen
43650169f4
nv: switch headers to 570.144 to match gsp ( #11131 )
2025-07-08 20:29:06 +03:00
quortus
790b05ab12
[pr] Unify CONTIGUOUS and GBARRIER ( #11121 )
...
* Unify CONTIGUOUS and GBARRIER
* Simplify rules
2025-07-08 10:27:23 -07:00
nimlgen
b516fe71b4
nv: return real struct in _alloc_boot_struct ( #11130 )
2025-07-08 20:04:43 +03:00
qazal
3dfc0ff887
move cpu_profile and shared ProfileEvents from device.py to helpers [pr] ( #11126 )
...
* move cpu_profile and shared ProfileEvents to helpers [pr]
* TestProfiler.test_cpu_profile
* update test_viz.py
* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
George Hotz
397826f0b4
add a test for 1B llm ( #11124 )
...
* add a test for 1B llm
* fix mbs
* add apps to release
2025-07-07 18:47:25 -07:00
George Hotz
f7d4638e05
start LLM app, tons of clean up required. target is 200 line ollama ( #11068 )
...
* start LLM app, tons of clean up required. target is 200 line ollama
* kind of works
* simpler
* add k/v cache
* with SYM=1, it loops
* no rope cache
* simpler
* more cleanups
* cleanups
* works
* argparse and comments
* from gguf
* generate is a function
* no copy from cpu
* fix max context pass in
* test
* improve test
* ai2_arc
* fix 8B, use less ram
* 136 lines
2025-07-07 17:09:46 -07:00
chenyu
341a686799
Tensor.diagonal ( #11122 )
...
only the main diagonal of 2-D tensors is implemented. with diagonal and qr, we can get the determinant
2025-07-07 16:21:26 -04:00
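The determinant remark in the Tensor.diagonal commit can be sketched in plain Python, without tinygrad: if A = QR with Q orthogonal, then |det A| is the product of the absolute values on R's main diagonal. The names `gram_schmidt_r_diag` and `abs_det` are hypothetical, and the QR here is a small Gram-Schmidt written only for illustration; it is not the library's implementation. Recovering the sign would additionally require det(Q), which this sketch does not compute.

```python
import math

def gram_schmidt_r_diag(a):
    """Main diagonal of R from a (modified) Gram-Schmidt QR of a square matrix.

    Hypothetical helper; assumes the columns of `a` are linearly independent.
    """
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q = []     # orthonormal vectors built so far
    diag = []  # R[k][k] is the norm of the k-th orthogonalized column
    for v in cols:
        w = list(v)
        for u in q:
            proj = sum(ui * wi for ui, wi in zip(u, w))
            w = [wi - proj * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        diag.append(norm)
        q.append([wi / norm for wi in w])
    return diag

def abs_det(a):
    """|det(a)| as the product of R's diagonal (all entries are positive here)."""
    return math.prod(gram_schmidt_r_diag(a))

print(abs_det([[2.0, 1.0], [1.0, 3.0]]))
```

Because Gram-Schmidt yields a positive diagonal, the product gives only the magnitude of the determinant.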
Sieds Lykles
584fd6af5a
Fix division by zero and mask bug in add views ( #11088 )
...
* merge view infinite loop test
* adjust condition in `x//d -> x//(-d)*-1`
* Fix division by zero in add views
* adjust offset end
* fix typo in comment
* add target to test_merge_views_variable
* fix view incorrectly being masked
* ssimplify strides and offset of the new view to canonicalize
* remove print in test
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-07-07 10:05:47 -07:00
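The rewrite `x//d -> x//(-d)*-1` named in the commit above only holds when integer division truncates toward zero. That semantics is an assumption in this sketch, and `trunc_div` and `identity_holds` are hypothetical helpers rather than tinygrad code. Under Python's floor division the identity fails, and d = 0 has to be excluded in either case.

```python
def trunc_div(x: int, d: int) -> int:
    """Integer division truncating toward zero (C-style). Hypothetical helper."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q

def floor_div(x: int, d: int) -> int:
    """Python's native floor division, for comparison."""
    return x // d

def identity_holds(div, lo: int = -8, hi: int = 8) -> bool:
    """Check div(x, d) == div(x, -d) * -1 for every x and nonzero d in [lo, hi]."""
    return all(div(x, d) == div(x, -d) * -1
               for x in range(lo, hi + 1)
               for d in range(lo, hi + 1) if d != 0)

print(identity_holds(trunc_div), identity_holds(floor_div))
```

The check is exhaustive over a small range only; it illustrates the identity and is not a proof.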
nimlgen
71377cd233
nv: parse falcon app descs ( #11118 )
2025-07-07 18:14:14 +03:00
nimlgen
9a573a1d99
nv: finalize nvdev ( #11117 )
...
* nv: finalize nvdev
* typo
2025-07-07 16:31:59 +03:00
nimlgen
fa59c05282
nv: import flags from system ( #11115 )
...
* nv: import flags from system
* not used
2025-07-07 14:46:49 +03:00
Nino Risteski
a1a146a499
adding enable_gqa in SDPA ( #11097 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
nimlgen
b73e89110e
nv: align allocations for perf ( #11114 )
2025-07-06 22:32:11 +03:00
chenyu
7468959f4b
Tensor.argsort ( #11112 )
2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend ( #11113 )
...
* clean tests, set full_matrices false
* add more shape asserts
2025-07-06 13:55:49 -04:00
qazal
a556f50668
viz: small ui fixes ( #11110 )
...
* share styling of ctx-list and metadata
* scrollbar-gutter: stable prevents layout shift when changing steps
* margin-left makes left side unaligned
2025-07-06 17:05:36 +03:00
chenyu
ba88ec3ad0
pipe linalg svd to torch ( #11109 )
...
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc
Tensor.diag ( #11108 )
...
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
ttomsa
4905af4ae0
remove invalid int div test ( #11106 )
...
* rm test
* also rm this
2025-07-05 18:57:55 -04:00
qazal
a4aa769c0a
fix: type checking for track_rewrites key [pr] ( #11104 )
...
* fix: type checking for track_rewrites key [pr]
* also for cpu_profile
* func.__name__ to start
2025-07-05 20:11:21 +03:00
qazal
81781dc12b
viz: renames and spacing changes to tracing ( #11102 )
2025-07-05 18:40:39 +03:00
qazal
7619bf35e7
cleanup: remove disabled TestIndexingOrdering ( #11101 )
...
* cleanup: remove disabled TestIndexingOrdering
* don't import kernelize internals
2025-07-05 18:14:37 +03:00
qazal
4fcfaa0ef7
viz: switch to TracingKey ( #11100 )
...
* viz: switch to TracingKey
* tuple
* order is name, keys, fmt
* add test_tracing_key
2025-07-05 17:46:18 +03:00
qazal
458be950d9
viz: add TINY device ( #11095 )
...
* viz: add TINY device
* replace Any with a proper type
* reorder
* diff
* rename
* space
* from diff
* multiple keys
2025-07-05 16:54:55 +03:00
nimlgen
4dccb2ea49
am_smi: increase kill retries ( #11099 )
2025-07-05 16:23:50 +03:00
chenyu
39b4d72687
remove flatten and reshape in sparse_categorical_crossentropy [pr] ( #11093 )
...
not needed, directly operating on the classes dim is fine
2025-07-04 15:15:27 -04:00
nimlgen
577afc9f05
hcq: remove redundant syncs and fix typing ( #11096 )
...
Before this patch, the code could issue redundant syncs because of
a typing issue. Current tests should cover all correctness checks.
2025-07-04 21:49:47 +03:00
qazal
41aa54eb5a
viz: resolve all graph references in python ( #11087 )
...
* viz: resolve all graph references in python
* it just maps things to the index
* always map the name
* key on the uop
* diff
* close
2025-07-04 20:35:25 +03:00
qazal
3d8569f6d8
hotfix: infinite loop in tracking pattern matcher ( #11094 )
...
* failing test
* fix that
* given matchers
2025-07-04 19:55:26 +03:00
qazal
a783211fc7
viz: allow end_time=None in trace events ( #11092 )
2025-07-04 17:48:17 +03:00
0xSG
17119b0f23
hip_ioctl: platform.machine added ( #11084 )
2025-07-04 17:20:24 +03:00
nimlgen
6656aa162c
nv: enable huge pages ( #11091 )
2025-07-04 17:17:24 +03:00
nimlgen
01f3c4f44d
memory: simpler paddr allocation logic ( #11090 )
...
* memory: new paddr allocation logic
* am fix
* am refactors
* fix
* mypy
* use it
* am
2025-07-04 17:00:36 +03:00
qazal
f6d55d9272
viz: pickle UPat location ( #11086 )
2025-07-04 13:09:00 +03:00
qazal
2403f126ed
move printable out of UPat [pr] ( #11085 )
...
* move printable out of UPat [pr]
* print_match_stats
2025-07-04 12:31:11 +03:00
qazal
988540f401
support capturing cpu_profile on error ( #11078 )
...
* support capturing cpu_profile on error
* spacing
* pylint complains
2025-07-04 11:53:12 +03:00
chenyu
a2f5a54458
move sparse_categorical_crossentropy to test_ops ( #11083 )
...
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
7c8ccb0267
sparse_categorical_crossentropy cleanup [pr] ( #11082 )
2025-07-03 18:32:52 -04:00
nimlgen
e02ee8ef1b
nv: cleanups from 5090 ( #11081 )
2025-07-04 00:08:47 +03:00
George Hotz
e9a01dd04a
Revert "Fix division by zero in add views ( #11075 )" ( #11080 )
...
This reverts commit 19f07e72f6.
2025-07-03 11:39:44 -07:00
Sieds Lykles
19f07e72f6
Fix division by zero in add views ( #11075 )
2025-07-03 11:37:59 -07:00
chenyu
678cabc6f2
use argfix in Tensor.stack ( #11077 )
...
works for multiple Tensor args or a single tuple/list of Tensors, but not a mix of the two
2025-07-03 12:15:11 -04:00
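The calling convention described in the Tensor.stack commit can be illustrated with a small stand-in. `argfix_sketch` is a hypothetical helper written for this example and assumed to mirror the stated contract (either varargs or one tuple/list, never both); it is not claimed to be tinygrad's actual `argfix`.

```python
def argfix_sketch(*args):
    """Normalize varargs or a single tuple/list into one tuple (hypothetical)."""
    if args and isinstance(args[0], (tuple, list)):
        if len(args) != 1:
            # A sequence followed by extra positional arguments is the "mixed" case.
            raise ValueError(f"cannot mix a sequence with extra arguments: {args!r}")
        return tuple(args[0])
    return args

print(argfix_sketch(1, 2, 3))
print(argfix_sketch([1, 2, 3]))
```

Both calls normalize to the same tuple, while `argfix_sketch([1, 2], 3)` raises `ValueError`.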
qazal
b695e8c4d6
viz: remove support for naming with self ( #11076 )
2025-07-03 17:29:14 +03:00
Sieds Lykles
53985297bd
add test, fix rewrite rule and raise error on division by zero ( #11073 )
2025-07-03 08:25:06 -04:00