nimlgen
2f72be5055
nv_smi: init basic insmod/rmmod/reset cmds ( #11282 )
2025-07-19 15:43:03 +03:00
qazal
577e581943
fix typo in sqtt/readme ( #11281 )
2025-07-19 15:10:24 +03:00
nimlgen
188ed38315
replace from_mv with lightweight mv_address ( #11280 )
2025-07-19 13:50:51 +03:00
quortus
1a25e27f32
Do not produce out of spec intermediate UOp in gated LOAD/STORE folding ( #11207 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-18 15:42:55 -04:00
chenyu
ec3efd2919
move upcast before reduce ( #11250 )
...
* move upcast before reduce
upcast goes to end of global+local+upcast
* r_196_32_4_24_8
2025-07-18 14:42:15 -04:00
chenyu
be2f4336e6
use onnx 1.18.0 in DSP test ( #11279 )
2025-07-18 14:09:23 -04:00
nimlgen
9a88bd841c
hcq: refactor into peer_groups ( #11277 )
...
* hcq: refactor into peer_groups
* fix fors
* fixes
* ooops
* mypy
* tiny fixes
2025-07-18 16:34:18 +03:00
nimlgen
f432eef708
hcq: rename CPU -> KICK in graph for kickoff signal ( #11278 )
2025-07-18 15:54:35 +03:00
quortus
52bbd9900b
[pr] Stable tensor order in _find_all_tensors_for_uops ( #11276 )
...
* Use dict for all_tensors to get stable tensor order in _find_all_tensors_for_uops
* Rerun tests
2025-07-18 13:12:01 +03:00
chenyu
c5a5d74642
Revert "image_dot of 2 half inputs returns half ( #11007 )" ( #11274 )
...
This reverts commit fa8e08f922.
2025-07-17 17:34:18 -04:00
Utkarsh Gill
fa8e08f922
image_dot of 2 half inputs returns half ( #11007 )
...
* cast after sum
* comment out skipif
* minor fix
* only test IMAGE
* IMAGE is supported now
* simpler
* simplerr
* only cast if dtype is None
* don't need to change base_image_type
* only cast when dtype is half
* add explicit test
* actually no, workflow seems better
* actually, keep both
* move test
* fix indent
---------
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-07-17 13:47:22 -07:00
geohotstan
536b254df4
Bump onnx to 1.18.0 ( #11266 )
...
* bump
* thou hast implement functions
* hacked in domain support
* some clean ups
* hack quantize_onnx_test too
* add helper lol, why onnx tests why
* better dispatcher, but need tests and better naming
* flaky ci
* change some names
* small clean ups
* make it easier to clean up tests once ORT supports 1.18.0
* nits
* fix bug of Softmax_1 being registered in onnx_ops
* need a default value
* resolve_const is better name
* fix OnnxRunner.to
* use proper domain names
2025-07-17 15:35:41 -04:00
qazal
1606491b1c
viz: refactor to generic shape spec ( #11272 )
2025-07-17 20:25:15 +03:00
nimlgen
cfb229473f
hcq: refactor buffer mapping ( #11271 )
...
* hcq: refactor buffer mapping
* fix
* fix mypy
2025-07-17 15:16:49 +03:00
qazal
e68af3b336
disable flaky assert in test_cpu_profile ( #11270 )
2025-07-17 06:50:39 +03:00
chenyu
60ffe00172
remove Kernel.first_reduce [pr] ( #11269 )
2025-07-16 18:30:14 -04:00
chenyu
522dc72f08
remove Kernel.local_dims [pr] ( #11268 )
...
* remove Kernel.local_dims [pr]
also not needed
* fix test_matvec
2025-07-16 17:46:19 -04:00
chenyu
d8c783f65f
remove Kernel.global_dims [pr] ( #11267 )
...
all references to global used axis_types, so we don't need the number-of-globals helper that was used to locate GLOBAL
2025-07-16 17:16:49 -04:00
uuuvn
6f0ddcc24c
Remote cross-host graph ( #11229 )
2025-07-16 13:27:54 -07:00
nimlgen
6aa20c607d
nv: graceful shutdown to cold state ( #11265 )
2025-07-16 19:49:35 +03:00
chenyu
59b52d49d7
remove .global_dims that are for locating GLOBAL [pr] ( #11264 )
2025-07-16 11:19:31 -04:00
chenyu
e6c016ddd0
move check axis < shape_len to real_axis [pr] ( #11263 )
...
ensure output of real_axis is always valid
2025-07-16 10:15:44 -04:00
quortus
924bc7c9ae
Fix test_uop_spec ( #11259 )
2025-07-16 11:02:31 +03:00
chenyu
c8e5c4d7c3
insert_before -> insert_at [pr] ( #11257 )
...
more precise
2025-07-15 17:44:34 -04:00
wozeparrot
b32d9321fb
feat: more keccak cleanup + more explicit shape ( #11256 )
2025-07-15 13:57:47 -07:00
chenyu
9f79079cbe
update KernelInfo dims to return list of dims [pr] ( #11255 )
...
local dims are not contiguous once upcast sits between local and groupreduce
2025-07-15 15:01:39 -04:00
chenyu
629fa21b6b
remove final range in heuristic [pr] ( #11251 )
...
all dims are based on AxisType now
2025-07-15 11:39:15 -04:00
chenyu
d7adc24083
remove Kernel.first_upcast [pr] ( #11248 )
...
first_reduce does not need a default now
2025-07-15 10:21:34 -04:00
nimlgen
197d345804
nv: print rpc msg with DEBUG>=3 ( #11247 )
2025-07-15 16:39:58 +03:00
chenyu
034e51bd36
remove first_reduce used for locate real_axis [pr] ( #11245 )
...
LOCAL goes to the last of (GLOBAL+LOCAL)+1
GROUP goes to right before first REDUCE
2025-07-15 09:19:38 -04:00
chenyu
0e2422d216
Kernel.axes_of helper [pr] ( #11243 )
...
look up dim based on AxisType
2025-07-14 22:17:43 -04:00
chenyu
968f6b2a2e
remove hasattr(self, 'axis_types') checks in dims property [pr] ( #11242 )
...
not needed anymore
2025-07-14 20:59:51 -04:00
leopf
557ca7d757
testing SimpleTokenizer against OASST1 ( #11214 )
2025-07-14 17:09:31 -07:00
wozeparrot
5878b189b8
don't const fold shape changing bitcast ( #11236 )
2025-07-14 16:42:16 -07:00
chenyu
b6662096cb
remove more first_reduce [pr] ( #11239 )
2025-07-14 19:13:44 -04:00
chenyu
eb8e17ef59
remove most of the first_upcast [pr] ( #11238 )
2025-07-14 16:54:24 -04:00
qazal
c78b1cbae7
viz profiler cleanups ( #11234 )
...
* move all render calls to zoom callback
* cleanup the naming
* require transform arg
2025-07-14 19:06:33 +03:00
chenyu
36ce883c7d
update heuristic to use k.upcastable_dims and k.unrollable_dims [pr] ( #11233 )
...
idea is to make it behave the same regardless of axis order and with empty 1s in shape.
this does not fully remove first_upcast yet, because some conditions use the already-upcasted size, which needs a separate benchmark to remove.
2025-07-14 11:10:30 -04:00
qazal
c0c695dd89
viz: remove extra transform ( #11232 )
2025-07-14 16:51:47 +03:00
chenyu
da219199f5
minor hcopt cleanup [pr] ( #11231 )
2025-07-14 09:36:25 -04:00
nimlgen
756ba1a5f9
nv: support ampere in nvpci ( #11230 )
2025-07-14 15:35:44 +03:00
uuuvn
b2cc6cfa1b
JIT_BATCH_SIZE is a ContextVar ( #11228 )
2025-07-14 14:03:45 +03:00
nimlgen
c4a920d95c
nv: use last signature ( #11227 )
2025-07-14 13:00:39 +03:00
nimlgen
a830d37881
nv: check wpr2 is inited ( #11226 )
2025-07-14 11:46:14 +03:00
chenyu
0387bb9630
clean up image upcast in hcopt [pr] ( #11220 )
...
GLOBAL+LOCAL for upcast
GROUP_REDUCE+REDUCE for unroll
2025-07-13 18:06:43 -04:00
chenyu
85ddd72038
simpler grouptop in hcopt ( #11219 )
...
* simpler grouptop in hcopt
keep only the perf-relevant conditions; the rest is handled by try/except
* update openpilot read image count
2025-07-13 16:06:09 -04:00
qazal
40847ca29c
viz: prune out of screen rects ( #11217 )
2025-07-13 21:49:59 +03:00
chenyu
674dc28505
remove Kernel.full_unupcasted_shape [pr] ( #11215 )
...
decomp to shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
chenyu
9575cf6c6e
shave more hcopt [pr] ( #11213 )
...
start to use AxisType for conditions
2025-07-13 12:43:58 -04:00
Alisher Zhubanyshev
4ef6b46b34
hcq: reduce launch overhead ( #11193 )
...
* nv: improve mmio creation speed
* add memoryview test
* fix indents
* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00