Commit Graph

10417 Commits

Author SHA1 Message Date
nimlgen
2f72be5055 nv_smi: init basic insmod/rmmod/reset cmds (#11282) 2025-07-19 15:43:03 +03:00
qazal
577e581943 fix typo in sqtt/readme (#11281) 2025-07-19 15:10:24 +03:00
nimlgen
188ed38315 replace from_mv with lightweight mv_address (#11280) 2025-07-19 13:50:51 +03:00
quortus
1a25e27f32 Do not produce out of spec intermediate UOp in gated LOAD/STORE folding (#11207)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-18 15:42:55 -04:00
chenyu
ec3efd2919 move upcast before reduce (#11250)
* move upcast before reduce

upcast goes to end of global+local+upcast

* r_196_32_4_24_8
2025-07-18 14:42:15 -04:00
chenyu
be2f4336e6 use onnx 1.18.0 in DSP test (#11279) 2025-07-18 14:09:23 -04:00
nimlgen
9a88bd841c hcq: refactor into peer_groups (#11277)
* hcq: refactor into peer_groups

* fix fors

* fixes

* ooops

* mypy

* tiny fixes
2025-07-18 16:34:18 +03:00
nimlgen
f432eef708 hcq: rename CPU -> KICK in graph for kickoff signal (#11278) 2025-07-18 15:54:35 +03:00
quortus
52bbd9900b [pr] Stable tensor order in _find_all_tensors_for_uops (#11276)
* Use dict for all_tensors to get stable tensor order in _find_all_tensors_for_uops

* Rerun tests
2025-07-18 13:12:01 +03:00
chenyu
c5a5d74642 Revert "image_dot of 2 half inputs returns half (#11007)" (#11274)
This reverts commit fa8e08f922.
2025-07-17 17:34:18 -04:00
Utkarsh Gill
fa8e08f922 image_dot of 2 half inputs returns half (#11007)
* cast after sum

* comment out skipif

* minor fix

* only test IMAGE

* IMAGE is supported now

* simpler

* simplerr

* only cast if dtype is None

* don't need to change base_image_type

* only cast when dtype is half

* add explicit test

* actually no, workflow seems better

* actually, keep both

* move test

* fix indent

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-07-17 13:47:22 -07:00
geohotstan
536b254df4 Bump onnx to 1.18.0 (#11266)
* bump

* thou hast implement functions

* hacked in domain support

* some clean ups

* hack quantize_onnx_test too

* add helper lol, why onnx tests why

* better dispatcher, but need tests and better naming

* flaky ci

* change some names

* small clean ups

* make it easier to clean up tests once ORT supports 1.18.0

* nits

* fix bug of Softmax_1 being registered in onnx_ops

* need a default value

* resolve_const is better name

* fix OnnxRunner.to

* use proper domain names
2025-07-17 15:35:41 -04:00
qazal
1606491b1c viz: refactor to generic shape spec (#11272) 2025-07-17 20:25:15 +03:00
nimlgen
cfb229473f hcq: refactor buffer mapping (#11271)
* hcq: refactor buffer mapping

* fix

* fix mypy
2025-07-17 15:16:49 +03:00
qazal
e68af3b336 disable flaky assert in test_cpu_profile (#11270) 2025-07-17 06:50:39 +03:00
chenyu
60ffe00172 remove Kernel.first_reduce [pr] (#11269) 2025-07-16 18:30:14 -04:00
chenyu
522dc72f08 remove Kernel.local_dims [pr] (#11268)
* remove Kernel.local_dims [pr]

also not needed

* fix test_matvec
2025-07-16 17:46:19 -04:00
chenyu
d8c783f65f remove Kernel.global_dims [pr] (#11267)
all references to global use axis_types, so we don't need the number-of-globals helper that was used to locate GLOBAL
2025-07-16 17:16:49 -04:00
uuuvn
6f0ddcc24c Remote cross-host graph (#11229) 2025-07-16 13:27:54 -07:00
nimlgen
6aa20c607d nv: graceful shutdown to cold state (#11265) 2025-07-16 19:49:35 +03:00
chenyu
59b52d49d7 remove .global_dims that are for locating GLOBAL [pr] (#11264) 2025-07-16 11:19:31 -04:00
chenyu
e6c016ddd0 move check axis < shape_len to real_axis [pr] (#11263)
ensure output of real_axis is always valid
2025-07-16 10:15:44 -04:00
quortus
924bc7c9ae Fix test_uop_spec (#11259) 2025-07-16 11:02:31 +03:00
chenyu
c8e5c4d7c3 insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
wozeparrot
b32d9321fb feat: more keccak cleanup + more explicit shape (#11256) 2025-07-15 13:57:47 -07:00
chenyu
9f79079cbe update KernelInfo dims to return list of dims [pr] (#11255)
local dims are not contiguous once upcast sits between local and groupreduce
2025-07-15 15:01:39 -04:00
chenyu
629fa21b6b remove final range in heuristic [pr] (#11251)
all dims are based on AxisType now
2025-07-15 11:39:15 -04:00
chenyu
d7adc24083 remove Kernel.first_upcast [pr] (#11248)
first_reduce does not need a default now
2025-07-15 10:21:34 -04:00
nimlgen
197d345804 nv: print rpc msg with DEBUG>=3 (#11247) 2025-07-15 16:39:58 +03:00
chenyu
034e51bd36 remove first_reduce used for locate real_axis [pr] (#11245)
LOCAL goes to the last of (GLOBAL+LOCAL)+1
GROUP goes to right before first REDUCE
2025-07-15 09:19:38 -04:00
chenyu
0e2422d216 Kernel.axes_of helper [pr] (#11243)
look up dim based on AxisType
2025-07-14 22:17:43 -04:00
chenyu
968f6b2a2e remove hasattr(self, 'axis_types') checks in dims property [pr] (#11242)
not needed anymore
2025-07-14 20:59:51 -04:00
leopf
557ca7d757 testing SimpleTokenizer against OASST1 (#11214) 2025-07-14 17:09:31 -07:00
wozeparrot
5878b189b8 don't const fold shape changing bitcast (#11236) 2025-07-14 16:42:16 -07:00
chenyu
b6662096cb remove more first_reduce [pr] (#11239) 2025-07-14 19:13:44 -04:00
chenyu
eb8e17ef59 remove most of the first_upcast [pr] (#11238) 2025-07-14 16:54:24 -04:00
qazal
c78b1cbae7 viz profiler cleanups (#11234)
* move all render calls to zoom callback

* cleanup the naming

* require transform arg
2025-07-14 19:06:33 +03:00
chenyu
36ce883c7d update heuristic to use k.upcastable_dims and k.unrollable_dims [pr] (#11233)
idea is to make it behave the same regardless of axis order and with empty 1s in shape.

this does not fully remove first_upcast yet, because some conditions use the already-upcasted size, which needs a separate benchmark before it can be removed.
2025-07-14 11:10:30 -04:00
qazal
c0c695dd89 viz: remove extra transform (#11232) 2025-07-14 16:51:47 +03:00
chenyu
da219199f5 minor hcopt cleanup [pr] (#11231) 2025-07-14 09:36:25 -04:00
nimlgen
756ba1a5f9 nv: support ampere in nvpci (#11230) 2025-07-14 15:35:44 +03:00
uuuvn
b2cc6cfa1b JIT_BATCH_SIZE is a ContextVar (#11228) 2025-07-14 14:03:45 +03:00
nimlgen
c4a920d95c nv: use last signature (#11227) 2025-07-14 13:00:39 +03:00
nimlgen
a830d37881 nv: check wpr2 is inited (#11226) 2025-07-14 11:46:14 +03:00
chenyu
0387bb9630 clean up image upcast in hcopt [pr] (#11220)
GLOBAL+LOCAL for upcast
GROUP_REDUCE+REDUCE for unroll
2025-07-13 18:06:43 -04:00
chenyu
85ddd72038 simpler grouptop in hcopt (#11219)
* simpler grouptop in hcopt

keep the only perf relevant conditions and the rest is handled by try except

* update openpilot read image count
2025-07-13 16:06:09 -04:00
qazal
40847ca29c viz: prune out of screen rects (#11217) 2025-07-13 21:49:59 +03:00
chenyu
674dc28505 remove Kernel.full_unupcasted_shape [pr] (#11215)
decomp to shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
chenyu
9575cf6c6e shave more hcopt [pr] (#11213)
start to use AxisType for conditions
2025-07-13 12:43:58 -04:00
Alisher Zhubanyshev
4ef6b46b34 hcq: reduce launch overhead (#11193)
* nv: improve mmio creation speed

* add memoryview test

* fix indents

* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00