wozeparrot
30ce16a424
feat: failing test for long keccak ( #11292 )
2025-07-21 12:49:23 -04:00
uuuvn
178dbf3f66
Remote scheduler changes ( #11177 )
2025-07-21 09:29:44 -07:00
वेदांत
e368628736
Add amin support to Tensor operations in Torch backend ( #11290 )
* integer div mod fix
* Revert "integer div mod fix"
This reverts commit d5d2f201bf.
* feat arg_min support
* test update
* test fix
2025-07-21 09:14:08 -04:00
qazal
5eb54e2499
viz: close event streams before profiler render ( #11300 )
2025-07-21 15:42:31 +03:00
nimlgen
cc3c1e4c14
hcq: move cpu to hcq ( #11262 )
* hcq: move cpu to hcq
* import time
* upd
* fix
* windows support
* hm
* cleaner
* fix timer
* fix timing
* std is ns
* skip profiler
* mypy
* cleaner
* cleanups
* after merge
* default is back
2025-07-21 15:10:38 +03:00
nimlgen
816c01c2d4
hcq: default copy_queue_t=None ( #11297 )
2025-07-21 14:45:20 +03:00
qazal
6520a7fcb6
viz: factorize event stream ( #11298 )
2025-07-21 14:42:00 +03:00
nimlgen
9c533e5c38
hcq: cpu prereq ( #11296 )
2025-07-21 13:35:18 +03:00
nimlgen
e87a42e243
hcq: prepare for windows ( #11293 )
* hcq: prepare for windows
* comments
2025-07-21 13:08:56 +03:00
nimlgen
df3ba0a7c0
autogen: fix imports in libusb ( #11294 )
2025-07-21 13:04:27 +03:00
nimlgen
dd6a2d432f
hcq: default timestamp metrics is ns ( #11295 )
2025-07-21 12:56:30 +03:00
wozeparrot
53345ef4e2
feat: make ops_disk work on block devices ( #11291 )
2025-07-20 14:39:50 -07:00
qazal
3002c63b1e
process replay: optionally pass tinygrad import error ( #11289 )
* process replay: optionally pass tinygrad import error
* gate all tinygrad internals
* s/getenv/os.getenv pre import
* diff
2025-07-20 22:57:56 +03:00
chenyu
9e3a593313
minor kernel.py cleanups [pr] ( #11286 )
2025-07-20 10:15:31 -04:00
quortus
5f17927a87
Shorten UOp.load method ( #11285 )
2025-07-20 13:48:04 +03:00
chenyu
54924f9969
type remove Union and Optional [pr] ( #11283 )
use `|` for consistency
2025-07-19 14:05:52 -04:00
nimlgen
2f72be5055
nv_smi: init basic insmod/rmmod/reset cmds ( #11282 )
2025-07-19 15:43:03 +03:00
qazal
577e581943
fix typo in sqtt/readme ( #11281 )
2025-07-19 15:10:24 +03:00
nimlgen
188ed38315
replace from_mv with lightweight mv_address ( #11280 )
2025-07-19 13:50:51 +03:00
quortus
1a25e27f32
Do not produce out of spec intermediate UOp in gated LOAD/STORE folding ( #11207 )
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-18 15:42:55 -04:00
chenyu
ec3efd2919
move upcast before reduce ( #11250 )
* move upcast before reduce
upcast goes to end of global+local+upcast
* r_196_32_4_24_8
2025-07-18 14:42:15 -04:00
chenyu
be2f4336e6
use onnx 1.18.0 in DSP test ( #11279 )
2025-07-18 14:09:23 -04:00
nimlgen
9a88bd841c
hcq: refactor into peer_groups ( #11277 )
* hcq: refactor into peer_groups
* fix fors
* fixes
* ooops
* mypy
* tiny fixes
2025-07-18 16:34:18 +03:00
nimlgen
f432eef708
hcq: rename CPU -> KICK in graph for kickoff signal ( #11278 )
2025-07-18 15:54:35 +03:00
quortus
52bbd9900b
[pr] Stable tensor order in _find_all_tensors_for_uops ( #11276 )
* Use dict for all_tensors to get stable tensor order in _find_all_tensors_for_uops
* Rerun tests
2025-07-18 13:12:01 +03:00
chenyu
c5a5d74642
Revert "image_dot of 2 half inputs returns half ( #11007 )" ( #11274 )
This reverts commit fa8e08f922.
2025-07-17 17:34:18 -04:00
Utkarsh Gill
fa8e08f922
image_dot of 2 half inputs returns half ( #11007 )
* cast after sum
* comment out skipif
* minor fix
* only test IMAGE
* IMAGE is supported now
* simpler
* simplerr
* only cast if dtype is None
* dont need to change base_imaeg_type
* only cast when dtype is half
* add explicit test
* actually no, workflow seems better
* actually, keep both
* move test
* fix indent
---------
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-07-17 13:47:22 -07:00
geohotstan
536b254df4
Bump onnx to 1.18.0 ( #11266 )
* bump
* thou hast implement functions
* hacked in domain support
* some clean ups
* hack quantize_onnx_test too
* add helper lol, why onnx tests why
* better dispatcher, but need tests and better naming
* flaky ci
* change some names
* small clean ups
* make it easier to clean up tests once ORT supports 1.18.0
* nits
* fix bug of Softmax_1 being registered in onnx_ops
* need a default value
* resolve_const is better name
* fix OnnxRunner.to
* use proper domain names
2025-07-17 15:35:41 -04:00
qazal
1606491b1c
viz: refactor to generic shape spec ( #11272 )
2025-07-17 20:25:15 +03:00
nimlgen
cfb229473f
hcq: refactor buffer mapping ( #11271 )
* hcq: refactor buffer mapping
* fix
* fix mypy
2025-07-17 15:16:49 +03:00
qazal
e68af3b336
disable flaky assert in test_cpu_profile ( #11270 )
2025-07-17 06:50:39 +03:00
chenyu
60ffe00172
remove Kernel.first_reduce [pr] ( #11269 )
2025-07-16 18:30:14 -04:00
chenyu
522dc72f08
remove Kernel.local_dims [pr] ( #11268 )
* remove Kernel.local_dims [pr]
also not needed
* fix test_matvec
2025-07-16 17:46:19 -04:00
chenyu
d8c783f65f
remove Kernel.global_dims [pr] ( #11267 )
all reference to global used axis_types, so we don't need number of global helper that was used to locate GLOBAL
2025-07-16 17:16:49 -04:00
uuuvn
6f0ddcc24c
Remote cross-host graph ( #11229 )
2025-07-16 13:27:54 -07:00
nimlgen
6aa20c607d
nv: graceful shutdown to cold state ( #11265 )
2025-07-16 19:49:35 +03:00
chenyu
59b52d49d7
remove .global_dims that are for locating GLOBAL [pr] ( #11264 )
2025-07-16 11:19:31 -04:00
chenyu
e6c016ddd0
move check axis < shape_len to real_axis [pr] ( #11263 )
ensure output of real_axis is always valid
2025-07-16 10:15:44 -04:00
quortus
924bc7c9ae
Fix test_uop_spec ( #11259 )
2025-07-16 11:02:31 +03:00
chenyu
c8e5c4d7c3
insert_before -> insert_at [pr] ( #11257 )
more precise
2025-07-15 17:44:34 -04:00
wozeparrot
b32d9321fb
feat: more keccak cleanup + more explicit shape ( #11256 )
2025-07-15 13:57:47 -07:00
chenyu
9f79079cbe
update KernelInfo dims to return list of dims [pr] ( #11255 )
local dims are not contiguous once upcast sits between local and groupreduce
2025-07-15 15:01:39 -04:00
chenyu
629fa21b6b
remove final range in heuristic [pr] ( #11251 )
all dims are based on AxisType now
2025-07-15 11:39:15 -04:00
chenyu
d7adc24083
remove Kernel.first_upcast [pr] ( #11248 )
first_reduce does not need a default now
2025-07-15 10:21:34 -04:00
nimlgen
197d345804
nv: print rpc msg with DEBUG>=3 ( #11247 )
2025-07-15 16:39:58 +03:00
chenyu
034e51bd36
remove first_reduce used for locate real_axis [pr] ( #11245 )
LOCAL goes to the last of (GLOBAL+LOCAL)+1
GROUP goes to right before first REDUCE
2025-07-15 09:19:38 -04:00
chenyu
0e2422d216
Kernel.axes_of helper [pr] ( #11243 )
look up dim based on AxisType
2025-07-14 22:17:43 -04:00
chenyu
968f6b2a2e
remove hasattr(self, 'axis_types') checks in dims property [pr] ( #11242 )
not needed anymore
2025-07-14 20:59:51 -04:00
leopf
557ca7d757
testing SimpleTokenizer against OASST1 ( #11214 )
2025-07-14 17:09:31 -07:00
wozeparrot
5878b189b8
don't const fold shape changing bitcast ( #11236 )
2025-07-14 16:42:16 -07:00