10633 Commits

Author SHA1 Message Date
wozeparrot
30ce16a424 feat: failing test for long keccak (#11292) 2025-07-21 12:49:23 -04:00
uuuvn
178dbf3f66 Remote scheduler changes (#11177) 2025-07-21 09:29:44 -07:00
वेदांत
e368628736 Add amin support to Tensor operations in Torch backend (#11290)
* integer div mod fix

* Revert "integer div mod fix"

This reverts commit d5d2f201bf.

* feat arg_min support

* test update

* test fix
2025-07-21 09:14:08 -04:00
qazal
5eb54e2499 viz: close event streams before profiler render (#11300) 2025-07-21 15:42:31 +03:00
nimlgen
cc3c1e4c14 hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
nimlgen
816c01c2d4 hcq: default copy_queue_t=None (#11297) 2025-07-21 14:45:20 +03:00
qazal
6520a7fcb6 viz: factorize event stream (#11298) 2025-07-21 14:42:00 +03:00
nimlgen
9c533e5c38 hcq: cpu prereq (#11296) 2025-07-21 13:35:18 +03:00
nimlgen
e87a42e243 hcq: prepare for windows (#11293)
* hcq: prepare for windows

* comments
2025-07-21 13:08:56 +03:00
nimlgen
df3ba0a7c0 autogen: fix imports in libusb (#11294) 2025-07-21 13:04:27 +03:00
nimlgen
dd6a2d432f hcq: default timestamp metrics is ns (#11295) 2025-07-21 12:56:30 +03:00
wozeparrot
53345ef4e2 feat: make ops_disk work on block devices (#11291) 2025-07-20 14:39:50 -07:00
qazal
3002c63b1e process replay: optionally pass tinygrad import error (#11289)
* process replay: optionally pass tinygrad import error

* gate all tinygrad internals

* s/getenv/os.getenv pre import

* diff
2025-07-20 22:57:56 +03:00
chenyu
9e3a593313 minor kernel.py cleanups [pr] (#11286) 2025-07-20 10:15:31 -04:00
quortus
5f17927a87 Shorten UOp.load method (#11285) 2025-07-20 13:48:04 +03:00
chenyu
54924f9969 type remove Union and Optional [pr] (#11283)
use `|` for consistency
2025-07-19 14:05:52 -04:00
nimlgen
2f72be5055 nv_smi: init basic insmod/rmmod/reset cmds (#11282) 2025-07-19 15:43:03 +03:00
qazal
577e581943 fix typo in sqtt/readme (#11281) 2025-07-19 15:10:24 +03:00
nimlgen
188ed38315 replace from_mv with lightweight mv_address (#11280) 2025-07-19 13:50:51 +03:00
quortus
1a25e27f32 Do not produce out of spec intermediate UOp in gated LOAD/STORE folding (#11207)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-18 15:42:55 -04:00
chenyu
ec3efd2919 move upcast before reduce (#11250)
* move upcast before reduce

upcast goes to end of global+local+upcast

* r_196_32_4_24_8
2025-07-18 14:42:15 -04:00
chenyu
be2f4336e6 use onnx 1.18.0 in DSP test (#11279) 2025-07-18 14:09:23 -04:00
nimlgen
9a88bd841c hcq: refactor into peer_groups (#11277)
* hcq: refactor into peer_groups

* fix fors

* fixes

* ooops

* mypy

* tiny fixes
2025-07-18 16:34:18 +03:00
nimlgen
f432eef708 hcq: rename CPU -> KICK in graph for kickoff signal (#11278) 2025-07-18 15:54:35 +03:00
quortus
52bbd9900b [pr] Stable tensor order in _find_all_tensors_for_uops (#11276)
* Use dict for all_tensors to get stable tensor order in _find_all_tensors_for_uops

* Rerun tests
2025-07-18 13:12:01 +03:00
chenyu
c5a5d74642 Revert "image_dot of 2 half inputs returns half (#11007)" (#11274)
This reverts commit fa8e08f922.
2025-07-17 17:34:18 -04:00
Utkarsh Gill
fa8e08f922 image_dot of 2 half inputs returns half (#11007)
* cast after sum

* comment out skipif

* minor fix

* only test IMAGE

* IMAGE is supported now

* simpler

* simplerr

* only cast if dtype is None

* don't need to change base_image_type

* only cast when dtype is half

* add explicit test

* actually no, workflow seems better

* actually, keep both

* move test

* fix indent

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-07-17 13:47:22 -07:00
geohotstan
536b254df4 Bump onnx to 1.18.0 (#11266)
* bump

* thou hast implement functions

* hacked in domain support

* some clean ups

* hack quantize_onnx_test too

* add helper lol, why onnx tests why

* better dispatcher, but need tests and better naming

* flaky ci

* change some names

* small clean ups

* make it easier to clean up tests once ORT supports 1.18.0

* nits

* fix bug of Softmax_1 being registered in onnx_ops

* need a default value

* resolve_const is better name

* fix OnnxRunner.to

* use proper domain names
2025-07-17 15:35:41 -04:00
qazal
1606491b1c viz: refactor to generic shape spec (#11272) 2025-07-17 20:25:15 +03:00
nimlgen
cfb229473f hcq: refactor buffer mapping (#11271)
* hcq: refactor buffer mapping

* fix

* fix mypy
2025-07-17 15:16:49 +03:00
qazal
e68af3b336 disable flaky assert in test_cpu_profile (#11270) 2025-07-17 06:50:39 +03:00
chenyu
60ffe00172 remove Kernel.first_reduce [pr] (#11269) 2025-07-16 18:30:14 -04:00
chenyu
522dc72f08 remove Kernel.local_dims [pr] (#11268)
* remove Kernel.local_dims [pr]

also not needed

* fix test_matvec
2025-07-16 17:46:19 -04:00
chenyu
d8c783f65f remove Kernel.global_dims [pr] (#11267)
all references to global use axis_types, so we don't need the number-of-global helper that was used to locate GLOBAL
2025-07-16 17:16:49 -04:00
uuuvn
6f0ddcc24c Remote cross-host graph (#11229) 2025-07-16 13:27:54 -07:00
nimlgen
6aa20c607d nv: graceful shutdown to cold state (#11265) 2025-07-16 19:49:35 +03:00
chenyu
59b52d49d7 remove .global_dims that are for locating GLOBAL [pr] (#11264) 2025-07-16 11:19:31 -04:00
chenyu
e6c016ddd0 move check axis < shape_len to real_axis [pr] (#11263)
ensure output of real_axis is always valid
2025-07-16 10:15:44 -04:00
quortus
924bc7c9ae Fix test_uop_spec (#11259) 2025-07-16 11:02:31 +03:00
chenyu
c8e5c4d7c3 insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
wozeparrot
b32d9321fb feat: more keccak cleanup + more explicit shape (#11256) 2025-07-15 13:57:47 -07:00
chenyu
9f79079cbe update KernelInfo dims to return list of dims [pr] (#11255)
local dims are not contiguous once upcast sits between local and groupreduce
2025-07-15 15:01:39 -04:00
chenyu
629fa21b6b remove final range in heuristic [pr] (#11251)
all dims are based on AxisType now
2025-07-15 11:39:15 -04:00
chenyu
d7adc24083 remove Kernel.first_upcast [pr] (#11248)
first_reduce does not need a default now
2025-07-15 10:21:34 -04:00
nimlgen
197d345804 nv: print rpc msg with DEBUG>=3 (#11247) 2025-07-15 16:39:58 +03:00
chenyu
034e51bd36 remove first_reduce used for locate real_axis [pr] (#11245)
LOCAL goes to the last of (GLOBAL+LOCAL)+1
GROUP goes to right before first REDUCE
2025-07-15 09:19:38 -04:00
chenyu
0e2422d216 Kernel.axes_of helper [pr] (#11243)
look up dim based on AxisType
2025-07-14 22:17:43 -04:00
chenyu
968f6b2a2e remove hasattr(self, 'axis_types') checks in dims property [pr] (#11242)
not needed anymore
2025-07-14 20:59:51 -04:00
leopf
557ca7d757 testing SimpleTokenizer against OASST1 (#11214) 2025-07-14 17:09:31 -07:00
wozeparrot
5878b189b8 don't const fold shape changing bitcast (#11236) 2025-07-14 16:42:16 -07:00