tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 05:48:08 -05:00

Author	SHA1	Message	Date
nimlgen	2f72be5055	nv_smi: init basic insmod/rmmod/reset cmds (#11282 )	2025-07-19 15:43:03 +03:00
qazal	577e581943	fix typo in sqtt/readme (#11281 )	2025-07-19 15:10:24 +03:00
nimlgen	188ed38315	replace from_mv with lightweight mv_address (#11280 )	2025-07-19 13:50:51 +03:00
quortus	1a25e27f32	Do not produce out of spec intermediate UOp in gated LOAD/STORE folding (#11207 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-18 15:42:55 -04:00
chenyu	ec3efd2919	move upcast before reduce (#11250 ) * move upcast before reduce upcast goes to end of global+local+upcast * r_196_32_4_24_8	2025-07-18 14:42:15 -04:00
chenyu	be2f4336e6	use onnx 1.18.0 in DSP test (#11279 )	2025-07-18 14:09:23 -04:00
nimlgen	9a88bd841c	hcq: refactor into peer_groups (#11277 ) * hcq: refactor into peer_groups * fix fors * fixes * ooops * mypy * tiny fixes	2025-07-18 16:34:18 +03:00
nimlgen	f432eef708	hcq: rename CPU -> KICK in graph for kickoff signal (#11278 )	2025-07-18 15:54:35 +03:00
quortus	52bbd9900b	[pr] Stable tensor order in _find_all_tensors_for_uops (#11276 ) * Use dict for all_tensors to get stable tensor order in _find_all_tensors_for_uops * Rerun tests	2025-07-18 13:12:01 +03:00
chenyu	c5a5d74642	Revert "image_dot of 2 half inputs returns half (#11007 )" (#11274 ) This reverts commit `fa8e08f922`.	2025-07-17 17:34:18 -04:00
Utkarsh Gill	fa8e08f922	image_dot of 2 half inputs returns half (#11007 ) * cast after sum * comment out skipif * minor fix * only test IMAGE * IMAGE is supported now * simpler * simplerr * only cast if dtype is None * dont need to change base_imaeg_type * only cast when dtype is half * add explicit test * actually no, workflow seems better * actually, keep both * move test * fix indent --------- Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>	2025-07-17 13:47:22 -07:00
geohotstan	536b254df4	Bump onnx to 1.18.0 (#11266 ) * bump * thou hast implement functions * hacked in domain support * some clean ups * hack quantize_onnx_test too * add helper lol, why onnx tests why * better dispatcher, but need tests and better naming * flaky ci * change some names * small clean ups * make it easier to clean up tests once ORT supports 1.18.0 * nits * fix bug of Softmax_1 being registered in onnx_ops * need a default value * resolve_const is better name * fix OnnxRunner.to * use proper domain names	2025-07-17 15:35:41 -04:00
qazal	1606491b1c	viz: refactor to generic shape spec (#11272 )	2025-07-17 20:25:15 +03:00
nimlgen	cfb229473f	hcq: refactor buffer mapping (#11271 ) * hcq: refactor buffer mapping * fix * fix mypy	2025-07-17 15:16:49 +03:00
qazal	e68af3b336	disable flaky assert in test_cpu_profile (#11270 )	2025-07-17 06:50:39 +03:00
chenyu	60ffe00172	remove Kernel.first_reduce [pr] (#11269 )	2025-07-16 18:30:14 -04:00
chenyu	522dc72f08	remove Kernel.local_dims [pr] (#11268 ) * remove Kernel.local_dims [pr] also not needed * fix test_matvec	2025-07-16 17:46:19 -04:00
chenyu	d8c783f65f	remove Kernel.global_dims [pr] (#11267 ) all reference to global used axis_types, so we don't need number of global helper that was used to locate GLOBAL	2025-07-16 17:16:49 -04:00
uuuvn	6f0ddcc24c	Remote cross-host graph (#11229 )	2025-07-16 13:27:54 -07:00
nimlgen	6aa20c607d	nv: graceful shutdown to cold state (#11265 )	2025-07-16 19:49:35 +03:00
chenyu	59b52d49d7	remove .global_dims that are for locating GLOBAL [pr] (#11264 )	2025-07-16 11:19:31 -04:00
chenyu	e6c016ddd0	move check axis < shape_len to real_axis [pr] (#11263 ) ensure output of real_axis is always valid	2025-07-16 10:15:44 -04:00
quortus	924bc7c9ae	Fix test_uop_spec (#11259 )	2025-07-16 11:02:31 +03:00
chenyu	c8e5c4d7c3	insert_before -> insert_at [pr] (#11257 ) more precise	2025-07-15 17:44:34 -04:00
wozeparrot	b32d9321fb	feat: more keccak cleanup + more explicit shape (#11256 )	2025-07-15 13:57:47 -07:00
chenyu	9f79079cbe	update KernelInfo dims to return list of dims [pr] (#11255 ) local dims are not contiguous once upcast sits between local and groupreduce	2025-07-15 15:01:39 -04:00
chenyu	629fa21b6b	remove final range in heuristic [pr] (#11251 ) all dims are based on AxisType now	2025-07-15 11:39:15 -04:00
chenyu	d7adc24083	remove Kernel.first_upcast [pr] (#11248 ) first_reduce does not need a default now	2025-07-15 10:21:34 -04:00
nimlgen	197d345804	nv: print rpc msg with DEBUG>=3 (#11247 )	2025-07-15 16:39:58 +03:00
chenyu	034e51bd36	remove first_reduce used for locate real_axis [pr] (#11245 ) LOCAL goes to the last of (GLOBAL+LOCAL)+1 GROUP goes to right before first REDUCE	2025-07-15 09:19:38 -04:00
chenyu	0e2422d216	Kernel.axes_of helper [pr] (#11243 ) look up dim based on AxisType	2025-07-14 22:17:43 -04:00
chenyu	968f6b2a2e	remove hasattr(self, 'axis_types') checks in dims property [pr] (#11242 ) no needed anymore	2025-07-14 20:59:51 -04:00
leopf	557ca7d757	testing SimpleTokenizer against OASST1 (#11214 )	2025-07-14 17:09:31 -07:00
wozeparrot	5878b189b8	don't const fold shape changing bitcast (#11236 )	2025-07-14 16:42:16 -07:00
chenyu	b6662096cb	remove more first_reduce [pr] (#11239 )	2025-07-14 19:13:44 -04:00
chenyu	eb8e17ef59	remove most of the first_upcast [pr] (#11238 )	2025-07-14 16:54:24 -04:00
qazal	c78b1cbae7	viz profiler cleanups (#11234 ) * move all render calls to zoom callback * cleanup the naming * require transform arg	2025-07-14 19:06:33 +03:00
chenyu	36ce883c7d	update heuristic to use k.upcastable_dims and k.unrollable_dims [pr] (#11233 ) idea is to make it behave the same regardless of axis order and with empty 1s in shape. not quite fully remove all first_upcast yet because some conditions used already upcasted size which need a separate benchmark to remove.	2025-07-14 11:10:30 -04:00
qazal	c0c695dd89	viz: remove extra transform (#11232 )	2025-07-14 16:51:47 +03:00
chenyu	da219199f5	minor hcopt cleanup [pr] (#11231 )	2025-07-14 09:36:25 -04:00
nimlgen	756ba1a5f9	nv: support ampere in nvpci (#11230 )	2025-07-14 15:35:44 +03:00
uuuvn	b2cc6cfa1b	JIT_BATCH_SIZE is a ContextVar (#11228 )	2025-07-14 14:03:45 +03:00
nimlgen	c4a920d95c	nv: use last signature (#11227 )	2025-07-14 13:00:39 +03:00
nimlgen	a830d37881	nv: check wpr2 is inited (#11226 )	2025-07-14 11:46:14 +03:00
chenyu	0387bb9630	clean up image upcast in hcopt [pr] (#11220 ) GLOBAL+LOCAL for upcast GROUP_REDUCE+REDUCE for unroll	2025-07-13 18:06:43 -04:00
chenyu	85ddd72038	simpler grouptop in hcopt (#11219 ) * simpler grouptop in hcopt keep the only perf relevant conditions and the rest is handled by try except * update openpilot read image count	2025-07-13 16:06:09 -04:00
qazal	40847ca29c	viz: prune out of screen rects (#11217 )	2025-07-13 21:49:59 +03:00
chenyu	674dc28505	remove Kernel.full_unupcasted_shape [pr] (#11215 ) decomp to shape_len and first_upcast to get the last upcast-able dim	2025-07-13 13:56:23 -04:00
chenyu	9575cf6c6e	shave more hcopt [pr] (#11213 ) start to use AxisType for conditions	2025-07-13 12:43:58 -04:00
Alisher Zhubanyshev	4ef6b46b34	hcq: reduce launch overhead (#11193 ) * nv: improve mmio creation speed * add memoryview test * fix indents * move mv bench to `test_helpers`, remove comparison	2025-07-13 19:25:50 +03:00


10417 Commits