tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-01-23 13:58:00 -05:00

Author	SHA1	Message	Date
chenyu	b6662096cb	remove more first_reduce [pr] (#11239)	2025-07-14 19:13:44 -04:00
chenyu	eb8e17ef59	remove most of the first_upcast [pr] (#11238)	2025-07-14 16:54:24 -04:00
qazal	c78b1cbae7	viz profiler cleanups (#11234) * move all render calls to zoom callback * cleanup the naming * require transform arg	2025-07-14 19:06:33 +03:00
chenyu	36ce883c7d	update heuristic to use k.upcastable_dims and k.unrollable_dims [pr] (#11233) idea is to make it behave the same regardless of axis order and with empty 1s in shape. not quite fully remove all first_upcast yet because some conditions used already upcasted size which need a separate benchmark to remove.	2025-07-14 11:10:30 -04:00
qazal	c0c695dd89	viz: remove extra transform (#11232)	2025-07-14 16:51:47 +03:00
chenyu	da219199f5	minor hcopt cleanup [pr] (#11231)	2025-07-14 09:36:25 -04:00
nimlgen	756ba1a5f9	nv: support ampere in nvpci (#11230)	2025-07-14 15:35:44 +03:00
uuuvn	b2cc6cfa1b	JIT_BATCH_SIZE is a ContextVar (#11228)	2025-07-14 14:03:45 +03:00
nimlgen	c4a920d95c	nv: use last signature (#11227)	2025-07-14 13:00:39 +03:00
nimlgen	a830d37881	nv: check wpr2 is inited (#11226)	2025-07-14 11:46:14 +03:00
chenyu	0387bb9630	clean up image upcast in hcopt [pr] (#11220) GLOBAL+LOCAL for upcast GROUP_REDUCE+REDUCE for unroll	2025-07-13 18:06:43 -04:00
chenyu	85ddd72038	simpler grouptop in hcopt (#11219) * simpler grouptop in hcopt keep the only perf relevant conditions and the rest is handled by try except * update openpilot read image count	2025-07-13 16:06:09 -04:00
qazal	40847ca29c	viz: prune out of screen rects (#11217)	2025-07-13 21:49:59 +03:00
chenyu	674dc28505	remove Kernel.full_unupcasted_shape [pr] (#11215) decomp to shape_len and first_upcast to get the last upcast-able dim	2025-07-13 13:56:23 -04:00
chenyu	9575cf6c6e	shave more hcopt [pr] (#11213) start to use AxisType for conditions	2025-07-13 12:43:58 -04:00
Alisher Zhubanyshev	4ef6b46b34	hcq: reduce launch overhead (#11193) * nv: improve mmio creation speed * add memoryview test * fix indents * move mv bench to `test_helpers`, remove comparison	2025-07-13 19:25:50 +03:00
nimlgen	1cc2b3f845	nv: use wait_cond (#11212)	2025-07-13 19:25:20 +03:00
nimlgen	6cce3a5d58	generic wait_cond (#11210) * generic wait_cond * fix linter * fix linter	2025-07-13 16:59:21 +03:00
chenyu	e11ccf2342	update float4 condition in hcopt (#11211) don't need all upcast candidates to be upcast-able, only check the actual one	2025-07-13 09:51:45 -04:00
nimlgen	55c54d9745	nv: sync after gpfifo setup (#11209)	2025-07-13 14:40:11 +03:00
chenyu	d90d837013	clean up hcopt [pr] (#11205) removed one condition that's always true	2025-07-12 23:10:27 -04:00
chenyu	2b48b961be	fix a few broken AMX tests (#11204)	2025-07-12 21:42:38 -04:00
wozeparrot	667c7a9fa6	clean: keccak cleanups + explicit shapes (#11202)	2025-07-12 18:17:14 -07:00
chenyu	a0438012af	remove Kernel.get_program [pr] (#11203)	2025-07-12 20:50:29 -04:00
George Hotz	d67c8e7b42	local metal on metal in uop syntax (#11185) * local metal on metal in uop syntax * TODO: just put the axis_info in the kernelinfo * local * amd_matmul works @ 28 TFLOPS * clean up matmul * kernel8 works * remove that * locals * axistype innovation * work * cleanup * kernel3 regs * cleanup kernel3 * work * why is it broken * no beam * reenable * permutes	2025-07-12 16:31:19 -07:00
uuuvn	40da5f0c81	fix silent mypy failure in ci (#11201) Example: https://github.com/tinygrad/tinygrad/actions/runs/16215577171/job/45784110543?pr=11177#step:7:20 Caused by footguny exception in how `set -e` works: ```bash python -m mypy --strict-equality --lineprecision-report . && cat lineprecision.txt ``` Will fail (and have non-zero exit code if run in interactive mode) but because there is `&&` it won't count as script-terminating failure in a script with `set -e` and instead as a test (similar to how fail of a command in if condition won't count as a script-terminating failure despite having non-zero exit code)	2025-07-12 15:12:25 -04:00
chenyu	73caa5dd1b	remove Kernel.membufs [pr] (#11200)	2025-07-12 14:48:47 -04:00
geohotstan	5ce278b245	OnnxRunner file as input (#10789) * file path as input and have parse be in OnnxRunner.__init__ * modelproto_to_onnxrunner -> modelproto_to_runner * whoops, fix import * oh flakiness again, is it because it's getting gc-ed? * small changes * CI flaky so just move compile4 fix in * copy typing of onnx_load * actually can just import onnx_load instead of onnx.load * fix external_benchmark_openpilot * fix onnx_runner test to use onnx_helper * rerun CI * try run_modelproto * spam CI a few times * revert run_modelproto since that's flaky also * no external onnx_load usage except onnx.py * cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why? * model_benchmark 193s -> 80s, add OnnxRunner.to()... * minimize diff and clean up * device can be None, weird but eh --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-12 14:27:46 -04:00
nimlgen	110cff3f2e	fix device arg to Tensor.randn (#11194) * fix device arg to Tensor.randn * simpler test * self.assertEqual	2025-07-12 13:51:59 -04:00
chenyu	6283d50224	DEPRECATED_linearize -> to_program [pr] (#11198)	2025-07-12 13:46:20 -04:00
George Hotz	770a558585	lil cleanups from uop branch [pr] (#11197)	2025-07-12 09:46:28 -07:00
George Hotz	5625e1904b	axis types in KernelInfo (#11196) * axis types in KernelInfo [pr] * simpler lowerer * fix tests	2025-07-12 09:36:20 -07:00
nimlgen	ea7f2f779c	hcq: p2p nv-amd (#11195) * hcq: p2p between diff devices * fix	2025-07-12 18:53:34 +03:00
qazal	6a9f059b21	viz: early convert to cpu time (#11192)	2025-07-12 17:19:41 +03:00
chenyu	12b04efd69	remove a TODO prod(k.full_shape[k.first_upcast:]) (#11191) IMAGE=2 test/test_ops.py works now	2025-07-12 10:16:56 -04:00
nimlgen	6f5250d158	nv: fix typing in rpc_rm_control (#11189)	2025-07-12 16:09:42 +03:00
qazal	c0a5490c72	viz: minor profiler cleanup (#11190)	2025-07-12 14:18:24 +03:00
chenyu	fdcc25e392	some noop hand_coded_optimizations cleanup [pr] (#11188)	2025-07-12 00:09:23 -04:00
chenyu	1ad852a892	break up Kernel.reshape_and_permute [pr] (#11187)	2025-07-11 18:08:08 -04:00
uuuvn	d11b20129d	DMARef infra (#10753) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-07-11 14:09:47 -07:00
chenyu	b072be0e2d	hotfix whisper main script (#11184)	2025-07-11 12:34:00 -04:00
qazal	0b7e9b5db7	viz: bugfix for multiple rewrites with the same name (#11182)	2025-07-11 18:26:12 +03:00
nimlgen	f9e4c4e57a	nv: nvpci blackwell support (#11127) * nv: start 5090 * gsp init 5090 * mmu * works * after merge * clenaer * rwk * x * fx * finish? * fix * unrelated * fix * commenbt	2025-07-11 17:02:09 +03:00
qazal	1d85323572	viz: absolute scaling of memory graph (#11181)	2025-07-11 16:39:11 +03:00
nimlgen	c7f6b617b4	nv: do not hardcode lv0 pd size (#11180)	2025-07-11 16:26:18 +03:00
nimlgen	27922c986a	nv: generic mmu impl (#11179)	2025-07-11 16:26:09 +03:00
qazal	d3ec63a5c3	viz: add base class for unittests (#11178)	2025-07-11 13:58:03 +03:00
qazal	b791ea117d	viz: enable scrolling in profiler (#11169) * viz: add scrollbar to profiler * using margin fixes the layout bug * s/profiler.clientHeight/profiler.scrollHeight, it's important * closer * scrolling on the device list also works	2025-07-11 11:30:13 +03:00
chenyu	b219e47bef	remove Kernel.upcasted_axis [pr] (#11175)	2025-07-10 23:19:21 -04:00
George Hotz	ccd382bc6f	use axis_types more [pr] (#11172) * use axis_types more * fix local shape * simpler clause * fix local shape	2025-07-10 15:05:13 -07:00
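
Commit 40da5f0c81 (#11201) describes a `set -e` pitfall: a failing command that is not the last one in an `&&` list does not abort a script running with `set -e`. The sketch below reproduces that bash behavior with `false` standing in for the mypy invocation. The function names `demo_and_list` and `demo_separate_lines` are hypothetical, and this is only a demonstration of the shell semantics, not the change that landed in that commit.

```bash
#!/usr/bin/env bash
# Sketch of the `set -e` exemption described in commit 40da5f0c81.
# `false` stands in for a failing type checker such as mypy.

# Hypothetical helper: the failing command sits on the left of an && list.
demo_and_list() {
  bash -e -c '
    false && echo "report"   # false fails, but it is not the last command of the && list
    echo "still running"     # so set -e does not abort, and this line runs
  '
}

# Hypothetical helper: the failing command stands alone.
demo_separate_lines() {
  bash -e -c '
    false                    # a plain failing command
    echo "report"            # never reached, set -e exits with status 1 above
  '
}

# Capture output and exit status without letting a caller-side set -e abort us.
if out1=$(demo_and_list); then status1=0; else status1=$?; fi
if out2=$(demo_separate_lines); then status2=0; else status2=$?; fi

echo "and-list: status=$status1 output=$out1"   # status=0, the failure was swallowed
echo "separate: status=$status2 output=$out2"   # status=1, the failure was caught
```

Running the checker as its own command, as in `demo_separate_lines`, is one way to let `set -e` see the failure directly before any follow-up command such as `cat lineprecision.txt` runs.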

10633 Commits