tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
wozeparrot	222bb12ddf	tk softmax (#13205)	2025-11-11 15:13:16 -08:00
wozeparrot	787f0070ed	feat: don't use output reg as local reduce reg (#13203)	2025-11-11 14:35:16 -08:00
chenyu	ece1415def	clean up image_dot and image_conv2d (#13222) * clean up image_dot and image_conv2d * those are fine * interesting	2025-11-11 15:53:03 -05:00
nimlgen	2f0ea29b34	qcom: 48bit timestamps (#13214) * qcom: 48bit timestamps * f * lol * fix	2025-11-12 04:14:33 +08:00
qazal	bc55bc4849	cleanup test_viz profiler tests (#13221)	2025-11-12 03:46:48 +08:00
chenyu	23b90945c3	add a benchmark for openpilot vision with DEBUG=2 (#13219) see per kernel speed, also disable the jobs for 0.9.9	2025-11-11 14:41:52 -05:00
George Hotz	c2075f3613	gc disable during big rewrites (#13215) * gc disable during big rewrites * cleaner with helper	2025-11-11 10:30:47 -08:00
Roelof van Dijk	e59313da08	migrate pytest and ruff (#13216)	2025-11-11 13:27:51 -05:00
Gaétan Lepage	6fd7ce3832	migrate to pyproject.toml (#13189) * migrate to pyproject.toml * move mypy config to pyproject.toml	2025-11-11 09:09:27 -08:00
qazal	8002921a04	viz: improve the program run tooltip (#13212) * add tflops to tooltip format * show if the run was batched	2025-11-12 00:56:03 +08:00
qazal	f91e366a17	viz: display the graph layout recursion error (#13194) * viz: display the graph layout recursion error * share styles * +min-width * same thing * inline the append	2025-11-11 15:25:12 +08:00
wozeparrot	73497af4c0	clean: use np for allclose (#13204)	2025-11-10 23:02:43 -08:00
George Hotz	a6360fd94d	store can have shape (#13202) * store can have shape * _shape	2025-11-10 22:16:47 -08:00
b1tg	f3692b7406	clean up hip renderer (#13063) * clean up hip renderer * ocml --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-11-11 00:44:24 -05:00
chenyu	22b8579234	one last regressed dm kernel (#13201)	2025-11-10 23:30:52 -05:00
chenyu	58b7e4fab3	GROUPTOP heuristic on more axes (#13206) fixed dm speed	2025-11-10 23:30:37 -05:00
chenyu	829cdafccc	update openpilot slow conv uop ast (#13197) the two remaining slow ones	2025-11-10 17:03:20 -05:00
George Hotz	0c978d45e6	stub attention (#13196) * stub attention * name the kernels	2025-11-10 13:48:38 -08:00
chenyu	58c30fc7ce	minor image_conv2d cleanup (#13193)	2025-11-10 16:05:40 -05:00
chenyu	60e55d9a2d	line count 18500 (#13191)	2025-11-10 13:52:13 -05:00
nimlgen	09a59c2203	qcom: support new chip versioning (#13185) * qcom: support new chip versioning * ops * nit * fix * f	2025-11-10 23:57:29 +08:00
qazal	50934050bc	sqtt: append all wave execs (#13190)	2025-11-10 23:50:08 +08:00
qazal	38a24731a1	cleanup sqtt tooling (#13188) * cleanup viz/serve.py * use latest profile in rgptool.py * unwrap nullable in roc.py, fix disasms typing	2025-11-10 20:52:57 +08:00
qazal	845a24dcc6	viz: group sqtt waves by program (#13187) * viz: group sqtt waves by program * color the names	2025-11-10 19:25:23 +08:00
George Hotz	fd6803000e	mutmut cfg (#13184) * mutmut cfg * coveragerc	2025-11-09 23:29:29 -08:00
wozeparrot	6252831ceb	feat: initial tk library (#13160)	2025-11-09 22:54:29 -08:00
George Hotz	925231aec1	repeat does less reshape for 1s (#13183)	2025-11-09 19:43:02 -08:00
George Hotz	d7369de048	hotfix: update weekly commits table	2025-11-09 19:37:06 -08:00
chenyu	6c48c87e51	improved ASSERT_MIN_STEP_TIME (#13182) * improved ASSERT_MIN_STEP_TIME getting close, current time +1ms then round up * relax	2025-11-09 16:41:12 -05:00
nimlgen	17715688c7	system: validate vendor for APLPCIIfaceBase (#13181)	2025-11-10 02:49:21 +08:00
nimlgen	614783693e	nv: remove hardcoded expansion_rom_off (#13180) * nv: remove hardcoded expansion_rom_off * to max size	2025-11-09 21:43:19 +08:00
chenyu	e1d46de8f8	update GROUPTOP heuristic more (#13178) reverts #13176	2025-11-09 02:31:12 -05:00
chenyu	41e45c20ff	minor stuff reading the printed code [pr] (#13177)	2025-11-09 00:58:51 -05:00
chenyu	8e868dced8	only GROUPTOP one reduce kernel (#13176) * only GROUPTOP one reduce kernel * ALLOWED_GATED_READ_IMAGE=148	2025-11-08 22:38:44 -05:00
chenyu	834067d91f	move onnx import in compile3 (#13172) only used in test_vs_onnx	2025-11-08 09:44:34 -08:00
nimlgen	7f3240dbfe	nv: cleanup alloc (#13170) * nv: cleanup alloc * okay okay	2025-11-09 00:14:46 +08:00
qazal	7250fc0354	viz: double click on kernel run goes to codegen (#13147)	2025-11-08 23:40:50 +08:00
qazal	8a7fa9e7b4	sqtt: show total cycles of kernel in viz (#13169)	2025-11-08 21:00:40 +08:00
chenyu	2ba8b4946f	external_benchmark_op_cat.py (#13168) * external_benchmark_op_cat.py cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS * fix	2025-11-08 01:54:10 -05:00
chenyu	a62496cb3d	clean up get_grouped_dims [pr] (#13159)	2025-11-08 01:53:54 -05:00
wozeparrot	eb0192b0bb	feat: print ranges that aren't ended (#13167)	2025-11-07 22:01:29 -08:00
George Hotz	b41541bc44	bounty: Remove Tensor._pool alternative implementation and verify kernels remain the same (#13164)	2025-11-07 16:59:48 -08:00
George Hotz	ffb9e8396f	fix indexing bug with convs * minimal difference for ONE_POOL=1 * fix indexing bug * improve indexing debugger * more debugger improvements * always for reshape	2025-11-07 16:45:19 -08:00
chenyu	6a509da7f3	Scheduler.reduceops helper [pr] (#13162)	2025-11-07 18:59:46 -05:00
George Hotz	2413311289	make _pool simpler (#13161) * make _pool simpler * just syntax * more correct and smaller * try this now * Revert "try this now" This reverts commit `607cdc2164`. * ONE_POOL	2025-11-07 15:58:44 -08:00
George Hotz	70054cdb14	move backward cast to broadcasted, expand to mixins (#13156) * shrink_to mixin * move backward cast into _broadcasted * expand to movement mixin * move a few more * fix spec issue	2025-11-07 15:07:47 -08:00
George Hotz	f2519ea0ba	shrink_to mixin (#13155)	2025-11-07 11:46:24 -08:00
C T	0f9d7f650d	whisper: fix oob, explicit dtype (#13144) * fix dtype depending on numpy version numpy v2 np.array returns int64 which Tensor passed through for the first decode call, swallowing the <|notimestamps|> token and corrupting the sequence * fix whisper OOB global limit on whisper's context length * enforce whisper max_tokens_to_sample (match openai) local limit on max tokens decoded	2025-11-07 12:55:01 -05:00
Ahmed Harmouche	3ecff3a8da	Fix dim splitting bug for len(dim) == len(limited) case (#13142) * Fix gpudims bug on webgpu * Fix split dim bug * Remove webgpu_bug from examples * Add test for shape correctness * Fix 3D indexing --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-11-07 12:31:06 -05:00
nimlgen	b8e48effcb	device: no compilers message with reasons (#13146) * device: no compilers message with reasons * typings * mypy	2025-11-07 23:01:45 +08:00

11094 Commits