quortus
bdd44d4255
Fix DSP transcendentals (#9542)
2025-03-22 11:08:18 +08:00
Ignacio Sica
eddafb84e5
Bugfix for TC=3 (#9464)
...
* wrong but uses less shared
* for size 8 tc1 with devectorize in 0 loads into local before wmma and works
* improvements over tc1 devectorize
* fix tc=3
* works for handcoded tc opts
* clean bugfix tc=3
* fix
* revert changes
2025-03-21 16:43:42 -07:00
chenyu
6da78164f9
assert Kernel ast.op to be Ops.SINK [pr] (#9539)
...
rest of the code assumes self.ast is defined anyway
2025-03-21 18:09:44 -04:00
chenyu
c33679c47b
increase size in test_multinomial_counterexample (#9540)
...
should be less flaky
2025-03-21 17:46:52 -04:00
Francis Lata
1a1087e3a0
cleanups on losses and dataset tests (#9538)
2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc
RetinaNet losses (#9536)
...
* add sigmoid_focal_loss and l1_loss
* update ref implementation comment
2025-03-21 15:52:54 -04:00
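The sigmoid_focal_loss added above presumably follows the standard focal loss of Lin et al. 2017; a minimal NumPy sketch of that formula (a hypothetical helper, not the commit's actual code, which may differ in reduction and numerical details):

```python
import numpy as np

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Sketch of the standard focal loss (Lin et al. 2017); the commit's
    # reference implementation may differ in reduction/stability details.
    p = 1 / (1 + np.exp(-logits))                      # sigmoid
    ce = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    p_t = p * targets + (1 - p) * (1 - targets)        # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```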
Francis Lata
e6389184c5
update comment for retinanet dataloader implementations (#9534)
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
chenyu
ee3d313b34
Revert "update ruff to 0.11.2 ( #9531 )" (#9535)
...
This reverts commit d8d65e2747.
2025-03-21 14:52:25 -04:00
chenyu
b46b8ee15e
add a flag to log when beam surpassed max limit [pr] (#9533)
2025-03-21 13:37:02 -04:00
Francis Lata
eb95825eea
RetinaNet dataloader (#9442)
...
* retinanet dataloader
* remove batch_size from generate_anchors
* refactor kits19 dataset tests
* add tests for dataloader
* fix testing setup and cleanups
* remove unused import
2025-03-21 13:36:41 -04:00
b1tg
58206fa8a9
add amd llvm compiler (#9519)
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 23:13:27 +08:00
chenyu
d8d65e2747
update ruff to 0.11.2 (#9531)
...
0.11.2 fixed the false alert from 0.11.1. Also pinned the version in setup for now so a future ruff upgrade cannot break CI.
2025-03-21 10:32:59 -04:00
qazal
ee3ed73ed1
add reorder_view matcher to scheduler [pr] (#9528)
2025-03-21 17:46:20 +08:00
George Hotz
8e555c586c
switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527)
...
* switch quantization to unsigned/unsigned + add Ops.REDUCE
* tests
* nhwc + replay pkl
2025-03-21 17:02:37 +08:00
nimlgen
a35b0a88bf
am: just rename and reorder ip init funcs (#9504)
2025-03-21 15:57:32 +08:00
nimlgen
8a131ab271
am: allow allocations as small as a page (#9523)
...
* am: fix allocs
* bettermsg
* comment
* next time
2025-03-21 15:53:32 +08:00
Sieds Lykles
3ad3ac4d1e
Change dtypes.int to dtypes.ints (#9517)
2025-03-20 17:24:26 -04:00
chenyu
b9fab9b914
pin ruff to 0.11.0 in CI (#9520)
...
0.11.1 had a bug https://github.com/astral-sh/ruff/issues/16874 that broke CI
2025-03-20 13:12:50 -04:00
George Hotz
3c5161b4cb
add validation of the bounds of Ops.INDEX (#9503)
...
* add validation of the bounds of Ops.INDEX
* do mask properly
* more validation
* correct
* fix gated
* add CAST support to vmin/vmax
* fix ptx and image
* ptx no diff
* upat.index also stays
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-03-20 12:15:55 +08:00
qazal
0b20f91ce7
remove move_mask from the devectorizer (#9511)
...
* remove move_mask from the devectorizer
* add (wrong) ptx
* reason
* enable index addition in PTX, we won't have the INDEX anyways
* space
2025-03-20 11:53:12 +08:00
qazal
9302738263
hotfix: more consistent wgsl.py spacing + cleanups [pr] (#9515)
...
* hotfix: more consistent wgsl.py spacing + cleanups [pr]
* free things up
2025-03-20 11:07:15 +08:00
George Hotz
68053d0510
dsp stuff / sniff ioctls from snpe (#9490)
...
* sniff ioctls from snpe
* dump input buffers
* snpe logs from dsp
* NHWC support
* knum 3
* this run?
* revert those
---------
Co-authored-by: Comma Device <device@comma.ai>
2025-03-20 10:38:23 +08:00
qazal
2223b93338
add UPat.or_casted [pr] (#9513)
2025-03-20 10:08:32 +08:00
qazal
1839e8c9b3
place masks in INDEX for TestGatedStoreRewrite [pr] (#9512)
2025-03-20 09:46:53 +08:00
b1tg
bd731a8624
AMDCompiler refactor (no_comgr prereq) (#9497)
...
* add amdgpu_disassemble to helpers
* refactor hip compiler
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-20 09:44:07 +08:00
geohotstan
8c0d0a122c
Add return_indices to max_pool (#9506)
...
* wow argmax is so good
* 1 less line
* clean up and better variable names
* is this torch thing right...?
* add more tests
* slap a TODO on it
* clean ups
* prettier looking code and fix ceil mode test
* add return types and some docs
* ok that was a bad example since indices == value, just no example
2025-03-19 15:25:37 -04:00
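The return_indices option above appears, from the commit's bullets, to be built on argmax; assuming torch-style semantics (each pooled maximum is returned alongside its flat index into the input), a minimal 1-D NumPy sketch with a hypothetical helper name:

```python
import numpy as np

def max_pool1d_with_indices(x: np.ndarray, k: int):
    # Torch-style return_indices semantics (assumed here): alongside the
    # pooled maxima, return the flat index into x of each selected
    # element, recovered with an argmax per pooling window.
    n = len(x) // k
    windows = x[: n * k].reshape(n, k)
    idx = windows.argmax(axis=1) + np.arange(n) * k
    return windows.max(axis=1), idx
```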
chenyu
189f62d44f
add rounding to tqdm unit scale (#9507)
...
fixed `AssertionError: ' 1.00/10.0 1000it/s]' != ' 1.00/10.0 1.00kit/s]'`
2025-03-19 12:08:46 -04:00
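The assertion above shows the symptom: a rate that rounds up to 1000 was printed as `1000it/s` instead of being bumped to the next SI prefix. A minimal sketch of the idea behind the fix (a hypothetical helper, not tinygrad's actual tqdm code): round to the displayed precision before choosing the prefix.

```python
def fmt_rate(rate: float) -> str:
    # Hypothetical helper (not tinygrad's actual tqdm code) showing the
    # fix's idea: round to the displayed precision *before* picking the
    # SI prefix, so e.g. 999.999 it/s renders as "1.00kit/s", not "1000it/s".
    prefixes = ["", "k", "M", "G", "T"]
    i = 0
    while round(rate, 2) >= 1000 and i < len(prefixes) - 1:
        rate /= 1000
        i += 1
    return f"{rate:.2f}{prefixes[i]}it/s"
```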
nimlgen
a5c971ff3a
am: prereqs for rdna4 1/n (#9495)
...
* am: ip_ver rename for acc
* am: refactor this
* fix version
* ugh
2025-03-19 17:14:57 +08:00
Francis Lam
1e5d9ad8f7
extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)
...
* extra/gemm/max_matmul: start of custom kernels for GEMM
* add an unoptimized FP16/FP16 MMA example
* add slow 3-stage fp16 acc example
* add correct 3-stage pipeline with unswizzled/flat smem input (slow)
* add acc fp16 example with 3 stages and swizzle (no bank conflicts)
* add max version of NV fp16_fp16_fp16
* fix up comments and removed unused code in max variations
* add start of no_xor example
* fix to account for UOps to Ops
2025-03-19 15:04:57 +08:00
George Hotz
865f23dd7b
olmoe memory usage cleanups
2025-03-19 12:28:18 +08:00
b1tg
2c87a22cf2
fix prg size calculation when there are adjacent mapped ranges (#9498)
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-19 11:55:03 +08:00
b1tg
1d71436e6a
use libllvm19 in ci (#9494)
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-19 11:53:32 +08:00
b1tg
a95b489a55
nanoGPT train works with tiny torch backend (#9283)
...
* train_shakespeare_char.py works
* move aten.where.self_out to tiny_backend_out
* fix memory leak
* corealize in the backward_hook
* Update backend.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-19 11:51:02 +08:00
chenyu
f8976dd2eb
enable more webgpu tests (#9502)
...
OSX has a larger buffer count limit, and it supports fp16 now
2025-03-18 23:03:54 -04:00
qazal
ae688e4103
simple failing test for scheduling parallel reduce [pr] (#9501)
...
* simple failing test for scheduling parallel reduce [pr]
* atol
2025-03-19 10:52:13 +08:00
leopf
e4dad99145
nn.state docs cleanup (#8332)
...
* doc cleanup
* extension cleanup
* manual definition
* bring back accept_filename for gguf_load
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-18 17:16:40 -04:00
chenyu
1ea4876dfa
olmoe touchups (#9499)
...
GlobalCounters.reset() and only validate if temperature is 0
2025-03-18 15:25:45 -04:00
geohotstan
f7506c6c25
JIT OLMoE (#9396)
...
* jit the forward
* might timeout, idk just send it
* this is dumb
* naive bitonic lol
* idk if this is correct, but that squeeze before is definitely not
* vectorized bitonic sort, but still slow
* yay 1 layer is correct
* alright its pretty good
* good enough
* rerun CI
* nit improve comment
2025-03-18 14:49:02 -04:00
Ignacio Sica
5c56cac0a0
MI300 mfma support (#9417)
...
* add f16/f32 mfma support for MI300
- add 16x16 mfma shape support for f16 with f32 acc
- add ops_python mfma emulation
- add arch to AMDRenderer
* minor cleanup
* minor cleanup
* add mfma emulation task to ci
* add back todo
* hotfix: comment
* add tc=3 job to ci
2025-03-18 14:33:30 -03:00
hooved
5500887eed
improve reproducibility of WebGPU CI puppeteer test (#9496)
...
* try to make CI test fail with slow JS import
* prevent race between model import and reference
* revert artificial delay in JS module import
2025-03-18 09:27:38 -04:00
qazal
cde4fd3be3
do not view_left assign + elementwise sources always have a shape [pr] (#9491)
2025-03-18 17:42:51 +08:00
George Hotz
117b7a16ef
VALIDATE_WITH_CPU [pr] (#9488)
...
* VALIDATE_WITH_CPU [pr]
* fix test
2025-03-18 15:15:04 +08:00
qazal
935cd01f56
simple failing test for graph_rewrite children [pr] (#9489)
...
* simple failing test for graph_rewrite children [pr]
* lint
* update too
2025-03-18 13:07:21 +08:00
George Hotz
d20494e6d7
move buffer logic to Buffer [pr] (#9487)
...
* move buffer logic to Buffer [pr]
* pass shape into as_typed_buffer
* pass shape into as_typed_buffer
* work
* cleaner
* fix tests
2025-03-18 11:21:21 +08:00
qazal
3be228182f
unbind Tensor variables last [pr] (#9486)
...
* reorder do_realize [pr]
* move merge_views
* unbind all variables at the end [pr]
2025-03-18 09:52:01 +08:00
qazal
b44f9c409a
reorder do_realize [pr] (#9485)
...
* reorder do_realize [pr]
* move merge_views
2025-03-18 09:30:10 +08:00
nimlgen
a82c9332d3
am: rename soc21 to soc (#9482)
2025-03-18 08:54:26 +08:00
qazal
b100fc0b20
split the rule that uses context in scheduler simplifier [pr] (#9484)
...
* split the rule that uses context in scheduler simplifier [pr]
* add
2025-03-18 08:12:26 +08:00
Anish Umale
5e58f4b65b
Tiny backend test_ops fix part 3 (#9483)
...
* extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302
* pass dtype and device for ones_like
2025-03-17 18:01:51 -04:00
TJ
9fcef4d009
add masked_select to tensor.py (#9468)
...
* add masked_select to tensor.py
* fix tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-17 16:05:36 -04:00
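Assuming the new Tensor.masked_select follows torch semantics (an assumption here, not confirmed by the commit text): broadcast a boolean mask against the tensor and return the selected elements as a flat 1-D result. A minimal NumPy sketch:

```python
import numpy as np

def masked_select(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Torch-style semantics (assumed for the new Tensor method too):
    # broadcast the boolean mask against x, then return the selected
    # elements flattened into a 1-D array, in row-major order.
    x_b, m_b = np.broadcast_arrays(x, mask.astype(bool))
    return x_b[m_b]

# masked_select(np.array([[1, 2], [3, 4]]), np.array([[True, False], [False, True]]))
# → array([1, 4])
```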