tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 14:58:46 -05:00

Author	SHA1	Message	Date
uuuvn	2c32126fc8	am: AMRegister refactor (#9572)	2025-03-26 00:52:40 +07:00
chenyu	cddd750d68	add a failed test case for jit/nojit rand [pr] (#9574) currently adding jit produced different rand values	2025-03-25 13:32:44 -04:00
nimlgen	4cf2b68ca8	am_smi: fix init for newer versions (#9559)	2025-03-25 23:48:05 +07:00
qazal	a6a5c0aec5	add NULL=1 backend (#9573) * add NULL=1 backend * NullAllocator * line * metadata should still work * it shouldn't have memory usage * Revert "it shouldn't have memory usage" This reverts commit `a9080fdd43`. * back * null flops	2025-03-25 22:20:52 +08:00
qazal	b60d9976b4	better yaxis formatting in viz memory graph (#9570) * better bytes format * pluralize * 1 less line	2025-03-25 16:50:22 +08:00
qazal	faf3b5b245	display kernel metadata in memory viz (#9569) * display kernel metadata in memory viz * fix that	2025-03-25 13:14:54 +08:00
qazal	52301fe68e	move Buffer refcount increment out of schedule.py (#9564) * move Buffer refcount increment out of schedule.py * add TestGC.test_assign_refcount * refcount refers to Ops.BUFFER UOps	2025-03-25 12:08:27 +08:00
qazal	262f5a2bd3	hotfix: replace link in viz/readme (#9568)	2025-03-25 10:24:49 +08:00
chenyu	6427272bf6	minor update to rand [pr] (#9566)	2025-03-24 18:49:50 -04:00
chenyu	b0e070e737	remove MOCKGPU workaround in rand (#9565) also `requires_grad_` to save a line	2025-03-24 17:49:45 -04:00
qazal	d7c754ce49	failing test for UOp buffer ref count (#9563) * failing test for UOp buffer ref count * lint	2025-03-25 00:10:48 +08:00
b1tg	f90001e1a6	amd llvm render (no_comgr prereq) (#9543) * amd llvm render * skip test_div_rounding_mode --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-03-24 22:50:51 +08:00
Priyank Patel	4f5e03bd60	better fix inplace detach (#9557)	2025-03-24 22:50:28 +08:00
qazal	1c40873962	show buffer info in memory viz (#9562)	2025-03-24 22:12:30 +08:00
qazal	efaee75656	start viz of memory usage (#9561) * start viz of memory usage * polygons/bars + use d3	2025-03-24 19:05:35 +08:00
qazal	1cfe6d02fe	refactor uop_to_json to return a dict [pr] (#9560)	2025-03-24 16:38:17 +08:00
nimlgen	edf9e1bf8d	am: move out soc21 to a sep module (#9551) * am: soc module is not part of am * am: soc module is not part of am	2025-03-24 14:17:42 +07:00
George Hotz	74d98eafb8	add onnx frontend stub [pr] (#9558)	2025-03-24 12:24:34 +08:00
George Hotz	de7d6cec3a	hotfix: DEBUG 5 prints the ast	2025-03-24 11:43:11 +08:00
chenyu	ba41076e94	update embedding test to not use dtypes.long [pr] (#9556)	2025-03-23 21:33:38 -04:00
chenyu	c965f4c20b	update bert config (#9555) BEAM 4->5 for green, 2% faster use AMD driver instead of AM for red, 5% faster	2025-03-23 16:14:41 -04:00
chenyu	d734e24c01	minor WEBGPU_PATH cleanup [pr] (#9552) also mypy recognizes `sys.platform == 'win32'` but does not recognizes it if wrapped inside a helper...	2025-03-23 09:10:02 -04:00
Ahmed Harmouche	7ce7fe0574	Refactor webgpu_dawn lib finding (#9547) * Refactor webgpu_dawn lib finding * Fix ruff	2025-03-23 08:23:29 -04:00
uuuvn	c631c72f22	HCQ: Increment timeline signal before submitting (#9550) `AMDComputeQueue.__del__` frees `hw_page` which is safe because `AMDAllocator._free` does `self.dev.synchronize()` which is supposed to wait for execution of IB to finish, however that doesn't happen if AMDComputeQueue is dropped right after submit before timeline signal is incremented, which it is in most places leading to a race if .bind() is also used (required for multi-xcc because bug in mec fw treats all PACKET3_PRED_EXECs outside IBs as if they had EXEC_COUNT of zero).	2025-03-23 18:30:38 +07:00
nimlgen	d5667419af	am: move out pte creation logic (#9548) * am: move out pte creation logic * emu * ops	2025-03-23 18:29:10 +07:00
geohotstan	309afa20b7	add Tensor.max_unpool2d (#9518) * why does max_unpool2d feel slower than out.gradient ... * slightly cleaner * what happened to ruff * need to think about this some more * slightly faster now? * clean up, 1 more failing edge case * ok good * working TINY_BACKEND * nit doc wording * retry CI	2025-03-22 12:11:33 -04:00
quortus	bdd44d4255	Fix DSP transcendentals (#9542)	2025-03-22 11:08:18 +08:00
Ignacio Sica	eddafb84e5	Bugfix for `TC=3` (#9464) * wrong but uses less shared * for size 8 tc1 with devectorize in 0 loads into local before wmma and works * improvements over tc1 devectorize * fix tc=3 * works for handcoded tc opts * clean bugfix tc=3 * fix * revert changes	2025-03-21 16:43:42 -07:00
chenyu	6da78164f9	assert Kernel ast.op to be Ops.SINK [pr] (#9539) rest of the code assumes self.ast is defined anyway	2025-03-21 18:09:44 -04:00
chenyu	c33679c47b	increase size in test_multinomial_counterexample (#9540) should be less flaky	2025-03-21 17:46:52 -04:00
Francis Lata	1a1087e3a0	cleanups on losses and dataset tests (#9538)	2025-03-21 17:03:18 -04:00
Francis Lata	8cbe4009fc	RetinaNet losses (#9536) * add sigmoid_focal_loss and l1_loss * update ref implementation comment	2025-03-21 15:52:54 -04:00
Francis Lata	e6389184c5	update comment for retinanet dataloader implementations (#9534) Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-21 15:07:45 -04:00
chenyu	ee3d313b34	Revert "update ruff to 0.11.2 (#9531)" (#9535) This reverts commit `d8d65e2747`.	2025-03-21 14:52:25 -04:00
chenyu	b46b8ee15e	add a flag to log when beam surpassed max limit [pr] (#9533)	2025-03-21 13:37:02 -04:00
Francis Lata	eb95825eea	RetinaNet dataloader (#9442) * retinanet dataloader * remove batch_size from generate_anchors * refactor kits19 dataset tests * add tests for dataloader * fix testing setup and cleanups * remove unused import	2025-03-21 13:36:41 -04:00
b1tg	58206fa8a9	add amd llvm compiler (#9519) Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-21 23:13:27 +08:00
chenyu	d8d65e2747	update ruff to 0.11.2 (#9531) 0.11.2 fixed the false alert from 0.11.1. also pinned the version in setup for now to prevent broken CI from ruff upgrade	2025-03-21 10:32:59 -04:00
qazal	ee3ed73ed1	add reorder_view matcher to scheduler [pr] (#9528)	2025-03-21 17:46:20 +08:00
George Hotz	8e555c586c	switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527) * switch quantization to unsigned/unsigned + add Ops.REDUCE * tests * nhwc + replay pkl	2025-03-21 17:02:37 +08:00
nimlgen	a35b0a88bf	am: just rename and reorder ip init funcs (#9504)	2025-03-21 15:57:32 +08:00
nimlgen	8a131ab271	am: allow allocations as small as a page (#9523) * am: fix allocs * bettermsg * comment * next time	2025-03-21 15:53:32 +08:00
Sieds Lykles	3ad3ac4d1e	Change dtypes.int to dtypes.ints (#9517)	2025-03-20 17:24:26 -04:00
chenyu	b9fab9b914	pin ruff to 0.11.0 in CI (#9520) 0.11.1 had a bug https://github.com/astral-sh/ruff/issues/16874 that breaks ci	2025-03-20 13:12:50 -04:00
George Hotz	3c5161b4cb	add validation of the bounds of Ops.INDEX (#9503) * add validation of the bounds of Ops.INDEX * do mask properly * more validation * correct * fix gated * add CAST support to vmin/vmax * fix ptx and image * ptx no diff * upat.index also stays --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-03-20 12:15:55 +08:00
qazal	0b20f91ce7	remove move_mask from the devectorizer (#9511) * remove move_mask from the devectorizer * add (wrong) ptx * reason * enable index addition in PTX, we won't have the INDEX anyways * space	2025-03-20 11:53:12 +08:00
qazal	9302738263	hotfix: more consistent wgsl.py spacing + cleanups [pr] (#9515) * hotfix: more consistent wgsl.py spacing + cleanups [pr] * free things up	2025-03-20 11:07:15 +08:00
George Hotz	68053d0510	dsp stuff / sniff ioctls from snpe (#9490) * sniff ioctls from snpe * dump input buffers * snpe logs from dsp * NHWC support * knum 3 * this run? * revert those --------- Co-authored-by: Comma Device <device@comma.ai>	2025-03-20 10:38:23 +08:00
qazal	2223b93338	add UPat.or_casted [pr] (#9513)	2025-03-20 10:08:32 +08:00
qazal	1839e8c9b3	place masks in INDEX for TestGatedStoreRewrite [pr] (#9512)	2025-03-20 09:46:53 +08:00

10633 Commits