Commit Graph

10417 Commits

Author SHA1 Message Date
qazal
e3d024afa0 viz: split into scale, shapes, axes last (#11018)
* viz: split into scale, shapes, axes last

* set zoom on render
2025-06-28 19:10:58 +03:00
qazal
508bc68078 viz: small fixups from memory graph (#11017)
* don't need div.id

* tooltip z-index
2025-06-28 16:34:14 +03:00
qazal
fc3e509822 viz: new canvas on first render (#11016) 2025-06-28 16:04:51 +03:00
chenyu
c14c9a8eff llama3 grad clip (#11003) 2025-06-27 19:14:12 -04:00
nimlgen
e53673a0b2 amd: sdma queue overrun fix (#11012)
* amd: sdma queue overrun fix

* add ()

* fix

* bug

* this is correct
2025-06-28 01:42:03 +03:00
chenyu
f2548afeb5 bert grad clipping start with const 0 (#11008)
saved the init kernels
2025-06-27 18:02:23 -04:00
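On the grad-clipping entry above: a minimal sketch of global-norm clipping where the squared-norm accumulator starts from a constant 0 (which is what lets the init kernels be saved, per the commit note). The helper name, the 1.0 default, and the eps are illustrative assumptions, not the code from #11008.
```
from tinygrad import Tensor

def clip_grads_by_global_norm(grads: list[Tensor], max_norm: float = 1.0, eps: float = 1e-6):
  # accumulate the squared global norm, starting from a constant 0
  total_sq = Tensor(0.0)
  for g in grads: total_sq = total_sq + g.square().sum()
  total_norm = total_sq.sqrt()
  # shrink every gradient by the same factor once the norm exceeds max_norm
  scale = (max_norm / (total_norm + eps)).minimum(1.0)
  return [g * scale for g in grads], total_norm
```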
chenyu
a6485d00c8 very tiny generate_dataset (#11013)
One minute to generate on my Mac.
2025-06-27 17:10:45 -04:00
qazal
382fa6a325 viz: support axis colors in UOp nodes (#11009)
* work

* javascript

* optional defaultColor

* fine
2025-06-27 23:02:55 +03:00
qazal
44257f25e4 bump line count to 14600 (#11010) 2025-06-27 22:48:14 +03:00
George Hotz
be53ef4f0a rename DEFINE_ACC -> DEFINE_REG (#11006)
* rename DEFINE_ACC -> DEFINE_REG

* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
George Hotz
05c35d0db8 reorder ops and add comments (#11005) 2025-06-27 10:52:14 -07:00
George Hotz
5a1911b7c4 apply the global dims late (#11002)
* apply the global dims late [pr]

* late gpudims

* tests passing

* remove the random local_dims inc

* simpler
2025-06-27 09:54:34 -07:00
qazal
4ef10c57f9 remove unused test helper (#10999) 2025-06-27 13:48:48 +03:00
qazal
a39343e39f viz: move timeline layout to python (#10998)
* viz: move timeline layout to python

* DevEvent has a device and a name
2025-06-27 13:06:00 +03:00
George Hotz
b4eb876d5a kernel.py no longer permutes reduce axis [pr] (#10968)
* kernel.py no longer permutes reduce axis [pr]

* delete tests that handcode uops

* regen of sops is broken...

* put import back

* just remove that

* disable those tests
2025-06-26 17:44:58 -07:00
chenyu
6ab5a5cb6c llama3 mlperf train (#10983)
Work in progress: it can now overfit small examples, and VRAM usage roughly matches.
2025-06-26 20:24:27 -04:00
George Hotz
856759c79c add halide example (#10980)
* add halide example

* upd halide gemm

* partial works

* touchups
2025-06-26 16:14:57 -07:00
qazal
1127302c46 move perfetto to extra (#10994)
* move perfetto to extra

* update TestViz and fix tests

* remove perfetto.html from viz directory

* work

* mypy
2025-06-27 01:53:54 +03:00
qazal
712980e167 fix extract_dataset + add tests to CI (#10995)
* fix extract_dataset + tests

* add CI

* sops.gz itself is same as master

* yml + gzip -c + ge

* don't commit that

* bump limit to 1000

* axis=7

* test_tiny
2025-06-27 01:51:36 +03:00
chenyu
4572e65f0f remove duplicated move_early logic in UOp.r [pr] (#10993) 2025-06-26 18:33:54 -04:00
Ignacio Sica
579194f523 remove some linearize calls from tests 2 [pr] (#10992)
* refactor count_float4 to take uops as input instead of kernel

* remove some calls to linearize in test_linearizer

* remove some more calls

* remove one more call
2025-06-26 18:22:27 -03:00
geohotstan
50936b4a18 ONNX real float16 (#10694)
* squash commits

* temp fix for const tensor

* actually realizing float16 can only happen in raw_data

* .float -> cast(float) to rerun CI

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
qazal
73484b0803 viz: generic shape tooltip/click handlers + renames (#10990)
* viz: generic tooltip

* assign kernel

* labelParts/label

* rect with a fillColor

* line
2025-06-26 19:14:04 +03:00
qazal
7f79c1388f viz: update y offset calculation (#10987)
* viz: update y offset calculation

* don't rescale padding
2025-06-26 12:05:20 +03:00
chenyu
49bba2f0a0 improve test_nll_loss (#10986)
Build the target and weight tensors outside so the test exercises backward too.
2025-06-26 02:46:55 -04:00
chenyu
0612acfc70 improve Tensor.cross_entropy (#10985)
Separate the cases where Y is probabilities vs class indices, and check shapes for indices. Also fixes higher-dim cases (see the sketch below).
2025-06-26 01:39:48 -04:00
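On the Tensor.cross_entropy change above: a hedged sketch of the two call patterns it distinguishes; the shapes and values are illustrative, not taken from the PR's tests.
```
from tinygrad import Tensor

logits = Tensor.randn(4, 10)
# class-index targets: an integer tensor of shape (4,), one class index per sample
loss_idx = logits.cross_entropy(Tensor([1, 3, 0, 7]))
# probability targets: a float tensor with the same shape as the logits
loss_prob = logits.cross_entropy(Tensor.randn(4, 10).softmax())
```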
chenyu
8751d47985 CosineAnnealingLRWithWarmup (#10981) 2025-06-25 17:45:21 -04:00
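For reference, the usual cosine-annealing-with-warmup schedule as a plain-Python sketch; the parameter names and the end_lr default are assumptions, not the actual CosineAnnealingLRWithWarmup API.
```
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int, end_lr: float = 0.0) -> float:
  if step < warmup_steps:
    return base_lr * (step + 1) / warmup_steps  # linear warmup up to base_lr
  progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
  return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * progress))  # cosine decay to end_lr
```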
Ignacio Sica
21f1c4cc09 remove some linearize calls from tests [pr] (#10978)
* remove some linearize calls from tests

speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd

* clearer assert message
2025-06-25 12:37:17 -07:00
chenyu
efad567ebd ruff check whole examples/mlperf/ (#10979) 2025-06-25 12:57:48 -04:00
Sieds Lykles
15e60caf09 add Ops.EQ (#10976) 2025-06-25 11:25:10 -04:00
Ignacio Sica
98d2cde293 revert tc_group feature (#10971) 2025-06-24 20:58:13 -07:00
George Hotz
306dbc76f6 early view simplify (#10974)
* shape const if it has a device [pr]

* early view simplify
2025-06-24 20:52:45 -07:00
b1tg
77fff73295 fix viz vscode link on windows (#10972)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-25 06:47:59 +03:00
George Hotz
9d995c2a4d shape const if it has a device [pr] (#10969) 2025-06-24 16:22:54 -07:00
George Hotz
cf60ccac6a support new const lowering (#10967)
* support new const lowering

* delete invalid linearizer failure tests
2025-06-24 15:21:41 -07:00
George Hotz
8a65720528 hotfix: disable test_tensor_core_opts_group test on real metal 2025-06-24 15:21:33 -07:00
nimlgen
1c45b9f7fb start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core booted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck after lockdown(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems working, but switch to 512MB pages for simplicity

* working state

* not cleaned

* cleaned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
uuuvn
c8d0f68763 Weaker renderer validation in remote (#10964)
```
training bert
training on ['REMOTE:0', 'REMOTE:1', 'REMOTE:2', 'REMOTE:3', 'REMOTE:4', 'REMOTE:5']
Traceback (most recent call last):
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 1300, in <module>
    with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
                                                 ^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 975, in train_bert
    for x in GPUS: Device[x]
                   ~~~~~~^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 22, in __getitem__
    def __getitem__(self, ix:str) -> Compiled: return self.__get_canonicalized_item(self.canonicalize(ix))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 28, in __get_canonicalized_item
    ret = [cls for cname, cls in inspect.getmembers(importlib.import_module(f'{base}.runtime.ops_{x}')) \
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/runtime/ops_remote.py", line 417, in __init__
    if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"): raise RuntimeError(f"bad renderer {renderer}")
                                                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: bad renderer ('tinygrad.runtime.ops_null', 'NullRenderer', ())
```
2025-06-24 14:15:09 -07:00
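For context: the check in the traceback only accepts renderer classes whose module path starts with tinygrad.renderer., so NullRenderer (which lives in tinygrad.runtime.ops_null) is rejected. A hedged sketch of one way the check could be weakened; this is an assumption about the shape of the fix, not the actual diff.
```
def validate_renderer(renderer: tuple) -> None:
  # weaker (assumed) check: any tinygrad module exporting a *Renderer class is acceptable,
  # instead of requiring the module to live under tinygrad.renderer.*
  if not renderer[0].startswith("tinygrad.") or not renderer[1].endswith("Renderer"):
    raise RuntimeError(f"bad renderer {renderer}")

validate_renderer(('tinygrad.runtime.ops_null', 'NullRenderer', ()))  # now passes
```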
George Hotz
c2f5f0f198 more robust reduce_gradient (#10965) 2025-06-24 14:09:33 -07:00
George Hotz
8743ca40e2 force reduce to be in axis order (#10837)
* force reduce to be in axis order

* disable rule causing loop

* disable that rule

* no ra there

* only move non reduce

* fix tests
2025-06-24 13:00:16 -07:00
chenyu
ffb032e31d test_diagonal touchup (#10962) 2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632 Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945)
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend

* cleanup

* generic fix

* tests

* cmp with diagonal too

* oops

* move tests

* fix test

* remove unnecessary import

* fix assert

* compare against numpy

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
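The underlying pattern behind this fix: torch builds the diagonal as an as_strided view stepping n+1 elements through the flat buffer, which is what to_movement_ops has to translate. A small illustration on plain CPU torch compared against numpy; the tinygrad backend itself isn't exercised here.
```
import numpy as np
import torch

x = torch.arange(16, dtype=torch.float32).reshape(4, 4)
# main diagonal as an as_strided view: 4 elements, stride n+1 = 5 through the flat buffer
diag_strided = torch.as_strided(x, size=(4,), stride=(5,))
assert torch.equal(diag_strided, torch.linalg.diagonal(x))
np.testing.assert_allclose(diag_strided.numpy(), np.diagonal(x.numpy()))
```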
nimlgen
26ddf8d714 amd: rename dev_iface -> iface to match nv (#10959) 2025-06-24 20:22:19 +03:00
chenyu
bfa87f3490 clean up binary_crossentropy_logits (#10958) 2025-06-24 12:23:40 -04:00
qazal
2ccddfc0ca viz: match canvas fontsize (#10957)
The default canvas font size is 10px: https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/font.
2025-06-24 19:07:06 +03:00
qazal
de4b9bf53b add opts_to_apply option to AST KernelInfo (#10950)
* proposal: add option to override opts in the get_program API

* update test_linearizer_rewrite

* state in uops

* update process_replay and names

* empty isn't none

* fix process replay
2025-06-24 18:55:39 +03:00
chenyu
18e264a449 Tensor.logsigmoid (#10955) 2025-06-24 11:16:14 -04:00
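The standard numerically stable formulation, log sigmoid(x) = -softplus(-x), shown as a small pure-Python sketch rather than the actual tinygrad implementation:
```
import math

def logsigmoid(x: float) -> float:
  # log(sigmoid(x)) = -log(1 + exp(-x)); branch on the sign of x so exp never overflows
  return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

assert abs(logsigmoid(0.0) - math.log(0.5)) < 1e-12
```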
Ignacio Sica
f15247d2d2 remove outdated index masking in lowerer [pr] (#10953)
* add assert to check idx is never replaced with const 0

* remove outdated index masking
2025-06-24 07:53:30 -07:00
b1tg
cc32394b32 support copyin/copyout/is_allocated for subbuffers (#10869)
* support copyin/copyout/is_allocated for subbuffers

* simple

* clean up

* rm underlying_buf
* add function is_initialized
* add tests

* better test_subbuffer_copy_in_out

* fix allocator

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-24 07:49:04 -07:00
chenyu
35504c938e torch.clip(x,y) -> x.clip(y) in test_ops (#10954)
* torch.clip(x,y) -> x.clip(y) in test_ops

* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00
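The rewrite pattern from this last entry, for reference; the values are illustrative, not from the test suite.
```
import torch

x = torch.tensor([-2.0, 0.5, 3.0])
assert torch.equal(torch.clip(x, 0.0), x.clip(0.0))  # method form, as now used in test_ops
```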