Commit Graph

10633 Commits

George Hotz
cf60ccac6a support new const lowering (#10967)
* support new const lowering

* delete invalid linearizer failure tests
2025-06-24 15:21:41 -07:00
George Hotz
8a65720528 hotfix: disable test_tensor_core_opts_group test on real metal 2025-06-24 15:21:33 -07:00
nimlgen
1c45b9f7fb start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core booted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck after lockdown :(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems working, but switch to 512MB pages for simplicity

* working state

* not cleaned

* cleaned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
uuuvn
c8d0f68763 Weaker renderer validation in remote (#10964)
```
training bert
training on ['REMOTE:0', 'REMOTE:1', 'REMOTE:2', 'REMOTE:3', 'REMOTE:4', 'REMOTE:5']
Traceback (most recent call last):
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 1300, in <module>
    with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
                                                 ^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 975, in train_bert
    for x in GPUS: Device[x]
                   ~~~~~~^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 22, in __getitem__
    def __getitem__(self, ix:str) -> Compiled: return self.__get_canonicalized_item(self.canonicalize(ix))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 28, in __get_canonicalized_item
    ret = [cls for cname, cls in inspect.getmembers(importlib.import_module(f'{base}.runtime.ops_{x}')) \
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/runtime/ops_remote.py", line 417, in __init__
    if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"): raise RuntimeError(f"bad renderer {renderer}")
                                                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: bad renderer ('tinygrad.runtime.ops_null', 'NullRenderer', ())
```
2025-06-24 14:15:09 -07:00
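For context: the check that fired here rejects any renderer whose module is outside tinygrad.renderer., but a remote host can legitimately report a runtime-provided renderer such as NullRenderer. A minimal sketch of a weaker check (hypothetical helper, not necessarily the merged code):
```
def validate_renderer(renderer: tuple) -> None:
  # accept tinygrad.renderer.* as before, but also allow runtime-provided
  # renderers like ('tinygrad.runtime.ops_null', 'NullRenderer', ())
  mod, cls = renderer[0], renderer[1]
  if not (mod.startswith("tinygrad.renderer") or mod.startswith("tinygrad.runtime.ops_")) or not cls.endswith("Renderer"):
    raise RuntimeError(f"bad renderer {renderer}")
```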
George Hotz
c2f5f0f198 more robust reduce_gradient (#10965) 2025-06-24 14:09:33 -07:00
George Hotz
8743ca40e2 force reduce to be in axis order (#10837)
* force reduce to be in axis order

* disable rule causing loop

* disable that rule

* no ra there

* only move non reduce

* fix tests
2025-06-24 13:00:16 -07:00
chenyu
ffb032e31d test_diagonal touchup (#10962) 2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632 Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945)
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend

* cleanup

* generic fix

* tests

* cmp with diagonal too

* oops

* move tests

* fix test

* remove unnecessary import

* fix assert

* compare against numpy

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
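For context, the PR's tests compare the diagonal result against numpy. A minimal sketch of that comparison on the default torch CPU backend (the crash itself only reproduced when lowered through tinygrad's to_movement_ops):
```
import numpy as np
import torch

x = torch.arange(12.).reshape(3, 4)
# torch.linalg.diagonal takes the diagonal over the last two dims
assert np.allclose(torch.linalg.diagonal(x).numpy(),
                   np.diagonal(x.numpy(), axis1=-2, axis2=-1))
```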
nimlgen
26ddf8d714 amd: rename dev_iface -> iface to match nv (#10959) 2025-06-24 20:22:19 +03:00
chenyu
bfa87f3490 clean up binary_crossentropy_logits (#10958) 2025-06-24 12:23:40 -04:00
qazal
2ccddfc0ca viz: match canvas fontsize (#10957)
it's 10px https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/font.
2025-06-24 19:07:06 +03:00
qazal
de4b9bf53b add opts_to_apply option to AST KernelInfo (#10950)
* proposal: add option to override opts in the get_program API

* update test_linearizer_rewrite

* state in uops

* update process_replay and names

* empty isn't none

* fix process replay
2025-06-24 18:55:39 +03:00
chenyu
18e264a449 Tensor.logsigmoid (#10955) 2025-06-24 11:16:14 -04:00
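logsigmoid(x) is log(sigmoid(x)) = -softplus(-x); a numerically stable numpy sketch of the identity (not necessarily how Tensor.logsigmoid is implemented):
```
import numpy as np

def logsigmoid(x: np.ndarray) -> np.ndarray:
  # min(x,0) - log1p(exp(-|x|)) avoids overflow in exp for large |x|
  return np.minimum(x, 0) - np.log1p(np.exp(-np.abs(x)))

x = np.array([-2.0, 0.0, 2.0])
assert np.allclose(logsigmoid(x), np.log(1 / (1 + np.exp(-x))))
```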
Ignacio Sica
f15247d2d2 remove outdated index masking in lowerer [pr] (#10953)
* add assert to check idx is never replaced with const 0

* remove outdated index masking
2025-06-24 07:53:30 -07:00
b1tg
cc32394b32 support copyin/copyout/is_allocated for subbuffers (#10869)
* support copyin/copyout/is_allocated for subbuffers

* simple

* clean up

* rm underlying_buf
* add function is_initialized
* add tests

* better test_subbuffer_copy_in_out

* fix allocator

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-24 07:49:04 -07:00
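A subbuffer is a view at a byte offset into its base allocation, so copyin/copyout on the view must hit the right range of the base. A stdlib sketch of the bookkeeping (memoryview standing in for the allocator's view, not tinygrad's API):
```
base = bytearray(16)
sub = memoryview(base)[4:12]              # 8-byte "subbuffer" at offset 4

sub[:] = b"\x01" * 8                      # copyin to the subbuffer
assert bytes(base[4:12]) == b"\x01" * 8   # lands at the base offset
assert bytes(sub) == b"\x01" * 8          # copyout reads the same range
```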
chenyu
35504c938e torch.clip(x,y) -> x.clip(y) in test_ops (#10954)
* torch.clip(x,y) -> x.clip(y) in test_ops

* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00
Fang-Pen Lin
86d458533f Add pos_weight for binary_crossentropy_logits (#10855)
* Add pos_weight for binary_crossentropy_logits

* Remove debug code

* Code style

* Code style

* Rename
2025-06-24 09:42:37 -04:00
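pos_weight scales the positive term, matching torch's BCEWithLogitsLoss. A numpy sketch of the formula (an illustration, not tinygrad's implementation):
```
import numpy as np

def bce_logits(x, y, pos_weight=1.0):
  # loss = -(pos_weight*y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x)))
  log_sig = np.minimum(x, 0) - np.log1p(np.exp(-np.abs(x)))  # log(sigmoid(x))
  return -(pos_weight * y * log_sig + (1 - y) * (log_sig - x)).mean()

x, y = np.array([0.5, -1.0]), np.array([1.0, 0.0])
print(bce_logits(x, y, pos_weight=2.0))
```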
Sieds Lykles
61dad3740f fix min_max and add test (#10952) 2025-06-24 09:33:26 -04:00
qazal
ab8c5d04ab viz: convert to function_name in server [pr] (#10951)
* viz: convert to function_name in server [pr]

* it exists
2025-06-24 13:59:37 +03:00
nimlgen
c0d9cf09e0 system: flock (#10949)
* system: flock

* imports

* xx
2025-06-24 11:33:49 +03:00
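flock gives an advisory, kernel-mediated lock on a file, letting separate processes serialize access to shared hardware state. A stdlib sketch of the pattern (hypothetical lock path, not the System helper itself):
```
import fcntl, os

fd = os.open("/tmp/example.lock", os.O_CREAT | os.O_RDWR)
try:
  fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no one else holds the lock
  ...                              # critical section, e.g. device bring-up
finally:
  fcntl.flock(fd, fcntl.LOCK_UN)
  os.close(fd)
```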
nimlgen
5202970feb system: move memory_barrier to System (#10948)
* system: move memory_barrier to System

* fixed
2025-06-24 11:09:43 +03:00
qazal
f41c28a048 update test_tensor_uop_representation comments [pr] (#10946)
Update these comments to match current tinygrad.
2025-06-24 10:47:09 +03:00
qazal
7a5e4e0bf1 fix unittests process replay [pr] (#10947) 2025-06-24 10:30:23 +03:00
George Hotz
7d560dbd75 hotfix: corealize in the tiny mnist test 2025-06-23 17:41:16 -07:00
Alexey Zaytsev
230ad3a460 [bounty] Don't use numpy inside hlb_cifar10 training loop (#10777)
* Don't use numpy inside hlb_cifar10 training loop

* Lint it

* jit it

* Drop the last half-batch

* Use gather for random_crop and reuse perms

* Wrap train_cifar in FUSE_ARANGE context

* No need to pass FUSE_ARANGE=1 to hlb_cifar10.py

* Add cutmix to jittable augmentations

* Remove .contiguous() from fetch_batches

* Fix indexing boundary

---------

Co-authored-by: Irwin1138 <irwin1139@gmail.com>
2025-06-23 17:24:56 -07:00
George Hotz
383010555f delete linearize and to_program from kernel.py (#10943) 2025-06-23 17:04:05 -07:00
George Hotz
0f89660ce4 Revert "change clang -march flag to -mcpu on arm (#10841)" (#10942)
This reverts commit 897e42fd1b.
2025-06-23 16:48:28 -07:00
Ignacio Sica
956a8391a5 minor cleanup on test_tensor_core_opts tests (#10924)
* minor cleanup on test_tensor_core_opts tests

Tests now notify when skipped
Before, they silently skipped if the backend didn't have half precision and
accumulation
Also cleaned up atol and rtol setup

* refactor test_tensor_core_opts_group

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-23 16:30:21 -07:00
ttomsa
897e42fd1b change clang -march flag to -mcpu on arm (#10841)
* change clang -march flag to -mcpu with fp16 disassembly test

* fix

* add capstone to macos dependencies

* just check no cast in test

* rm import

* woops

* lets check

* move check

* llvm init before cpu check

* try this

* bump autogen llvm version

* also update libclang?

* revert

* add comment

* skip llvm test and add comment

* linter
2025-06-23 16:28:48 -07:00
Sieds Lykles
772cd02ad2 Perform index validation on load/store, not on the index (#10849)
* move index validation to load/stores

* add name

* add linearizer_failure

* add validate_store with implicit gates

* linearizer_failure_58 is fixed!

* add test_uop_graph test

* rename cond to gate

* test gated load/stores

* use or_casted()
2025-06-23 16:25:05 -07:00
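The idea: instead of masking an invalid index to a safe constant, the load/store itself carries a gate that decides whether the address is dereferenced at all. A toy Python model of the semantics (not the UOp API):
```
def gated_load(buf: list, idx: int, gate: bool, alt=0.0):
  # when gate is False, the (possibly out-of-bounds) index is never touched
  return buf[idx] if gate else alt

buf = [1.0, 2.0, 3.0]
assert gated_load(buf, 1, gate=True) == 2.0
assert gated_load(buf, 99, gate=False) == 0.0
```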
George Hotz
ae4d2d71b4 bump line count to 14500 2025-06-23 15:32:27 -07:00
Harsh Natuskar
79d7cdd9ba Fix device (#10929)
* fix: pkg

* better

* added test

* less lines
2025-06-23 15:30:19 -07:00
George Hotz
e15754db28 remove (some) kernelize from llama and test schedule speed (#10939)
* remove kernelize from llama

* 405B

* space
2025-06-23 15:07:31 -07:00
chenyu
3699d1d3ba hotfix llama3 temperature is float (#10938) 2025-06-23 15:20:56 -04:00
uuuvn
4e2c9e36c7 Remote multihost (p2p transfer) (#10601) 2025-06-23 11:47:29 -07:00
chenyu
42b1c9625b skip test TestKiTS19Dataset::test_training_set (#10936)
flaky
2025-06-23 14:27:24 -04:00
patrini32
9e9fd44987 refactor test/external/external_llama_eval.py (#10567)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-23 10:43:20 -07:00
chenyu
785b4ea8ac optim flatten().shape[0] is numel (#10935) 2025-06-23 13:11:19 -04:00
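i.e. flattening a parameter tensor to 1-D yields exactly numel() elements, so the two spellings are interchangeable:
```
from tinygrad import Tensor

t = Tensor.ones(3, 4, 5)
assert t.flatten().shape[0] == t.numel() == 60
```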
qazal
ac39f27ae6 viz: non blocking UOp tracing (#10913)
* viz: non blocking UOp tracing

* u.arg

* no if Ops.KENREL

* drop replace

* switch to weakref.WeakKeyDictionary

* back

* remove ram usage skips, viz works here

* cache on reconstruct
2025-06-23 19:59:28 +03:00
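Why weakref.WeakKeyDictionary fits here: trace data is kept per object without pinning the object in memory, so viz bookkeeping can't cause the RAM growth the removed skips were guarding against. A stdlib sketch:
```
import weakref

class Node: pass                 # stand-in for a traced object

cache = weakref.WeakKeyDictionary()
n = Node()
cache[n] = "trace data"
assert cache[n] == "trace data"
del n                            # key dies -> entry dropped (CPython refcounting)
assert len(cache) == 0
```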
Ignacio Sica
b8d09a1dae tc with group/grouptop (#10903) 2025-06-23 09:58:41 -07:00
qazal
9944c2c02d viz: show time taken on hover (#10934) 2025-06-23 19:00:40 +03:00
George Hotz
1e99a7f1c9 hotfix: don't viz the indexing rewrites 2025-06-23 08:20:26 -07:00
chenyu
f9b59924f1 OPTIM_DTYPE to specify dtype for optim params (#10925)
one more flag
2025-06-23 10:32:03 -04:00
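A sketch of how such a flag could be read (hypothetical wiring; the real hook lives in the optimizer):
```
from tinygrad import Tensor, dtypes
from tinygrad.helpers import getenv

# e.g. OPTIM_DTYPE=half would make optimizer state tensors half precision
optim_dtype = getattr(dtypes, getenv("OPTIM_DTYPE", "float32"))
state = Tensor.zeros(4, dtype=optim_dtype)
```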
qazal
7820aeca8e update codegen process replay to use get_program [pr] (#10921)
* update codegen process replay to get_program [pr]

* precommit

* try str replace

* +to_function_name

* fixup tc

* local2.sh

* fix openpilot NOLOCALS

* new local.sh

* correct merge

* beam cache

* back

* revert beam thing

* adding opts_override and name_override makes output of get_program
reproducible

* min diff
2025-06-23 17:31:41 +03:00
nimlgen
eceb7a00d2 nv: rename iface mem functions (#10931) 2025-06-23 16:34:51 +03:00
qazal
4e864bd304 fix: getenv("NOLOCALS")/NOLOCALS context var (#10927)
OptOps shouldn't rely on os.environ.
2025-06-23 11:23:59 +03:00
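tinygrad's ContextVar/Context pair gives a scoped override instead of mutating os.environ; a sketch with a hypothetical flag (NOLOCALS itself is defined the same way):
```
from tinygrad.helpers import Context, ContextVar

MYFLAG = ContextVar("MYFLAG", 0)       # hypothetical flag; seeded from the env once

def run(): return "no locals" if MYFLAG.value else "locals"

with Context(MYFLAG=1):                # scoped override, no os.environ mutation
  assert run() == "no locals"
assert run() == "locals"
```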
alpharush
22f9696522 Fix/hcqfuzz harness bug (#10923)
* update command so extra module is found

* fix empty range in randrange errors

* lint
2025-06-23 11:22:30 +03:00
qazal
f037f85532 s/getenv("TC")/USE_TC context var (#10922) 2025-06-23 00:39:45 +03:00
qazal
9201224e0b viz: remove Kernel check [pr] (#10920)
* viz: remove Kernel check [pr]

* TestVizIntegration

* test/unit allows opening of devices

* kernel -> Kernel
2025-06-22 20:47:54 +03:00
nimlgen
3ccdb2356b system: factor out PCIIfaceBase (#10917)
* system: factor out PCIIfaceBase

* linter

* typing
2025-06-22 20:03:14 +03:00