tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-17 02:48:03 -05:00

Author	SHA1	Message	Date
George Hotz	6eaea3c9d9	RANGEIFY=2 is partial contig	2025-08-21 16:33:33 -07:00
Jordan Chalupka	8de6db15ac	exclude .git from ruff (#11773)	2025-08-21 15:37:50 -07:00
George Hotz	5954a0975f	fix some assigns on rangeify (#11774) * fix some assigns * llvm test * more tests * upd test	2025-08-21 15:15:54 -07:00
qazal	2e0eb88549	viz: add metadata to UOp tracing (#11772) * viz: add metadata to UOp tracing * place after tag * optional field * err, refcount of root must be 0	2025-08-22 00:18:45 +03:00
George Hotz	d6f9606e93	small cleanups to rangeify (#11769)	2025-08-21 11:15:09 -07:00
uuuvn	bd4a9473b0	Multihost exception handling (#11729) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-08-21 13:51:49 -04:00
George Hotz	a2c7b807e0	don't bufferize 0s (#11766)	2025-08-21 10:10:56 -07:00
nimlgen	9eff7cd1d8	am: support 64bit discovery (#11768)	2025-08-21 18:28:13 +03:00
b1tg	56cd47a159	fix amd llvm bf16 tc (#11713) * fix amd llvm bf16 tc * is_cdna --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-08-21 09:33:28 -04:00
George Hotz	a044648111	rangeify load cleanups + multi support (#11765) * use the old buf_uop + cleanups * simpler handling of load * everything needed for multi too	2025-08-20 20:55:49 -07:00
George Hotz	9f94c25a25	fix symbolic usage. use shrink, not reshape (#11762) * fix test_var * revert those things * fix the ones in test tiny * use better syntax * it's the same, but that's clearer * fix pad	2025-08-20 18:35:42 -07:00
chenyu	5276fbc9c5	fix gather with inf values (#11760) (mask * x) is wrong because 0*inf is nan. i feel we have a lot of those still...	2025-08-20 20:35:40 -04:00
wozeparrot	b979162c5d	llama3 eval train (#11706)	2025-08-20 19:56:35 -04:00
chenyu	dbd3b67657	clamp GRAD_CLIP_NORM in llama (#11761)	2025-08-20 19:55:50 -04:00
George Hotz	9635592141	** rangeify, try 3 (#11683) * ** rangeify, try 3 * bring that over * bufferize, don't use contig tag * work * ish * fix rangeify * flash attention is back * fix rangeify tests * stuff passes * fix test_log_softmax * more stuff passes * progress children * new endrange solution * progress * progress counter * basic assign * contigs only * symbolic in schedule * unbind_kernel * late children * ops fixed * beautiful mnist is close * that seems to work * mnist works * improve names * fix bmnist * no pcontig * testing backward * work * clone movement ops * new_range helper * MBLOCK/MERGE * ops tests pass * revert mblock stuff * cleanups...but it breaks ops * remove reindex * hack for relu * disable the hacks * more hacks * upd * mostly works with cleanups disabled * ndr * ops tests pass * terrible hacks for indexing to work * context mismatch * pcontig * split pcontig v contig * z3 trunc * null * no fuse in rangeify * ops test passes * lnorm * fix assign * nd rangeify * both should work * tests for rangeify * cleanups * stores pass the pointer through * disable pcontig for now * PARTIAL_CONTIG is a flag	2025-08-20 14:22:44 -07:00
chenyu	d7553721d1	clean up test_dtype_alu (#11757) remove the check that looks into schedule, only test if output matches	2025-08-20 14:36:18 -04:00
chenyu	5f08a3e928	hotfix: cast half to float in Tensor.tolist (#11755) workaround for python < 3.12	2025-08-20 12:18:35 -04:00
qazal	de4cb722a4	viz: add metadata and var_vals tracing (#11753) * viz: add metadata and var_vals tracing * add test_trace_metadata * set TRACEMETA=1	2025-08-20 18:39:51 +03:00
nimlgen	6589c9e643	hcq: better errors for ifaces (#11751) * hcq: better errors for ifaces * fix linter * typo * space	2025-08-20 17:50:51 +03:00
chenyu	be7b0b6970	TRANSCENDENTAL_SUPPORTED_DTYPES->TRANSCENDENTAL_DTYPES (#11752)	2025-08-20 10:29:36 -04:00
ttomsa	220a2a88d7	a(1/b) -> a/b on LLVM, CPU (#11743) add fdiv rewrite * :) * use float_lop * use reciprocal() * revert * move to decompositions	2025-08-20 09:35:10 -04:00
George Hotz	12ab3f8b06	correct row_count in process replay (#11748)	2025-08-19 22:21:07 -07:00
George Hotz	8af8808c61	cleanup tests, bump caches (#11746)	2025-08-19 21:21:07 -07:00
George Hotz	00391db628	no ast for mem estimate (#11744) * no ast for mem estimate * skip for webgpu	2025-08-19 20:18:45 -07:00
chenyu	dd413e1208	remove a Ops.REDUCE check in reduce_collapse [pr] (#11734)	2025-08-19 19:21:28 -04:00
ttomsa	70c3f1fb29	x.where(False, True) -> !x (#11738) * add pat * add test	2025-08-19 19:08:16 -04:00
George Hotz	1d307f568c	move device tests to test/device + test cleanups (#11735) * move device tests to test/device * test speedups * test device * linalg to unit * upd * so pytest just works * more divide and skip * speed * test devectorize * add pillow	2025-08-19 16:02:20 -07:00
wozeparrot	bcc7623025	feat: bump version to 0.11.0 (#11736) v0.11.0	2025-08-19 17:08:56 -04:00
qazal	8c987b3293	DISABLE_FAST_IDIV is a context var [pr] (#11733)	2025-08-19 23:30:50 +03:00
George Hotz	bf467c623d	changes from rangeify + better NullRenderer (#11732) * changes from rangeify + better NullRenderer * fix test	2025-08-19 12:51:54 -07:00
chenyu	02353588cb	small getitem cleanup (#11730)	2025-08-19 12:25:58 -04:00
chenyu	712a5c651a	minor Tensor.triu cleanup (#11728) less confusing dtype	2025-08-19 08:07:38 -04:00
nimlgen	9c9e337c78	amd: parse soc enums (#11727) * amd: parse soc enums * remove from mock * fix * minimal amd_gpu	2025-08-19 15:06:09 +03:00
qazal	57ad69160a	viz: inline memory shape spec (#11725)	2025-08-19 08:03:29 +03:00
chenyu	c5b52e9321	onnx RotaryEmbedding cleanup (#11724)	2025-08-18 23:34:42 -04:00
George Hotz	31619774a9	Revert "Revert "fix the misused cast in amd llvm tc (#11711)" (#11715)" (#11723) This reverts commit `ca28db5a97`.	2025-08-18 19:44:35 -07:00
George Hotz	2ea54d7337	improve syntax of UPats using f [pr] (#11717) Co-authored-by: chenyu <chenyu@fastmail.com>	2025-08-18 20:49:45 -04:00
chenyu	b67345caa3	use truncate in onnx read_int64 [pr] (#11720)	2025-08-18 20:49:35 -04:00
qazal	50e789e290	hotfix: add device to decompositions ctx (#11721) fast_idiv requires it for checking if a dtype is supported. Without this, codegen creates non reproducible output without a complete os.environ. since `is_dtype_supported` will open devices based on the env var unless the device is specified by the caller.	2025-08-19 03:31:16 +03:00
George Hotz	4b3fcb4064	Revert "REDUCE_AXIS keepdim=False (#11311)" (#11718) This reverts commit `b518a7378a`.	2025-08-18 13:28:53 -07:00
George Hotz	67d0ba5bd8	new ops from rangeify (#11716)	2025-08-18 13:13:11 -07:00
George Hotz	4afa0b86bb	hotfix: ls -lh on wheel size	2025-08-18 11:52:59 -07:00
George Hotz	ca28db5a97	Revert "fix the misused cast in amd llvm tc (#11711)" (#11715) This reverts commit `799a637b03`.	2025-08-18 11:51:28 -07:00
chenyu	c10e4c4e20	print wheel build size (#11714)	2025-08-18 14:29:47 -04:00
b1tg	b518a7378a	REDUCE_AXIS keepdim=False (#11311) * progress * fix tests * fix tests * remove hack for test_symfold * fix test_conv.py on llvm * hack test_cache_speed * lint * remove hack for helper_linearizer_opt * tests * fix DSP * clean up * remove hack for kernelize.py * hack for test/test_multitensor.py TestMultiTensor.test_matmul_shard_none * clean * uop.r need reshape? * lower_store cause fail * fix lower? * avoid contiguous hack * 2134 * conv2d count * remove unused * hack lower * reduced and clean up * fix TestMultiTensor.test_matmul_shard_none * src sync + fix TestMultiTensor.test_matmul_shard_none * remove excluded in mop --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-08-18 10:09:17 -07:00
b1tg	61884f2057	add cstyle renderer to the NULL device (#11709) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-18 09:52:22 -07:00
uuuvn	18db8fa311	Allow choosing leaders in multinode reduce (#11506) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-08-18 12:43:20 -04:00
b1tg	799a637b03	fix the misused cast in amd llvm tc (#11711) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-18 09:15:34 -07:00
qazal	fef97547f9	viz: preset the final timestamp (#11712)	2025-08-18 17:51:21 +03:00
chenyu	c30a113b2a	support bf16 and fp8 in Tensor.tolist (#11704) memoryview does not support it, but casting works fine so cast is fine	2025-08-17 15:11:13 -04:00
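
Note on commit 5276fbc9c5 ("fix gather with inf values"): the message says that (mask * x) is wrong because 0*inf is nan. The sketch below illustrates that point in plain Python floats only. It is not tinygrad code, and the helper names gather_by_multiply and gather_by_select are hypothetical, chosen for this illustration.

```python
import math

def gather_by_multiply(values, index):
    # One-hot mask multiplied into the values, then summed.
    # Any inf in an unselected slot becomes 0.0 * inf = nan and poisons the sum.
    return sum((1.0 if i == index else 0.0) * v for i, v in enumerate(values))

def gather_by_select(values, index):
    # Select instead of multiply: unselected entries are replaced by 0.0
    # and never enter a multiplication.
    return sum(v if i == index else 0.0 for i, v in enumerate(values))

values = [1.0, math.inf, 3.0]
print(gather_by_multiply(values, 0))  # nan
print(gather_by_select(values, 0))    # 1.0
```

Selecting rather than multiplying keeps the inf out of the arithmetic entirely, which is the general shape of the fix the commit message describes; how the actual change is implemented in tinygrad is not shown on this page.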

9878 Commits