tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-08 22:48:25 -05:00

Author	SHA1	Message	Date
Elnur Rakhmatullin	de2b323d97	Fixed a typo in "simplify" (#10358 )	2025-05-16 14:45:14 -07:00
chenyu	8a906cb124	Tensor.randn_like (#10276 )	2025-05-13 11:53:59 -04:00
nimlgen	b583ece8f3	amd: replace AMD_DRIVERLESS with AMD_IFACE (#10116 ) * amd: replace AMD_DRIVERLESS with AMD_IFACE * docs * print direct err for amd_iface * print for all	2025-04-30 20:22:02 +03:00
qazal	0bee225a58	Tensor.kernelize docs (#9946 ) * Tensor.kernelize docs * syntax * test_kernelize_bw * Tensor.kernelize docstring * pruning * tiny details * details 2 * becomes_map terminology * more changes to becomes	2025-04-21 16:34:03 +08:00
qazal	e20ef7196a	Tensor.kernelize (#9845 ) * add kernelize * remove that * kernelize returns self * update abstractions2.py * kernelize in test_schedule * temp: assert BUFFER_VIEW's existence * ASSIGN must have a buffer or subbuffer target * assert and shrink * fix * padded setitem * var * toposort once * extra * base_buffer * end with BUFFER_VIEW * setitem for disk * test_setitem_becomes_subbuffer * mul slice test * torch backend fix 1 * non-deterministic * keep subbuffer	2025-04-20 20:53:49 +08:00
qazal	218e01833d	update scheduler section for abstractions2.py [pr] (#9927 )	2025-04-19 12:09:14 +03:00
Alexey Zaytsev	78a6af3da7	Use $CUDA_PATH/include for CUDA headers (#9858 )	2025-04-13 16:20:19 +01:00
nimlgen	23b67f532c	amd: minor comments and readme updates (#9865 )	2025-04-12 23:24:05 +03:00
qazal	f2bd65ccfc	delete Ops.EMPTY and Tensor._metaop (#9715 ) * delete Ops.EMPTY and Tensor._metaop [pr] * test_creation * arg= * abstractions2	2025-04-03 12:29:02 +08:00
Ignacio Sica	876a8be97a	Debug env var breakdown (#9663 ) * add debug level breakdown * hotfix * Update env_vars.md	2025-04-02 14:34:07 +08:00
chenyu	162f286a0e	add a few Tensor method to doc (#9614 ) * add a few Tensor method to doc * clone	2025-03-28 13:47:16 -04:00
uuuvn	c631c72f22	HCQ: Increment timeline signal before submitting (#9550 ) `AMDComputeQueue.__del__` frees `hw_page` which is safe because `AMDAllocator._free` does `self.dev.synchronize()` which is supposed to wait for execution of IB to finish, however that doesn't happen if AMDComputeQueue is dropped right after submit before timeline signal is incremented, which it is in most places leading to a race if .bind() is also used (required for multi-xcc because bug in mec fw treats all PACKET3_PRED_EXECs outside IBs as if they had EXEC_COUNT of zero).	2025-03-23 18:30:38 +07:00
geohotstan	309afa20b7	add Tensor.max_unpool2d (#9518 ) * why does max_unpool2d feel slower than out.gradient ... * slightly cleaner * what happened to ruff * need to think about this some more * slightly faster now? * clean up, 1 more failing edge case * ok good * working TINY_BACKEND * nit doc wording * retry CI	2025-03-22 12:11:33 -04:00
leopf	e4dad99145	nn.state docs cleanup (#8332 ) * doc cleanup * extension cleanup * manual definition * bring back accept_filename for gguf_load --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-18 17:16:40 -04:00
geohotstan	53d6f1e1bb	Add bitonic cat sort (#9422 ) * poc * repeated values fail, sigh * is this being timed out? * fix up down names * bitonic v2, does this run? * bitonic v3, faster * bitonic v3.1, faster * bitonic v3.1.1, same speed unlucky * support dim and indices * bitonic v3.2, simpler code, TODO repeated indices * bruv gimme green for once cmon * cat (stack) implementation, slow but maybe one day when cat is fast meow * revert to v3.2 * bitonic v4, who let the cats out edition * clean up variable names * figured out repeated indices :D * ruff check --fix * use sort for topk * add Tensor.sort everywhere * fix docs and add some types * slightly better variable names * am I doing torch inplace correctly? * delegate sort to values_stable * add a contig, faster first sort * maybe don't test_inplace --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-17 12:01:23 -04:00
geohotstan	1d64c12f2b	add Topk to tensor (#9343 ) * terrible but somewhat working impl * linux behaves differently than macos? * slightly better impl * small clean up; haven't figured this out yet * better * torch has different behavior on linux and macos for duplicated values * add sum docs * fix test * add torch return_type test * add an exception test * wrap_fxn instead, and move op lower in order * better repeated values test * rerun ci	2025-03-09 20:01:42 -04:00
Francis Lata	86b737a120	leakyrelu to leaky_relu (#9270 )	2025-02-26 13:22:08 -05:00
chenyu	aaf0a8069f	xor -> bitwise_xor (#9264 )	2025-02-26 10:21:14 -05:00
nimlgen	56288243e6	metal PyTorch interop (#9229 ) * add from_blob support to mps cuda * objc_id * metal pytorch interop * fix comments --------- Co-authored-by: George Hotz <geohot@gmail.com>	2025-02-24 22:36:08 +03:00
nimlgen	1d06d61b16	from_blob for cuda (#9223 ) * from_blob for cuda * maybe docs? * minor docs * example * waiting 9224 --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-24 14:02:06 +03:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189 )	2025-02-20 18:03:09 -05:00
Ahmed Harmouche	0f94b98646	Force WebGPU backend type [pr] (#9164 ) * Force webgpu backend type * Mypy fix * Rename to WEBGPU_BACKEND * Add it to env_vars docs * Remove link	2025-02-19 17:19:39 +08:00
Clément Verrier	a7f91224eb	add `Tensor.isclose()` (#8844 ) * add `Tensor.isclose()` * support `equal_nan` so as to match PyTorch's behavior * update unit tests * remove some tests temporarily * re-enable one test * re-enable other test * try to fix failing tests during CI * save one line of code --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-17 10:11:40 -05:00
Josh Moore	1f9d2442b9	Add `Tensor.scatter_reduce` (#8947 ) * pytorch scatter -> scatter_reduce * WIP scatter_reduce implementation * _pre_scatter return type hint * split out src, mask to satisfy linter * Add src cast back in * dict of lambdas instead of ifs * sum and prod reduction ops with include_self * add reduce arg error message * add amax and amin reduction ops * Fix include_self for higher dims * Simplify * Simplify amax and amin too * Pull include_self logic out into _inv_mask function * reduce arg cannot be None for scatter_reduce * Fix self-mask issue * Add mean reduce op * Add tests * any() not needed here * remove comment * End support for Tensor src with reduce arg in tinygrad scatter * Process index, dim inside actual functions * Add scatter_reduce to onnx * Add excluded onnx ScatterElements reduction tests back in * Save 2 lines on the mask helpers * Update docs * Add include_self=False tests * cleanup * Remove unneeded helper function --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-13 09:08:54 -05:00
George Hotz	a3c78d47b3	speed docs + upgrades [pr] (#8964 ) * add some docs about speed [pr] * better torch gemm * enable locals on llvm/clang * disable locals for beam speed on LLVM/CLANG * 0x20 alignment in llvm allows ymm use	2025-02-08 17:28:52 +08:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
uuuvn	6dadb60c93	LLVM JIT (+autogen llvm instead of llvmlite) (#8486 ) * LLVM JIT * Autogen LLVM * Update autogen * Move things around * even more non-determinism * windows * more autogen weirdness * more windows stuff * blind windows development try 2 * more blind windows development * even more blind windows development * maybe i should just set up a windows vm... * why can't everyone just use sysv abi? * cleanup debugging stuff * unused import * icache flushing isn't required on x86 * merge jit_nt and jit_unix * more * Temporary hack to not segfault * better error * bad conflict resolution * Attempt to simplify support/llvm.py * More refactoring --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-02 19:52:42 +08:00
qazal	c8d878a5c1	remove r.lazydata.buf_uop_view [pr] (#8817 )	2025-01-30 23:14:36 +02:00
qazal	530961f7d5	realized only exists on base (#8815 ) * realized only exists on base [pr] * shorter * update that too	2025-01-30 23:02:25 +02:00
George Hotz	a6e496b195	remove Function class [pr] (#8753 ) * remove Function class [pr] * actually remove function * fix docs	2025-01-26 18:58:02 +09:00
nimlgen	6733a3a96b	am: fix typo (#8700 )	2025-01-21 14:35:15 +03:00
George Hotz	168c16646a	change create_schedule_with_vars api to big_sink [pr] (#8677 )	2025-01-19 13:30:26 -08:00
ignaciosica	d2234e308a	tf32 tc for nv and ptx (#8635 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-17 17:43:57 -08:00
nimlgen	b3efeeb717	docs: start am docs (#8638 ) * docs: init am docs * missing	2025-01-16 00:22:35 +03:00
qazal	0e97f807e0	test fixup prereqs for delete_buffer_view [pr] (#8523 )	2025-01-07 11:52:18 +02:00
nimlgen	5cb9443ebb	PROFILE is enabled when VIZ is enabled (#8516 )	2025-01-06 19:47:16 +03:00
uuuvn	5ffc50d58c	Clang JIT (#8481 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-03 11:12:55 -05:00
qazal	bd4d7dc4eb	return becomes_map from the scheduler (#8483 ) * return becomes_map from the scheduler * fix test_schedule * fix abstractions2 * s/becomes/becomes_map	2025-01-03 22:47:21 +08:00
chenyu	f3fdec940d	Tensor.mod (#8458 ) it's a python style mod. possibily can be cleaner with a floor div relaxed the vmin for MOD slightly for cstyle negatives mod, it's more correct and might fix other bugs	2024-12-31 11:31:42 -05:00
George Hotz	4c94726bac	remove uop mutability [pr] (#8441 ) * remove uop mutability [pr] * test fixups * most tests pass * more tests pass * lil test fixups * them too * fix test * unneeded * err, that * fix test_hcq * fix test failures * fix that test * tensor universe * does this pass test * Revert "does this pass test" This reverts commit `ed516b3169`. * Revert "tensor universe" This reverts commit `c21301852a`. * proper spidering for uops * cleanups * all tensors * all tensors * slow but correct * fast * no WeakSet * faster * no need for list * revert that	2024-12-31 00:29:56 -05:00
chenyu	19a54ae0b4	add Tensor.roll and Tensor.rearrange to doc (#8454 ) also moved rearrange in tensor.py to high level movement	2024-12-30 20:25:50 -05:00
George Hotz	803a47494e	Revert "Clang JIT (#8312 )" (#8452 ) This reverts commit `b6266c8e41`.	2024-12-30 17:49:35 -05:00
uuuvn	b6266c8e41	Clang JIT (#8312 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-30 17:37:53 -05:00
qazal	866dfa1f23	create_schedule([x.lazydata]) -> x.schedule() in tests (#8449 )	2024-12-31 03:15:52 +08:00
geohotstan	78cb47dfc5	docs and tests clean ups (#8383 )	2024-12-23 11:12:13 -05:00
chenyu	63f195729d	add gguf_load to doc [pr] (#8314 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-18 12:44:09 -05:00
qazal	d05e21cb69	replace lazy srcs with the new uop api [pr] (#8255 ) * buf_uop_view function * srcs shouldn't exist * fix TestTensorMetadata --------- Co-authored-by: George Hotz <geohot@gmail.com>	2024-12-15 17:09:54 +08:00
George Hotz	8396d90f91	non controversial changes from optim branch [pr] (#8234 )	2024-12-13 19:24:16 -08:00
George Hotz	37fa38d272	Revert "switch beautiful_mnist to use new optimizer [pr] (#8231 )" (#8233 ) This reverts commit `e9ee39df22`.	2024-12-13 19:07:09 -08:00
George Hotz	e9ee39df22	switch beautiful_mnist to use new optimizer [pr] (#8231 ) * switch beautiful_mnist to use new optimizer [pr] * fix abstractions3 + docs * fix OptimizerGroup with schedule_step api	2024-12-13 18:27:16 -08:00

357 Commits