* poc
* repeated values fail, sigh
* is this being timed out?
* fix up/down direction names
* bitonic v2, does this run?
* bitonic v3, faster
* bitonic v3.1, faster
* bitonic v3.1.1, same speed, unlucky
* support dim and indices
* bitonic v3.2, simpler code, TODO repeated indices
* bruv gimme green for once cmon
* cat (stack) implementation, slow but maybe one day when cat is fast meow
* revert to v3.2
* bitonic v4, who let the cats out edition
* clean up variable names
* figured out repeated indices :D
* ruff check --fix
* use sort for topk
* add Tensor.sort everywhere
* fix docs and add some types
* slightly better variable names
* am I doing torch inplace correctly?
* delegate sort to values_stable
* add a contiguous() call, faster first sort
* maybe don't test_inplace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
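
The commits above iterate on a bitonic sorting network for `Tensor.sort`/`topk`. A minimal pure-Python sketch of the compare-and-swap pattern it relies on (illustrative only; the real implementation is written with tensor ops, and bitonic networks assume a power-of-two length):

```python
def bitonic_sort(xs: list) -> list:
  xs, n = list(xs), len(xs)
  assert n & (n - 1) == 0, "bitonic networks need a power-of-two length"
  k = 2
  while k <= n:              # size of the bitonic runs being merged
    j = k // 2
    while j > 0:             # compare-and-swap distance inside each merge
      for i in range(n):
        partner = i ^ j
        # (i & k) == 0 selects ascending blocks, otherwise descending
        if partner > i and (xs[i] > xs[partner]) == ((i & k) == 0):
          xs[i], xs[partner] = xs[partner], xs[i]
      j //= 2
    k *= 2
  return xs

print(bitonic_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```
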
* terrible but somewhat working impl
* Linux behaves differently than macOS?
* slightly better impl
* small clean up; haven't figured this out yet
* better
* torch has different behavior on Linux and macOS for duplicated values
* add sum docs
* fix test
* add torch return_type test
* add an exception test
* wrap_fxn instead, and move op lower in order
* better repeated values test
* rerun ci
* add `Tensor.isclose()`
* support `equal_nan` to match PyTorch's behavior
* update unit tests
* remove some tests temporarily
* re-enable one test
* re-enable other test
* try to fix failing tests during CI
* save one line of code
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
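
For `Tensor.isclose` with `equal_nan`, the targeted semantics (mirroring PyTorch's `torch.isclose`, default tolerances assumed) reduce to a simple scalar rule:

```python
import math

# Scalar sketch: close means |a - b| <= atol + rtol * |b|;
# equal_nan=True additionally makes NaN compare close to NaN.
def isclose(a, b, rtol=1e-5, atol=1e-8, equal_nan=False):
  if math.isnan(a) or math.isnan(b):
    return equal_nan and math.isnan(a) and math.isnan(b)
  return abs(a - b) <= atol + rtol * abs(b)

assert isclose(1.0, 1.0 + 1e-9)
assert not isclose(float("nan"), float("nan"))
assert isclose(float("nan"), float("nan"), equal_nan=True)
```
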
* pytorch scatter -> scatter_reduce
* WIP scatter_reduce implementation
* _pre_scatter return type hint
* split out src, mask to satisfy linter
* Add src cast back in
* dict of lambdas instead of ifs
* sum and prod reduction ops with include_self
* add reduce arg error message
* add amax and amin reduction ops
* Fix include_self for higher dims
* Simplify
* Simplify amax and amin too
* Pull include_self logic out into _inv_mask function
* reduce arg cannot be None for scatter_reduce
* Fix self-mask issue
* Add mean reduce op
* Add tests
* any() not needed here
* remove comment
* End support for Tensor src with reduce arg in tinygrad scatter
* Process index, dim inside actual functions
* Add scatter_reduce to onnx
* Add excluded onnx ScatterElements reduction tests back in
* Save 2 lines on the mask helpers
* Update docs
* Add include_self=False tests
* cleanup
* Remove unneeded helper function
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
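
A usage sketch of the `scatter_reduce` these commits build up, assuming it mirrors PyTorch's `scatter_reduce` semantics (the expected outputs below follow that assumption):

```python
from tinygrad import Tensor

# Scatter src into t along dim 0 at positions from index, combining with
# the named reduction; include_self=False drops t's original value from
# the reduction at scattered positions.
t     = Tensor([1.0, 2.0, 3.0, 4.0])
src   = Tensor([10.0, 20.0, 30.0])
index = Tensor([0, 0, 2])
print(t.scatter_reduce(0, index, src, reduce="sum").tolist())
# [31.0, 2.0, 33.0, 4.0]   (1+10+20, 2, 3+30, 4)
print(t.scatter_reduce(0, index, src, reduce="amax", include_self=False).tolist())
# [20.0, 2.0, 30.0, 4.0]   (max(10,20), untouched, 30, untouched)
```
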
* add some docs about speed [pr]
* better torch gemm
* enable locals on llvm/clang
* disable locals for beam speed on LLVM/CLANG
* 0x20 alignment in llvm allows ymm use
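
On the 0x20 alignment commit: AVX `ymm` registers are 32 bytes wide, so 32-byte-aligned buffers let LLVM emit full-width vector loads and stores. An illustrative Python sketch of rounding a base address up to that boundary (not tinygrad's actual allocator):

```python
import ctypes

ALIGN = 0x20
raw = ctypes.create_string_buffer(1024 + ALIGN)  # over-allocate; keep a reference alive
addr = (ctypes.addressof(raw) + ALIGN - 1) & ~(ALIGN - 1)  # round up to 32 bytes
assert addr % ALIGN == 0
```
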
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on Metal CI
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
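
A hedged usage sketch for the Dawn-based runtime above: with `dawn-python` providing the WebGPU native library, selecting the backend is assumed to work like other tinygrad devices, via an environment variable:

```python
# Assumed invocation:  WEBGPU=1 python3 script.py
from tinygrad import Tensor

print((Tensor([1.0, 2.0]) + Tensor([3.0, 4.0])).tolist())  # [4.0, 6.0]
```
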
* LLVM JIT
* Autogen LLVM
* Update autogen
* Move things around
* even more non-determinism
* windows
* more autogen weirdness
* more windows stuff
* blind windows development try 2
* more blind windows development
* even more blind windows development
* maybe I should just set up a Windows VM...
* why can't everyone just use the SysV ABI?
* cleanup debugging stuff
* unused import
* icache flushing isn't required on x86
* merge jit_nt and jit_unix
* more
* Temporary hack to not segfault
* better error
* bad conflict resolution
* Attempt to simplify support/llvm.py
* More refactoring
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
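
A sketch of the last step of an LLVM JIT like the one above, and of the SysV ABI complaint: the JIT hands back a raw function address, which Python wraps into a callable via ctypes. On x86-64, unix targets use the SysV calling convention while Windows uses its own, so the emitted code and the caller have to agree. `jit_addr` here is hypothetical; in the real runtime it would come from the autogenerated LLVM bindings:

```python
import ctypes

CFUNC_T = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)  # int f(int, int)

def wrap_jit_function(jit_addr: int):
  # call the JITed code as wrap_jit_function(addr)(2, 3)
  return CFUNC_T(jit_addr)
```
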
it's a Python-style mod; possibly cleaner with a floor div.
relaxed the vmin for MOD slightly for the C-style negative mod; it's more correct and might fix other bugs
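
Concretely: Python's `%` is floored (the result takes the divisor's sign), while C-style `%` truncates (the result takes the dividend's sign), so the two disagree exactly when the operand signs differ, and a C-style mod with a negative dividend can go negative, hence the relaxed vmin:

```python
def c_mod(a: int, b: int) -> int:
  return a - int(a / b) * b  # truncated division, like C's a % b

for a, b in [(7, 3), (-7, 3), (7, -3)]:
  print(a, b, a % b, c_mod(a, b))
# 7 3 1 1
# -7 3 2 -1
# 7 -3 -2 1
```
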
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* proper spidering for uops
* cleanups
* all tensors
* all tensors
* slow but correct
* fast
* no WeakSet
* faster
* no need for list
* revert that
* working I think
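
A sketch of the "proper spidering" idea from the commits above: with uops immutable, the graph can be walked iteratively from the roots, collecting each node exactly once without recursion or a `WeakSet`. The `Node` type here is illustrative, not tinygrad's `UOp`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
  name: str
  src: tuple = ()   # parents of this node

def spider(roots):
  order, visited, stack = [], set(), list(roots)
  while stack:
    u = stack.pop()
    if id(u) in visited: continue
    visited.add(id(u))
    order.append(u)
    stack.extend(u.src)
  return order

a = Node("a"); b = Node("b", (a,)); c = Node("c", (a, b))
print([n.name for n in spider([c])])  # each reachable node exactly once
```
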
* where are my onnx scatter tests??
* forward_only for now
* try if the NaN hack fixes NV
* looks like the issue is different... CUDA, why?
* oops that was wrong. Try if this fixes CUDA
* simpler multiply
* actually finish this up tomorrow morning :x
* fix tests?
* improve tests
* improve test and implementation
* fix ruff
* complete but lots of expected failure...
* reviewed tests
* add onnx tests
* is this a processing op?
* add return type to indicate that it's not in-place
* final cleanups
* use `or` and improve tests a little
* add masked_index_select
* call it masked_setitem instead
* try
* FIXED
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
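
A hedged sketch of what the `masked_setitem` helper above amounts to (semantics assumed): writing a value into a tensor only where a boolean mask is set, which is expressible as a `where` over the mask:

```python
from tinygrad import Tensor

# keep t where the mask is False, take the new value where it is True
t    = Tensor([1.0, 2.0, 3.0, 4.0])
mask = Tensor([True, False, True, False])
print(mask.where(Tensor(0.0), t).tolist())  # [0.0, 2.0, 0.0, 4.0]
```
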
* implement inverse trig functions
* guess we should still test nans?
* magnitude as variable name :D
* reorder onnx_ops ops
* approximation -> x for consistency
* address feedback
* simpler acos
* improvement?
* actually just have asin depend on atan
* actually this is nicer
* remove a comment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
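
The identities the commits converge on, checked here against `math`: `asin` expressed through `atan` (which the backend already has), then `acos` through `asin`:

```python
import math

def asin(x: float) -> float:
  return math.atan(x / math.sqrt(1.0 - x * x))  # valid for |x| < 1

def acos(x: float) -> float:
  return math.pi / 2 - asin(x)

assert math.isclose(asin(0.5), math.asin(0.5))
assert math.isclose(acos(0.5), math.acos(0.5))
```
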
* initial implementation and test
* some other places that can use meshgrid
* revert the onnx_ops change
* add to docs
* revert interpolate too
* update
* improve edge case test
* might as well test grad
* add to tests and improve docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
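
A usage sketch of the new `Tensor.meshgrid`, assuming PyTorch-style semantics (the method-call shape and the `indexing` keyword are assumptions here):

```python
from tinygrad import Tensor

# 1-D inputs expand to coordinate grids over their cartesian product
x, y = Tensor([1, 2, 3]), Tensor([4, 5])
gx, gy = x.meshgrid(y, indexing="ij")
print(gx.tolist())  # [[1, 1], [2, 2], [3, 3]]
print(gy.tolist())  # [[4, 5], [4, 5], [4, 5]]
```
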