tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
George Hotz	a03b930339	hotfix: green v2 in docs	2025-08-24 10:25:14 -07:00
chenyu	fb8ee02424	Tensor.logaddexp (#11793)	2025-08-23 09:15:00 -04:00
wozeparrot	1826004ef9	feat: add tinyos builder link (#11570)	2025-08-07 17:42:18 -04:00
George Hotz	82be8abfd2	move opt under codegen (#11569)	2025-08-07 14:19:17 -07:00
chenyu	83385e7abc	update gradient src in ramp.py (#11499) that's simplified now	2025-08-04 18:58:03 -04:00
George Hotz	842184a1ab	rename kernelize to schedule, try 2 (#11305)	2025-07-21 11:18:36 -07:00
nimlgen	cc3c1e4c14	hcq: move cpu to hcq (#11262) * hcq: move cpu to hcq * import time * upd * fix * windows support * hm * cleaner * fix timer * fix timing * std is ns * skip profiler * mypy * cleaner * cleanups * after merge * default is back	2025-07-21 15:10:38 +03:00
nimlgen	9a88bd841c	hcq: refactor into peer_groups (#11277) * hcq: refactor into peer_groups * fix fors * fixes * ooops * mypy * tiny fixes	2025-07-18 16:34:18 +03:00
chenyu	845a4d32bc	Tensor.diag (#11108) also updated Tensor.eye to use it	2025-07-05 23:03:02 -04:00
Ahmed Harmouche	e992ed10dc	WebGPU on Windows (#10890) * WebGPU on Windows * Fix dawn-python install * New test * pydeps * Minor fix * Only install dawn-python on windows webgpu --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-07-02 08:38:45 -07:00
chenyu	18e264a449	Tensor.logsigmoid (#10955)	2025-06-24 11:16:14 -04:00
George Hotz	b09c47366f	opt transforms the ast into an optimized ast (#10900) * opt transforms the ast into an optimized ast * fix get_kernel order and to_function_name * function_name property * update docs * copy from kernel.py * improve docs * ci didn't trigger?	2025-06-22 09:41:26 -07:00
George Hotz	7636d2cdc5	flip order of get_program args (#10905)	2025-06-20 17:23:23 -07:00
George Hotz	1ce63f8d04	move functions to view and update docs [pr] (#10904) * move functions to view and update docs [pr] * move quantize	2025-06-20 16:47:58 -07:00
George Hotz	b41e0563a3	move stuff to kernelize folder (#10902) * move stuff to kernelize folder * oops, forgot that	2025-06-20 16:10:20 -07:00
George Hotz	cba6e15937	split grouper and kernelize [pr] (#10854)	2025-06-17 17:54:20 -07:00
George Hotz	5dc1bc6070	switch get_kernel -> get_program [pr] (#10817) * switch get_kernel -> get_program [pr] * fix tests	2025-06-15 12:26:50 -07:00
Dan German	24e7aed74b	ramp.py: correct UOp and Ops import path from tinygrad.uop to tinygrad.uop.ops (#10791)	2025-06-12 10:07:03 -04:00
George Hotz	32e9949052	rename lazydata to uop (#10698)	2025-06-08 08:42:22 -07:00
George Hotz	db01c5a08a	ramp.py file from stream (#10686)	2025-06-07 14:58:21 -07:00
George Hotz	5ef7c5923f	docs: remove unused METAL_XCODE env var (#10421)	2025-06-06 18:39:54 -04:00
Eitan Turok	61352b8aa2	Add some more docs (#10634) * more docs * Add multinomial to ops * better doc	2025-06-05 19:40:37 -04:00
qazal	5b59728c75	refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541) * changes to core tinygrad * fixups pt1 TC=3 docs/abstractions2.py IMAGE=2 test_quantize_dsp test_schedule * more tests * green now * images stay images	2025-05-30 14:27:58 +03:00
Eitan Turok	c07f13c438	Docs for masked_fill (#10558) * add docs * fix doc examples * add to docs * fix typo	2025-05-29 03:49:02 -07:00
geohotstan	602a145f8f	Add Tensor.unfold (#10518) * yoinked 10272 * eitanturok's fixes * hmmm should size be sint? * add test	2025-05-26 11:15:44 -04:00
George Hotz	147f7747f2	remove the map from create_schedule_with_vars [pr] (#10472)	2025-05-22 15:58:25 -07:00
George Hotz	0d39bb5de1	rename to get_kernelize_map (#10465)	2025-05-22 11:44:44 -07:00
George Hotz	411392dfb7	move files into uop dir (#10399) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
George Hotz	6ebfb505e9	docs: fix crossentropy name (#10377)	2025-05-17 16:39:14 -07:00
Elnur Rakhmatullin	de2b323d97	Fixed a typo in "simplify" (#10358)	2025-05-16 14:45:14 -07:00
chenyu	8a906cb124	Tensor.randn_like (#10276)	2025-05-13 11:53:59 -04:00
nimlgen	b583ece8f3	amd: replace AMD_DRIVERLESS with AMD_IFACE (#10116) * amd: replace AMD_DRIVERLESS with AMD_IFACE * docs * print direct err for amd_iface * print for all	2025-04-30 20:22:02 +03:00
qazal	0bee225a58	Tensor.kernelize docs (#9946) * Tensor.kernelize docs * syntax * test_kernelize_bw * Tensor.kernelize docstring * pruning * tiny details * details 2 * becomes_map terminology * more changes to becomes	2025-04-21 16:34:03 +08:00
qazal	e20ef7196a	Tensor.kernelize (#9845) * add kernelize * remove that * kernelize returns self * update abstractions2.py * kernelize in test_schedule * temp: assert BUFFER_VIEW's existence * ASSIGN must have a buffer or subbuffer target * assert and shrink * fix * padded setitem * var * toposort once * extra * base_buffer * end with BUFFER_VIEW * setitem for disk * test_setitem_becomes_subbuffer * mul slice test * torch backend fix 1 * non-deterministic * keep subbuffer	2025-04-20 20:53:49 +08:00
qazal	218e01833d	update scheduler section for abstractions2.py [pr] (#9927)	2025-04-19 12:09:14 +03:00
Alexey Zaytsev	78a6af3da7	Use $CUDA_PATH/include for CUDA headers (#9858)	2025-04-13 16:20:19 +01:00
nimlgen	23b67f532c	amd: minor comments and readme updates (#9865)	2025-04-12 23:24:05 +03:00
qazal	f2bd65ccfc	delete Ops.EMPTY and Tensor._metaop (#9715) * delete Ops.EMPTY and Tensor._metaop [pr] * test_creation * arg= * abstractions2	2025-04-03 12:29:02 +08:00
Ignacio Sica	876a8be97a	Debug env var breakdown (#9663) * add debug level breakdown * hotfix * Update env_vars.md	2025-04-02 14:34:07 +08:00
chenyu	162f286a0e	add a few Tensor method to doc (#9614) * add a few Tensor method to doc * clone	2025-03-28 13:47:16 -04:00
uuuvn	c631c72f22	HCQ: Increment timeline signal before submitting (#9550) `AMDComputeQueue.__del__` frees `hw_page` which is safe because `AMDAllocator._free` does `self.dev.synchronize()` which is supposed to wait for execution of IB to finish, however that doesn't happen if AMDComputeQueue is dropped right after submit before timeline signal is incremented, which it is in most places leading to a race if .bind() is also used (required for multi-xcc because bug in mec fw treats all PACKET3_PRED_EXECs outside IBs as if they had EXEC_COUNT of zero).	2025-03-23 18:30:38 +07:00
geohotstan	309afa20b7	add Tensor.max_unpool2d (#9518) * why does max_unpool2d feel slower than out.gradient ... * slightly cleaner * what happened to ruff * need to think about this some more * slightly faster now? * clean up, 1 more failing edge case * ok good * working TINY_BACKEND * nit doc wording * retry CI	2025-03-22 12:11:33 -04:00
leopf	e4dad99145	nn.state docs cleanup (#8332) * doc cleanup * extension cleanup * manual definition * bring back accept_filename for gguf_load --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-18 17:16:40 -04:00
geohotstan	53d6f1e1bb	Add bitonic cat sort (#9422) * poc * repeated values fail, sigh * is this being timed out? * fix up down names * bitonic v2, does this run? * bitonic v3, faster * bitonic v3.1, faster * bitonic v3.1.1, same speed unlucky * support dim and indices * bitonic v3.2, simpler code, TODO repeated indices * bruv gimme green for once cmon * cat (stack) implementation, slow but maybe one day when cat is fast meow * revert to v3.2 * bitonic v4, who let the cats out edition * clean up variable names * figured out repeated indices :D * ruff check --fix * use sort for topk * add Tensor.sort everywhere * fix docs and add some types * slightly better variable names * am I doing torch inplace correctly? * delegate sort to values_stable * add a contig, faster first sort * maybe don't test_inplace --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-17 12:01:23 -04:00
geohotstan	1d64c12f2b	add Topk to tensor (#9343) * terrible but somewhat working impl * linux behaves differently than macos? * slightly better impl * small clean up; haven't figured this out yet * better * torch has different behavior on linux and macos for duplicated values * add sum docs * fix test * add torch return_type test * add an exception test * wrap_fxn instead, and move op lower in order * better repeated values test * rerun ci	2025-03-09 20:01:42 -04:00
Francis Lata	86b737a120	leakyrelu to leaky_relu (#9270)	2025-02-26 13:22:08 -05:00
chenyu	aaf0a8069f	xor -> bitwise_xor (#9264)	2025-02-26 10:21:14 -05:00
nimlgen	56288243e6	metal PyTorch interop (#9229) * add from_blob support to mps cuda * objc_id * metal pytorch interop * fix comments --------- Co-authored-by: George Hotz <geohot@gmail.com>	2025-02-24 22:36:08 +03:00
nimlgen	1d06d61b16	from_blob for cuda (#9223) * from_blob for cuda * maybe docs? * minor docs * example * waiting 9224 --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-24 14:02:06 +03:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189)	2025-02-20 18:03:09 -05:00

336 Commits