336 Commits

Author SHA1 Message Date
George Hotz
a03b930339 hotfix: green v2 in docs 2025-08-24 10:25:14 -07:00
chenyu
fb8ee02424 Tensor.logaddexp (#11793) 2025-08-23 09:15:00 -04:00
wozeparrot
1826004ef9 feat: add tinyos builder link (#11570) 2025-08-07 17:42:18 -04:00
George Hotz
82be8abfd2 move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
chenyu
83385e7abc update gradient src in ramp.py (#11499)
that's simplified now
2025-08-04 18:58:03 -04:00
George Hotz
842184a1ab rename kernelize to schedule, try 2 (#11305) 2025-07-21 11:18:36 -07:00
nimlgen
cc3c1e4c14 hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
nimlgen
9a88bd841c hcq: refactor into peer_groups (#11277)
* hcq: refactor into peer_groups

* fix fors

* fixes

* ooops

* mypy

* tiny fixes
2025-07-18 16:34:18 +03:00
chenyu
845a4d32bc Tensor.diag (#11108)
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
Ahmed Harmouche
e992ed10dc WebGPU on Windows (#10890)
* WebGPU on Windows

* Fix dawn-python install

* New test

* pydeps

* Minor fix

* Only install dawn-python on windows webgpu

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
chenyu
18e264a449 Tensor.logsigmoid (#10955) 2025-06-24 11:16:14 -04:00
George Hotz
b09c47366f opt transforms the ast into an optimized ast (#10900)
* opt transforms the ast into an optimized ast

* fix get_kernel order and to_function_name

* function_name property

* update docs

* copy from kernel.py

* improve docs

* ci didn't trigger?
2025-06-22 09:41:26 -07:00
George Hotz
7636d2cdc5 flip order of get_program args (#10905) 2025-06-20 17:23:23 -07:00
George Hotz
1ce63f8d04 move functions to view and update docs [pr] (#10904)
* move functions to view and update docs [pr]

* move quantize
2025-06-20 16:47:58 -07:00
George Hotz
b41e0563a3 move stuff to kernelize folder (#10902)
* move stuff to kernelize folder

* oops, forgot that
2025-06-20 16:10:20 -07:00
George Hotz
cba6e15937 split grouper and kernelize [pr] (#10854) 2025-06-17 17:54:20 -07:00
George Hotz
5dc1bc6070 switch get_kernel -> get_program [pr] (#10817)
* switch get_kernel -> get_program [pr]

* fix tests
2025-06-15 12:26:50 -07:00
Dan German
24e7aed74b ramp.py: correct UOp and Ops import path from tinygrad.uop to tinygrad.uop.ops (#10791) 2025-06-12 10:07:03 -04:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
George Hotz
db01c5a08a ramp.py file from stream (#10686) 2025-06-07 14:58:21 -07:00
George Hotz
5ef7c5923f docs: remove unused METAL_XCODE env var (#10421) 2025-06-06 18:39:54 -04:00
Eitan Turok
61352b8aa2 Add some more docs (#10634)
* more docs

* Add multinomial to ops

* better doc
2025-06-05 19:40:37 -04:00
qazal
5b59728c75 refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541)
* changes to core tinygrad

* fixups pt1

TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule

* more tests

* green now

* images stay images
2025-05-30 14:27:58 +03:00
Eitan Turok
c07f13c438 Docs for masked_fill (#10558)
* add docs

* fix doc examples

* add to docs

* fix typo
2025-05-29 03:49:02 -07:00
geohotstan
602a145f8f Add Tensor.unfold (#10518)
* yoinked 10272

* eitanturok's fixes

* hmmm should size be sint?

* add test
2025-05-26 11:15:44 -04:00
George Hotz
147f7747f2 remove the map from create_schedule_with_vars [pr] (#10472) 2025-05-22 15:58:25 -07:00
George Hotz
0d39bb5de1 rename to get_kernelize_map (#10465) 2025-05-22 11:44:44 -07:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
George Hotz
6ebfb505e9 docs: fix crossentropy name (#10377) 2025-05-17 16:39:14 -07:00
Elnur Rakhmatullin
de2b323d97 Fixed a typo in "simplify" (#10358) 2025-05-16 14:45:14 -07:00
chenyu
8a906cb124 Tensor.randn_like (#10276) 2025-05-13 11:53:59 -04:00
nimlgen
b583ece8f3 amd: replace AMD_DRIVERLESS with AMD_IFACE (#10116)
* amd: replace AMD_DRIVERLESS with AMD_IFACE

* docs

* print direct err for amd_iface

* print for all
2025-04-30 20:22:02 +03:00
qazal
0bee225a58 Tensor.kernelize docs (#9946)
* Tensor.kernelize docs

* syntax

* test_kernelize_bw

* Tensor.kernelize docstring

* pruning

* tiny details

* details 2

* becomes_map terminology

* more changes to becomes
2025-04-21 16:34:03 +08:00
qazal
e20ef7196a Tensor.kernelize (#9845)
* add kernelize

* remove that

* kernelize returns self

* update abstractions2.py

* kernelize in test_schedule

* temp: assert BUFFER_VIEW's existence

* ASSIGN must have a buffer or subbuffer target

* assert and shrink

* fix

* padded setitem

* var

* toposort once

* extra

* base_buffer

* end with BUFFER_VIEW

* setitem for disk

* test_setitem_becomes_subbuffer

* mul slice test

* torch backend fix 1

* non-deterministic

* keep subbuffer
2025-04-20 20:53:49 +08:00
qazal
218e01833d update scheduler section for abstractions2.py [pr] (#9927) 2025-04-19 12:09:14 +03:00
Alexey Zaytsev
78a6af3da7 Use $CUDA_PATH/include for CUDA headers (#9858) 2025-04-13 16:20:19 +01:00
nimlgen
23b67f532c amd: minor comments and readme updates (#9865) 2025-04-12 23:24:05 +03:00
qazal
f2bd65ccfc delete Ops.EMPTY and Tensor._metaop (#9715)
* delete Ops.EMPTY and Tensor._metaop [pr]

* test_creation

* arg=

* abstractions2
2025-04-03 12:29:02 +08:00
Ignacio Sica
876a8be97a Debug env var breakdown (#9663)
* add debug level breakdown

* hotfix

* Update env_vars.md
2025-04-02 14:34:07 +08:00
chenyu
162f286a0e add a few Tensor method to doc (#9614)
* add a few Tensor method to doc

* clone
2025-03-28 13:47:16 -04:00
uuuvn
c631c72f22 HCQ: Increment timeline signal before submitting (#9550)
`AMDComputeQueue.__del__` frees `hw_page`, which is safe because
`AMDAllocator._free` calls `self.dev.synchronize()`, which is supposed
to wait for execution of the IB to finish. However, that doesn't happen if
the AMDComputeQueue is dropped right after submit, before the timeline signal
is incremented, which is the case in most places. This leads to a race if
.bind() is also used (required for multi-xcc because a bug in the mec fw treats
all PACKET3_PRED_EXECs outside IBs as if they had an EXEC_COUNT of zero).
2025-03-23 18:30:38 +07:00
geohotstan
309afa20b7 add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...

* slightly cleaner

* what happened to ruff

* need to think about this some more

* slightly faster now?

* clean up, 1 more failing edge case

* ok good

* working TINY_BACKEND

* nit doc wording

* retry CI
2025-03-22 12:11:33 -04:00
leopf
e4dad99145 nn.state docs cleanup (#8332)
* doc cleanup

* extension cleanup

* manual definition

* bring back accept_filename for gguf_load

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-18 17:16:40 -04:00
geohotstan
53d6f1e1bb Add bitonic cat sort (#9422)
* poc

* repeated values fail, sigh

* is this being timed out?

* fix up down names

* bitonic v2, does this run?

* bitonic v3, faster

* bitonic v3.1, faster

* bitonic v3.1.1, same speed unlucky

* support dim and indices

* bitonic v3.2, simpler code, TODO repeated indices

* bruv gimme green for once cmon

* cat (stack) implementation, slow but maybe one day when cat is fast meow

* revert to v3.2

* bitonic v4, who let the cats out edition

* clean up variable names

* figured out repeated indices :D

* ruff check --fix

* use sort for topk

* add Tensor.sort everywhere

* fix docs and add some types

* slightly better variable names

* am I doing torch inplace correctly?

* delegate sort to values_stable

* add a contig, faster first sort

* maybe don't test_inplace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-17 12:01:23 -04:00
geohotstan
1d64c12f2b add Topk to tensor (#9343)
* terrible but somewhat working impl

* linux behaves differently than macos?

* slightly better impl

* small clean up; haven't figured this out yet

* better

* torch has different behavior on linux and macos for duplicated values

* add sum docs

* fix test

* add torch return_type test

* add an exception test

* wrap_fxn instead, and move op lower in order

* better repeated values test

* rerun ci
2025-03-09 20:01:42 -04:00
Francis Lata
86b737a120 leakyrelu to leaky_relu (#9270) 2025-02-26 13:22:08 -05:00
chenyu
aaf0a8069f xor -> bitwise_xor (#9264) 2025-02-26 10:21:14 -05:00
nimlgen
56288243e6 metal PyTorch interop (#9229)
* add from_blob support to mps cuda

* objc_id

* metal pytorch interop

* fix comments

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
nimlgen
1d06d61b16 from_blob for cuda (#9223)
* from_blob for cuda

* maybe docs?

* minor docs

* example

* waiting 9224

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00