Commit Graph

294 Commits

leopf
e4dad99145 nn.state docs cleanup (#8332)
* doc cleanup

* extension cleanup

* manual definition

* bring back accept_filename for gguf_load

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-18 17:16:40 -04:00
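A hedged sketch of the gguf_load entry point this cleanup touches. The filename overload restored by "bring back accept_filename" and the (metadata, state dict) return pair are assumptions read off the commit messages, not confirmed API:

```python
from tinygrad.nn.state import gguf_load

# hedged sketch: per "bring back accept_filename for gguf_load", the function
# is assumed to take a path directly and return (metadata dict, tensor dict)
kv_data, state_dict = gguf_load("model.gguf")
print(kv_data.get("general.architecture"), len(state_dict))
```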
geohotstan
53d6f1e1bb Add bitonic cat sort (#9422)
* poc

* repeated values fail, sigh

* is this being timed out?

* fix up down names

* bitonic v2, does this run?

* bitonic v3, faster

* bitonic v3.1, faster

* bitonic v3.1.1, same speed unlucky

* support dim and indices

* bitonic v3.2, simpler code, TODO repeated indices

* bruv gimme green for once cmon

* cat (stack) implementation, slow but maybe one day when cat is fast meow

* revert to v3.2

* bitonic v4, who let the cats out edition

* clean up variable names

* figured out repeated indices :D

* ruff check --fix

* use sort for topk

* add Tensor.sort everywhere

* fix docs and add some types

* slightly better variable names

* am I doing torch inplace correctly?

* delegate sort to values_stable

* add a contig, faster first sort

* maybe don't test_inplace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-17 12:01:23 -04:00
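A minimal sketch of the Tensor.sort API this bitonic implementation backs, assuming the torch-style (values, indices) return pair and the dim/descending arguments the commit messages mention:

```python
from tinygrad import Tensor

t = Tensor([3, 1, 2, 1])
values, indices = t.sort()                  # ascending by default
print(values.tolist())                      # [1, 1, 2, 3]
values, indices = t.sort(descending=True)   # torch-style descending flag
print(values.tolist())                      # [3, 2, 1, 1]
```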
geohotstan
1d64c12f2b add Topk to tensor (#9343)
* terrible but somewhat working impl

* linux behaves differently than macos?

* slightly better impl

* small clean up; haven't figured this out yet

* better

* torch has different behavior on linux and macos for duplicated values

* add sum docs

* fix test

* add torch return_type test

* add an exception test

* wrap_fxn instead, and move op lower in order

* better repeated values test

* rerun ci
2025-03-09 20:01:42 -04:00
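A short sketch of the Tensor.topk call added here, assuming it mirrors torch's (values, indices) return pair and largest flag:

```python
from tinygrad import Tensor

t = Tensor([1, 4, 2, 8, 5])
values, indices = t.topk(3)               # three largest values
print(values.tolist(), indices.tolist())  # [8, 5, 4] [3, 4, 1]
values, _ = t.topk(2, largest=False)      # two smallest
print(values.tolist())                    # [1, 2]
```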
Francis Lata
86b737a120 leakyrelu to leaky_relu (#9270) 2025-02-26 13:22:08 -05:00
chenyu
aaf0a8069f xor -> bitwise_xor (#9264) 2025-02-26 10:21:14 -05:00
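Both of these commits are pure renames toward torch-style names; a quick sketch of the surviving spellings (the 0.01 default slope is an assumption):

```python
from tinygrad import Tensor

print(Tensor([-1.0, 2.0]).leaky_relu().tolist())                # [-0.01, 2.0], assuming torch's default slope
print(Tensor([0b1010]).bitwise_xor(Tensor([0b0110])).tolist())  # [12]
```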
nimlgen
56288243e6 metal PyTorch interop (#9229)
* add from_blob support to mps cuda

* objc_id

* metal pytorch interop

* fix comments

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
nimlgen
1d06d61b16 from_blob for cuda (#9223)
* from_blob for cuda

* maybe docs?

* minor docs

* example

* waiting 9224

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
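A hedged sketch of the zero-copy PyTorch interop these two commits enable on CUDA and Metal. The Tensor.from_blob(ptr, shape, ...) argument names and the buffer-lifetime rule are assumptions drawn from the commit messages, not a confirmed API:

```python
import torch
from tinygrad import Tensor, dtypes

# hypothetical sketch: wrap an existing CUDA buffer without copying,
# assuming Tensor.from_blob(ptr, shape, dtype=..., device=...)
src = torch.ones(4, 4, device="cuda")
t = Tensor.from_blob(src.data_ptr(), (4, 4), dtype=dtypes.float32, device="CUDA")
print(t.numpy())  # the torch tensor must outlive t: no copy was made
```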
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
Ahmed Harmouche
0f94b98646 Force WebGPU backend type [pr] (#9164)
* Force webgpu backend type

* Mypy fix

* Rename to WEBGPU_BACKEND

* Add it to env_vars docs

* Remove link
2025-02-19 17:19:39 +08:00
Clément Verrier
a7f91224eb add Tensor.isclose() (#8844)
* add `Tensor.isclose()`

* support `equal_nan`

so as to match PyTorch's behavior

* update unit tests

* remove some tests temporarily

* re-enable one test

* re-enable other test

* try to fix failing tests during CI

* save one line of code

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 10:11:40 -05:00
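A small example of the added Tensor.isclose, with the equal_nan flag matching PyTorch's semantics as the commit states:

```python
from tinygrad import Tensor

a = Tensor([1.0, float("nan")])
b = Tensor([1.0 + 1e-9, float("nan")])
print(a.isclose(b).tolist())                  # [True, False]: NaN never equals NaN by default
print(a.isclose(b, equal_nan=True).tolist())  # [True, True]
```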
Josh Moore
1f9d2442b9 Add Tensor.scatter_reduce (#8947)
* pytorch scatter -> scatter_reduce

* WIP scatter_reduce implementation

* _pre_scatter return type hint

* split out src, mask to satisfy linter

* Add src cast back in

* dict of lambdas instead of ifs

* sum and prod reduction ops with include_self

* add reduce arg error message

* add amax and amin reduction ops

* Fix include_self for higher dims

* Simplify

* Simplify amax and amin too

* Pull include_self logic out into _inv_mask function

* reduce arg cannot be None for scatter_reduce

* Fix self-mask issue

* Add mean reduce op

* Add tests

* any() not needed here

* remove comment

* End support for Tensor src with reduce arg in tinygrad scatter

* Process index, dim inside actual functions

* Add scatter_reduce to onnx

* Add excluded onnx ScatterElements reduction tests back in

* Save 2 lines on the mask helpers

* Update docs

* Add include_self=False tests

* cleanup

* Remove unneeded helper function

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-13 09:08:54 -05:00
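A sketch of the torch-style scatter_reduce added here, covering the reduce modes and the include_self flag the commits describe; defaults are assumed to follow torch:

```python
from tinygrad import Tensor

t = Tensor([1.0, 2.0, 3.0])
index = Tensor([0, 0, 1])
src = Tensor([10.0, 20.0, 30.0])
# include_self=True (assumed default, as in torch) folds t's own values into the reduction
print(t.scatter_reduce(0, index, src, reduce="sum").tolist())  # [31.0, 32.0, 3.0]
# with include_self=False, positions that receive no src values keep their original value
print(t.scatter_reduce(0, index, src, reduce="amax", include_self=False).tolist())  # [20.0, 30.0, 3.0]
```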
George Hotz
a3c78d47b3 speed docs + upgrades [pr] (#8964)
* add some docs about speed [pr]

* better torch gemm

* enable locals on llvm/clang

* disable locals for beam speed on LLVM/CLANG

* 0x20 alignment in llvm allows ymm use
2025-02-08 17:28:52 +08:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
uuuvn
6dadb60c93 LLVM JIT (+autogen llvm instead of llvmlite) (#8486)
* LLVM JIT

* Autogen LLVM

* Update autogen

* Move things around

* even more non-determinism

* windows

* more autogen weirdness

* more windows stuff

* blind windows development try 2

* more blind windows development

* even more blind windows development

* maybe i should just set up a windows vm...

* why can't everyone just use sysv abi?

* cleanup debugging stuff

* unused import

* icache flushing isn't required on x86

* merge jit_nt and jit_unix

* more

* Temporary hack to not segfault

* better error

* bad conflict resolution

* Attempt to simplify support/llvm.py

* More refactoring

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-02 19:52:42 +08:00
qazal
c8d878a5c1 remove r.lazydata.buf_uop_view [pr] (#8817) 2025-01-30 23:14:36 +02:00
qazal
530961f7d5 realized only exists on base (#8815)
* realized only exists on base [pr]

* shorter

* update that too
2025-01-30 23:02:25 +02:00
George Hotz
a6e496b195 remove Function class [pr] (#8753)
* remove Function class [pr]

* actually remove function

* fix docs
2025-01-26 18:58:02 +09:00
nimlgen
6733a3a96b am: fix typo (#8700) 2025-01-21 14:35:15 +03:00
George Hotz
168c16646a change create_schedule_with_vars api to big_sink [pr] (#8677) 2025-01-19 13:30:26 -08:00
ignaciosica
d2234e308a tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
nimlgen
b3efeeb717 docs: start am docs (#8638)
* docs: init am docs

* missing
2025-01-16 00:22:35 +03:00
qazal
0e97f807e0 test fixup prereqs for delete_buffer_view [pr] (#8523) 2025-01-07 11:52:18 +02:00
nimlgen
5cb9443ebb PROFILE is enabled when VIZ is enabled (#8516) 2025-01-06 19:47:16 +03:00
uuuvn
5ffc50d58c Clang JIT (#8481)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-03 11:12:55 -05:00
qazal
bd4d7dc4eb return becomes_map from the scheduler (#8483)
* return becomes_map from the scheduler

* fix test_schedule

* fix abstractions2

* s/becomes/becomes_map
2025-01-03 22:47:21 +08:00
chenyu
f3fdec940d Tensor.mod (#8458)
it's a python-style mod. possibly can be made cleaner with a floor div

relaxed the vmin for MOD slightly for C-style negative mods; it's more correct and might fix other bugs
2024-12-31 11:31:42 -05:00
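An example of the Python-style semantics this commit chose, assuming % dispatches to Tensor.mod: the result takes the sign of the divisor, unlike a C-style remainder:

```python
from tinygrad import Tensor

print((Tensor([-5, 5]) % 3).tolist())  # [1, 2], matching Python's -5 % 3 == 1
print((Tensor([5]) % -3).tolist())     # [-1], matching Python's 5 % -3 == -1
```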
George Hotz
4c94726bac remove uop mutability [pr] (#8441)
* remove uop mutability [pr]

* test fixups

* most tests pass

* more tests pass

* lil test fixups

* them too

* fix test

* unneeded

* err, that

* fix test_hcq

* fix test failures

* fix that test

* tensor universe

* does this pass test

* Revert "does this pass test"

This reverts commit ed516b3169.

* Revert "tensor universe"

This reverts commit c21301852a.

* proper spidering for uops

* cleanups

* all tensors

* all tensors

* slow but correct

* fast

* no WeakSet

* faster

* no need for list

* revert that
2024-12-31 00:29:56 -05:00
chenyu
19a54ae0b4 add Tensor.roll and Tensor.rearrange to doc (#8454)
also moved rearrange in tensor.py to the high-level movement section
2024-12-30 20:25:50 -05:00
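Both newly documented methods, sketched under the assumption that roll takes torch-style shifts/dims and rearrange takes an einops-style pattern:

```python
from tinygrad import Tensor

t = Tensor([[0, 1, 2], [3, 4, 5]])
print(t.roll(shifts=1, dims=1).tolist())  # [[2, 0, 1], [5, 3, 4]]
print(t.rearrange("h w -> w h").shape)    # (3, 2)
```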
George Hotz
803a47494e Revert "Clang JIT (#8312)" (#8452)
This reverts commit b6266c8e41.
2024-12-30 17:49:35 -05:00
uuuvn
b6266c8e41 Clang JIT (#8312)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-30 17:37:53 -05:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
geohotstan
78cb47dfc5 docs and tests clean ups (#8383) 2024-12-23 11:12:13 -05:00
chenyu
63f195729d add gguf_load to doc [pr] (#8314)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-18 12:44:09 -05:00
qazal
d05e21cb69 replace lazy srcs with the new uop api [pr] (#8255)
* buf_uop_view function

* srcs shouldn't exist

* fix TestTensorMetadata

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2024-12-15 17:09:54 +08:00
George Hotz
8396d90f91 non controversial changes from optim branch [pr] (#8234) 2024-12-13 19:24:16 -08:00
George Hotz
37fa38d272 Revert "switch beautiful_mnist to use new optimizer [pr] (#8231)" (#8233)
This reverts commit e9ee39df22.
2024-12-13 19:07:09 -08:00
George Hotz
e9ee39df22 switch beautiful_mnist to use new optimizer [pr] (#8231)
* switch beautiful_mnist to use new optimizer [pr]

* fix abstractions3 + docs

* fix OptimizerGroup with schedule_step api
2024-12-13 18:27:16 -08:00
George Hotz
8a04a3a77a rename LazyBuffer -> UOp [pr] (#8169)
* rename LazyBuffer -> UOp [pr]

* fix docs
2024-12-11 16:15:52 -08:00
geohotstan
0a2e10be1d add SELU to Tensor (#7993)
* add selu

* more clean ups
2024-12-02 10:04:01 -05:00
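The standard SELU definition behind this addition is scale * (max(0, x) + min(0, alpha * (exp(x) - 1))) with alpha ≈ 1.6733 and scale ≈ 1.0507:

```python
from tinygrad import Tensor

print(Tensor([-1.0, 0.0, 1.0]).selu().tolist())  # approx [-1.1113, 0.0, 1.0507]
```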
nimlgen
10f431b96d hcq replace update with sint (#7899)
* try sym hcq

* start with amd

* move to nv

* nv works

* cache and qcom

* fixes

* signals

* fix nv

* qcom fixes

* linter

* linter

* cache + typings

* fixes

* tiny fixes

* linter

* linter

* lntr

* ugh

* comments
2024-11-29 20:08:13 +03:00
geohotstan
cea5853cfa add Tensor.scatter (#7737)
* working I think

* where are my onnx scatter tests??

* forward_only for now

* try if nan hack fix NV

* looks like issue is different... CUDA WHY

* oops that was wrong. Try if this fixes CUDA

* simpler multiply

* actually finish this up tmrw morning :x

* fix tests?

* improve tests

* improve test and implementation

* fix ruff

* complete but lots of expected failure...

* reviewed tests

* add onnx tests

* is this a processing op?

* add return type to indicate that it's not in-place

* final cleanups

* use or and improve tests a little

* add masked_index_select

* call it masked_setitem instead

* try

* FIXED

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:52:04 -05:00
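A sketch of the out-of-place torch-style scatter this commit adds (one of the bullets above notes the return type signals it is not in-place); shapes and semantics are assumed to follow torch.scatter:

```python
from tinygrad import Tensor

t = Tensor.zeros(3, 5)
index = Tensor([[0, 1, 2], [0, 1, 4]])
src = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
out = t.scatter(1, index, src)  # out[i][index[i][j]] = src[i][j] along dim 1
print(out.tolist())             # rows 0 and 1 get src values; row 2 stays all zeros
```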
chenyu
3b26e51fce Tensor.cummax (#7854)
generalized the existing cumsum to take Ops.MAX in addition to Ops.ADD
2024-11-22 15:55:02 -05:00
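cumsum and the new cummax side by side, assuming cummax returns just the running-max values rather than torch's (values, indices) pair:

```python
from tinygrad import Tensor

t = Tensor([2, 1, 3, 0, 4])
print(t.cumsum().tolist())  # [2, 3, 6, 6, 10]
print(t.cummax().tolist())  # [2, 2, 3, 3, 4]
```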
geohotstan
cf1ec90ad4 add inverse trig functions to Tensor (#7805)
* implement inverse trig functions

* guess we should still test nans?

* magnitude as variable name :D

* reorder onnx_ops ops

* approximation -> x for consistency

* address feedback

* simpler acos

* improvement?

* actually just have asin depend on atan

* actually this is nicer

* remove a comment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-21 09:13:36 -05:00
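The three added functions at reference points; values are approximate since, per the bullets, asin and acos are built on an atan approximation:

```python
from tinygrad import Tensor

t = Tensor([0.0, 0.5, 1.0])
print(t.asin().tolist())              # ~[0.0, 0.5236, 1.5708]
print(t.acos().tolist())              # ~[1.5708, 1.0472, 0.0]
print(Tensor([1.0]).atan().tolist())  # ~[0.7854], i.e. pi/4
```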
George Hotz
9df5a62c5e unify to HWQueue [pr] (#7812)
* unify to HWCommandQueue [pr]

* all is HWQueue
2024-11-21 10:33:08 +08:00
George Hotz
d71fe7faa5 rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]

* forgot those

* transfer + offset
2024-11-20 00:10:29 +08:00
geohotstan
72a41095bc add Tensor.meshgrid (#7714)
* initial implementation and test

* some other places that can use meshgrid

* revert the onnx_ops change

* add to docs

* revert interpolate too

* update

* improve edge case test

* might as well test grad

* add to test can improve docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
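A sketch of the added meshgrid, assuming a method-style call with a torch-like default "ij" indexing:

```python
from tinygrad import Tensor

x, y = Tensor([1, 2, 3]), Tensor([4, 5])
gx, gy = x.meshgrid(y)     # assumed method-style API
print(gx.shape, gy.shape)  # (3, 2) (3, 2) with "ij" indexing
print(gx.tolist())         # [[1, 1], [2, 2], [3, 3]]
```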
ignaciosica
597a239e28 Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725)
* remove unaryops

* remove ternaryops

* remove metaops

* hotfix

* remove binaryops

* hotfix: test_pattern_matcher

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
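After this merge a single Tensor.pad covers the old pad2d use case; the torch-style flat padding tuple below (left, right, top, bottom, applied from the last dim inward) is assumed to be one of the accepted forms:

```python
from tinygrad import Tensor

t = Tensor.ones(2, 2)
print(t.pad((1, 1, 0, 0)).shape)  # (2, 4): one zero column added on each side
```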
chenyu
51afc3cc88 update env_vars doc on VIZ link (#7689)
existing one throws a 404 because mkdocs does not allow traversing above the doc root (i think?). so for now just use the github link directly
2024-11-13 17:28:14 -05:00
geohotstan
9c41c376d3 add Tensor.nll_loss (#7683)
* move nll_loss to new branch

* make nll_loss examples practical

* self *is*

* add to docs

* small
2024-11-13 13:12:13 -05:00
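A minimal usage sketch: as with torch's F.nll_loss, the input is assumed to be log-probabilities, so it pairs with log_softmax:

```python
from tinygrad import Tensor

logits = Tensor([[2.0, 0.5, 0.1], [0.2, 3.0, 0.3]])
targets = Tensor([0, 1])
loss = logits.log_softmax(axis=-1).nll_loss(targets)  # mean reduction assumed by default
print(loss.item())
```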