Commit Graph

4433 Commits

Author SHA1 Message Date
chenyu
8dfa0024f0 raise in scatter if self and src have different dtype [pr] (#9109)
raise RuntimeError that matches torch instead of an implicit cast
2025-02-15 11:21:34 -05:00
George Hotz
4672d9af73 actual tests for the dsp backend [pr] (#9102)
* actual tests for the dsp backend [pr]

* fix name
2025-02-15 15:17:56 +08:00
Marcello Fuschi
8824f7e9df Make logcumsumexp numerically stable (#9050)
* Make logcumsumexp numerically stable

* Refactor

* Refactor for special case ndim=0

* Refactor

* Use the correct device for mask

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-14 19:25:17 -05:00
b1tg
1f1362fd27 add truncate_bf16 (#9078)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
chenyu
73af42aeab fix pow backward when base is 0 (#9075) 2025-02-13 21:06:01 -05:00
qazal
2d04a75a40 start tracking bottom_up_rewrite in viz [pr] (#9071)
* start tracking bottom_up_rewrite in viz [pr]

* use the tracking matcher in test_viz
2025-02-14 00:28:10 +01:00
chenyu
5ef48bbe0a swap order in rsqrt (#9069)
fixed backward for 0
2025-02-13 16:51:21 -05:00
chenyu
e02e3b94c3 remove SQRT hack in llvm (#9067)
replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward
2025-02-13 15:42:34 -05:00
chenyu
947c97e6ff add test_sqrt to test_speed_v_torch (#9066)
working on getting rid of llvm sqrt hack
2025-02-13 15:25:54 -05:00
chenyu
49abc09f77 remove the reshapes in test_arange_2_reduce [pr] (#9063) 2025-02-13 12:33:25 -05:00
chenyu
2573d0621a Tensor.scatter_reduce touchup [pr] (#9060) 2025-02-13 10:01:14 -05:00
Josh Moore
1f9d2442b9 Add Tensor.scatter_reduce (#8947)
* pytorch scatter -> scatter_reduce

* WIP scatter_reduce implementation

* _pre_scatter return type hint

* split out src, mask to satisfy linter

* Add src cast back in

* dict of lambdas instead of ifs

* sum and prod reduction ops with include_self

* add reduce arg error message

* add amax and amin reduction ops

* Fix include_self for higher dims

* Simplify

* Simplify amax and amin too

* Pull include_self logic out into _inv_mask function

* reduce arg cannot be None for scatter_reduce

* Fix self-mask issue

* Add mean reduce op

* Add tests

* any() not needed here

* remove comment

* End support for Tensor src with reduce arg in tinygrad scatter

* Process index, dim inside actual functions

* Add scatter_reduce to onnx

* Add excluded onnx ScatterElements reduction tests back in

* Save 2 lines on the mask helpers

* Update docs

* Add include_self=False tests

* cleanup

* Remove unneeded helper function

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-13 09:08:54 -05:00
qazal
2b9ce1235a simple failing case for reorder expand + keep views in tensor_map [pr] (#9057) 2025-02-13 11:22:55 +01:00
George Hotz
33a1151f2f Revert "match torch rmsnorm implementation (#6799)" (#9052)
This reverts commit a66b8250e0.
2025-02-13 14:42:45 +08:00
Ryan Dorrington
a66b8250e0 match torch rmsnorm implementation (#6799)
* update rmsnorm to match torch implementation

* run all tests

* formatting

* formatting

* oneline

* default to 1e-6

* restore old test

* formatting

* don't save elementwise_affine

* your message

* ignore webgpu

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 13:02:51 +08:00
gg
19ae829bd1 test float uop in sym_infer (#7456)
* float uop in sym_infer

* break line :(

* rerun mypy

* update GlobalCounters types

* revert type change and cast assignments to mem and ops

* cast inferred value to UOp in reshape

* cast hcq, update view reshape to handle inferred float

* rm extra space

* update error

* no type updates
2025-02-13 12:55:28 +08:00
JaSpa99
d2ff55e9c6 OSX GPUOcelot (#8209)
* add patches

* add osx test in ci

* macos specific uvm, gpfifo mask

* only do that for now

* Revert "add patches"

This reverts commit 80d3112a57.

* use fork for now

* workflow only one worker

* merge osxtests with tests

* Revert "merge osxtests with tests"

This reverts commit 3461c8f46c.

* macos pagesize 16384

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:24:29 +08:00
chenyu
f4f56d7c15 move time_linearizer to extra.optimization.helpers [pr] (#9048)
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
c15486cf39 remove contiguous in test_subbuffer_used [pr] (#9046)
test works without contiguous
2025-02-12 14:41:16 -05:00
chenyu
f53b819648 UOps. -> Ops. [pr] (#9044)
updated the comments and doc except extra
2025-02-12 12:53:23 -05:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
Ignacio Sica
aaed315fee add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
Josh Moore
0c97c10814 TestOps: silence pytorch std()/var() degrees of freedom warnings (#9034) 2025-02-12 14:49:18 +08:00
chenyu
2845f8797a failed test cases for rsqrt at 0 and similar ones (#9035)
* failed test cases for rsqrt at 0 and similar ones

related to 0*inf

* this failed
2025-02-11 17:50:16 -05:00
nimlgen
166670a2f2 nv: fill grid/block sizes (#9025) 2025-02-11 16:30:30 +03:00
qazal
c80603285e bring back some things from the fix_kernel_ops diff [pr] (#9027)
* bring fix_kernel_ops back [pr]

* fix
2025-02-11 14:20:31 +01:00
George Hotz
fb698920f1 revert scheduler change (#9019)
* Revert "cleanup ast rewriter [pr] (#9012)"

This reverts commit bf0bcb2d5a.

* Revert "kernel op cleanups + use ScheduleItem [pr] (#9009)"

This reverts commit c52cd2b437.

* Revert "construct the schedule sink 2 (#8925)"

This reverts commit cfd3db7862.
2025-02-11 11:34:12 +08:00
chenyu
6c39aa4a6b adjust cuda ci test targets (#9014) 2025-02-10 15:29:59 -05:00
qazal
bf0bcb2d5a cleanup ast rewriter [pr] (#9012) 2025-02-10 19:07:59 +01:00
chenyu
586e48d696 a few more backward tests now pass (#9010) 2025-02-10 12:46:21 -05:00
chenyu
25fa5e4d5f enable backward tests in test_std_one_in_axis [pr] (#9007)
still one correction=0 case is broken

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-10 10:44:05 -05:00
qazal
cfd3db7862 construct the schedule sink 2 (#8925)
* work

* delete preload

* fix metadata

* this can keep existing

* assign pruning

* dedup early

* bfs

* cycle asserts

* move assign check

* once
2025-02-10 22:23:02 +08:00
qazal
cd77e51810 fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975

* that's a reshape now

* work

* works

* give those tests better names

* test when multiple mops result in the same ShapeTracker

* test_become_existing_buf_complex is enough

* that too
2025-02-10 13:51:30 +01:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove test_const_arg test, variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
qazal
fd9f9ec772 realized base tensors become RESHAPE(BUFFER) [pr] (#8994) 2025-02-10 10:17:54 +01:00
George Hotz
e618efce22 COMMUTATIVE flipping is only for ints (#8996)
* COMMUTATIVE flipping is only for ints [pr]

* no pr

* comm fixes this
2025-02-10 12:01:28 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
nimlgen
88add71c25 amd: increase sdma copy size (#8989)
* amd: increase sdma max copy size

* rm this

* fix

* fx

* ops
2025-02-09 20:53:35 +03:00
qazal
7eba5fb413 Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)

* eh

* add test_empty_buf

* can we unsupport this

* linter

* Revert "can we unsupport this"

This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
55351ebb31 minimal failing test for #8975 [pr] (#8982) 2025-02-09 14:10:37 +01:00
nimlgen
e5a3f60fc2 am: remove libpciaccess dep (#8980)
* am: remove libpciaccess dep

* offset in mockhwiface

* op

* fake regions
2025-02-09 16:06:55 +03:00
George Hotz
0b26cee2f1 fix some slow tests [pr] (#8979) 2025-02-09 15:57:04 +08:00
George Hotz
a3c78d47b3 speed docs + upgrades [pr] (#8964)
* add some docs about speed [pr]

* better torch gemm

* enable locals on llvm/clang

* disable locals for beam speed on LLVM/CLANG

* 0x20 alignment in llvm allows ymm use
2025-02-08 17:28:52 +08:00
chenyu
cfd28517df move pow folding tests to test_schedule [pr] (#8955)
doesn't really belong in test_const_folding
2025-02-07 12:51:43 -05:00
George Hotz
c2b4c43edb handle stride 0 reduce (#8068)
* handle stride 0 reduce [pr]

* more test fixups

* a few more

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-07 15:40:58 +01:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
Bhavya Gada
3b67712892 [bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937)
* fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple

* remove expectedFailure

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 10:07:54 +08:00
George Hotz
f54242849d failing test for the devectorize [pr] (#8940)
* failing test for the devectorize [pr]

* add DEVECTORIZE to method_cache
2025-02-07 09:44:54 +08:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00