Commit Graph

4618 Commits

gg
19ae829bd1 test float uop in sym_infer (#7456)
* float uop in sym_infer

* break line :(

* rerun mypy

* update GlobalCounters types

* revert type change and cast assignments to mem and ops

* cast inferred value to UOp in reshape

* cast hcq, update view reshape to handle inferred float

* rm extra space

* update error

* no type updates
2025-02-13 12:55:28 +08:00
JaSpa99
d2ff55e9c6 OSX GPUOcelot (#8209)
* add patches

* add osx test in ci

* macos specific uvm, gpfifo mask

* only do that for now

* Revert "add patches"

This reverts commit 80d3112a57.

* use fork for now

* workflow only one worker

* merge osxtests with tests

* Revert "merge osxtests with tests"

This reverts commit 3461c8f46c.

* macos pagesize 16384

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:24:29 +08:00
chenyu
f4f56d7c15 move time_linearizer to extra.optimization.helpers [pr] (#9048)
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
c15486cf39 remove contiguous in test_subbuffer_used [pr] (#9046)
test works without contiguous
2025-02-12 14:41:16 -05:00
chenyu
f53b819648 UOps. -> Ops. [pr] (#9044)
updated the comments and docs, except in extra
2025-02-12 12:53:23 -05:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
Ignacio Sica
aaed315fee add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
Josh Moore
0c97c10814 TestOps: silence pytorch std()/var() degrees of freedom warnings (#9034) 2025-02-12 14:49:18 +08:00
chenyu
2845f8797a failed test cases for rsqrt at 0 and similar ones (#9035)
* failed test cases for rsqrt at 0 and similar ones

related to 0*inf

* this failed
2025-02-11 17:50:16 -05:00
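
A minimal sketch of the numerical issue these tests target (illustrative only, not code from the commit): rsqrt blows up to inf at 0, and a chain-rule product of a zero upstream gradient with that inf is the 0*inf case noted above, which evaluates to nan rather than 0.

```python
import numpy as np

x = np.float32(0.0)
rsqrt_x = np.float32(1.0) / np.sqrt(x)   # inf: rsqrt blows up at x == 0 (divide-by-zero warning)

# a zero upstream gradient multiplied by that inf does not cancel to 0
upstream = np.float32(0.0)
print(rsqrt_x, upstream * rsqrt_x)       # inf nan  <- the 0*inf case
```
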
nimlgen
166670a2f2 nv: fill grid/block sizes (#9025) 2025-02-11 16:30:30 +03:00
qazal
c80603285e bring back some things from the fix_kernel_ops diff [pr] (#9027)
* bring fix_kernel_ops back [pr]

* fix
2025-02-11 14:20:31 +01:00
George Hotz
fb698920f1 revert scheduler change (#9019)
* Revert "cleanup ast rewriter [pr] (#9012)"

This reverts commit bf0bcb2d5a.

* Revert "kernel op cleanups + use ScheduleItem [pr] (#9009)"

This reverts commit c52cd2b437.

* Revert "construct the schedule sink 2 (#8925)"

This reverts commit cfd3db7862.
2025-02-11 11:34:12 +08:00
chenyu
6c39aa4a6b adjust cuda ci test targets (#9014) 2025-02-10 15:29:59 -05:00
qazal
bf0bcb2d5a cleanup ast rewriter [pr] (#9012) 2025-02-10 19:07:59 +01:00
chenyu
586e48d696 a few more backward tests now pass (#9010) 2025-02-10 12:46:21 -05:00
chenyu
25fa5e4d5f enable backward tests in test_std_one_in_axis [pr] (#9007)
one correction=0 case is still broken

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-02-10 10:44:05 -05:00
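
For context on why the single-element std/var cases are delicate (a generic numpy illustration, not the tinygrad backward case that is still broken): the correction term puts N - correction in the denominator, which is zero when the reduced axis has one element and correction=1, the same degrees-of-freedom situation the TestOps warning-silencing commit above refers to.

```python
import numpy as np

x = np.array([3.0])        # one element along the reduced axis, N == 1
print(np.var(x, ddof=0))   # 0.0  (denominator N - 0 == 1)
print(np.var(x, ddof=1))   # nan  (denominator N - 1 == 0, "degrees of freedom <= 0" warning)
```
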
qazal
cfd3db7862 construct the schedule sink 2 (#8925)
* work

* delete preload

* fix metadata

* this can keep existing

* assign pruning

* dedup early

* bfs

* cycle asserts

* move assign check

* once
2025-02-10 22:23:02 +08:00
qazal
cd77e51810 fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975

* that's a reshape now

* work

* works

* give those tests better names

* test when multiple mops result in the same ShapeTracker

* test_become_existing_buf_complex is enough

* that too
2025-02-10 13:51:30 +01:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove test_const_arg test, variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
qazal
fd9f9ec772 realized base tensors become RESHAPE(BUFFER) [pr] (#8994) 2025-02-10 10:17:54 +01:00
George Hotz
e618efce22 COMMUTATIVE flipping is only for ints (#8996)
* COMMUTATIVE flipping is only for ints [pr]

* no pr

* comm fixes this
2025-02-10 12:01:28 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
nimlgen
88add71c25 amd: increase sdma copy size (#8989)
* amd: increase sdma max copy size

* rm this

* fix

* fx

* ops
2025-02-09 20:53:35 +03:00
qazal
7eba5fb413 Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)

* eh

* add test_empty_buf

* can we unsupport this

* linter

* Revert "can we unsupport this"

This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
55351ebb31 minimal failing test for #8975 [pr] (#8982) 2025-02-09 14:10:37 +01:00
nimlgen
e5a3f60fc2 am: remove libpciaccess dep (#8980)
* am: remove libpciaccess dep

* offset in mockhwiface

* op

* fake regions
2025-02-09 16:06:55 +03:00
George Hotz
0b26cee2f1 fix some slow tests [pr] (#8979) 2025-02-09 15:57:04 +08:00
George Hotz
a3c78d47b3 speed docs + upgrades [pr] (#8964)
* add some docs about speed [pr]

* better torch gemm

* enable locals on llvm/clang

* disable locals for beam speed on LLVM/CLANG

* 0x20 alignment in llvm allows ymm use
2025-02-08 17:28:52 +08:00
chenyu
cfd28517df move pow folding tests to test_schedule [pr] (#8955)
they don't really belong in test_const_folding
2025-02-07 12:51:43 -05:00
George Hotz
c2b4c43edb handle stride 0 reduce (#8068)
* handle stride 0 reduce [pr]

* more test fixups

* a few more

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-07 15:40:58 +01:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
Bhavya Gada
3b67712892 [bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937)
* fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple

* remove expectedFailure

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 10:07:54 +08:00
George Hotz
f54242849d failing test for the devectorize [pr] (#8940)
* failing test for the devectorize [pr]

* add DEVECTORIZE to method_cache
2025-02-07 09:44:54 +08:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
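
This is the standard PEP 585 modernization: annotations move from typing.Tuple/typing.List to the builtin generics available since Python 3.9. A hypothetical before/after (the function name is illustrative, not from the diff):

```python
# before: typing generics
from typing import List, Tuple
def strides_for(shape: Tuple[int, ...]) -> List[int]: ...

# after: builtin generics, no typing import needed (Python 3.9+)
def strides_for(shape: tuple[int, ...]) -> list[int]: ...
```
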
qazal
79fb5c6470 hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] (#8928) 2025-02-06 16:27:59 +02:00
George Hotz
ae45826758 hotfix: GRAPH_ONE_KERNEL + fix timing 2025-02-06 17:52:20 +08:00
George Hotz
1c53e8bf27 Revert "objc fast msg (#8922)" (#8926)
This reverts commit c3f99a727e.
2025-02-06 17:50:49 +08:00
George Hotz
c3f99a727e objc fast msg (#8922)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* new objc message style [pr]

* without sync

* no div 0

* lru cache that

* no sync in the profile

* fix

* update all to new style

* remove comment

* graph one kernel

* fix graph one kernel

* remove that sync
2025-02-06 17:49:06 +08:00
George Hotz
a8e54df363 benchmark single kernel launch (#8921)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* without sync

* no div 0

* lru cache that

* no sync in the profile
2025-02-06 13:35:34 +08:00
Josh Moore
44e0eab8fd Fix AttributeError occurring after ValueError in _apply_uop (#8905)
* Fix AttributeError occurring after ValueError in _apply_uop

* Update tensor.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-06 10:56:29 +08:00
chenyu
30695da256 remove Tensor._to_const_val (#8917)
* remove Tensor._to_const_val

added a TODO for advanced indexing on const, which was the last place that checked const in Tensor

* that is not folding now

* one more
2025-02-05 21:44:39 -05:00
uuuvn
09ec33a578 Better errors when relocating against undefined symbol (#8902) 2025-02-06 10:13:44 +08:00
chenyu
488200f16c move more pow const to rewrite (#8916)
* move more pow const to rewrite

one less use of _to_const_val

* fix
2025-02-05 20:30:12 -05:00
chenyu
76671381aa move positive const ** t to a rewrite rule (#8914)
* move positive const ** t to a rewrite rule

* one more test
2025-02-05 19:30:12 -05:00
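
The identity such a rewrite typically relies on (shown here as a generic sketch, not tinygrad's actual rule) is c ** t = exp2(t * log2(c)), valid only for a positive constant base:

```python
import numpy as np

c, t = 3.0, np.linspace(-2.0, 2.0, 5)
rewritten = np.exp2(t * np.log2(c))      # only safe for c > 0
print(np.allclose(c ** t, rewritten))    # True
```
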
chenyu
189bfa164e enable backward test for pow(neg const ** x) (#8912)
backward works now. 0**x still does not work because it's a special case fixed in transcendental
2025-02-05 15:35:21 -05:00
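
A minimal worked example of why 0**x needs a special case (independent of tinygrad's implementation): the usual backward formula for c**x contains log(c), which is -inf at c == 0, so the gradient turns into a 0 * inf = nan product even though the forward value is well defined.

```python
import numpy as np

c, x = np.float32(0.0), np.float32(3.0)
y = c ** x                   # forward: 0.0 ** 3.0 == 0.0, well defined
dy_dx = y * np.log(c)        # backward d/dx c**x = c**x * ln(c): ln(0) = -inf, 0 * -inf = nan
print(y, dy_dx)              # 0.0 nan
```
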
Ignacio Sica
aec3b8d515 add regression test: test_get_kernel_actions_preserves_actions_state (#8907)
* test_get_kernel_actions_preserves_actions_state

* simplify

* simplify

* refactor assert message
2025-02-05 14:13:01 -05:00
Ignacio Sica
15f94ac964 TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793)
* squash search over search

* refactor assert

* init benchmark

* cleaner get_kernel_actions

* cleaner get_kernel_actions

* add comment
2025-02-05 11:03:46 -05:00
qazal
6f0cc2e9c5 rename to KernelContext and move the linearize_sched comment [pr] (#8899)
* rename to KernelContext and move that comment [pr]

* 500
2025-02-05 07:49:58 +01:00
George Hotz
c1c5227acb preserve size in dtype ptr [pr] (#8898) 2025-02-05 14:38:57 +08:00