Commit Graph

412 Commits

George Hotz
b4eb876d5a kernel.py no longer permutes reduce axis [pr] (#10968)
* kernel.py no longer permutes reduce axis [pr]

* delete tests that handcode uops

* regen of sops is broken...

* put import back

* just remove that

* disable those tests
2025-06-26 17:44:58 -07:00
Ignacio Sica
579194f523 remove some linearize calls from tests 2 [pr] (#10992)
* refactor count_float4 to take uops as input instead of kernel

* remove some calls to linearize in test_linearizer

* remove some more calls

* remove one more call
2025-06-26 18:22:27 -03:00
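The first bullet above changes `count_float4` to scan rendered uops directly instead of linearizing a Kernel. A minimal sketch of such a helper, assuming tinygrad's `Ops`/`dtypes` conventions (the real test helper may also count stores):

```python
from tinygrad import dtypes
from tinygrad.uop.ops import Ops

# hedged sketch: count vectorized float4 loads in an already-rendered uop list,
# so tests can pass uops in directly instead of linearizing a Kernel first
def count_float4(uops) -> int:
  return len([u for u in uops if u.op is Ops.LOAD and u.dtype == dtypes.float.vec(4)])
```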
Ignacio Sica
21f1c4cc09 remove some linearize calls from tests [pr] (#10978)
* remove some linearize calls from tests

speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd

* more clear assert message
2025-06-25 12:37:17 -07:00
Ignacio Sica
98d2cde293 revert tc_group feature (#10971) 2025-06-24 20:58:13 -07:00
George Hotz
8a65720528 hotfix: disable test_tensor_core_opts_group test on real metal 2025-06-24 15:21:33 -07:00
George Hotz
8743ca40e2 force reduce to be in axis order (#10837)
* force reduce to be in axis order

* disable rule causing loop

* disable that rule

* no ra there

* only move non reduce

* fix tests
2025-06-24 13:00:16 -07:00
Ignacio Sica
956a8391a5 minor cleanup on test_tensor_core_opts tests (#10924)
* minor cleanup on test_tensor_core_opts tests

Tests now notify when skipped.
Before, they silently skipped if the backend didn't have half precision and
accumulation. Also cleaned up atol and rtol setup.

* refactor test_tensor_core_opts_group

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-23 16:30:21 -07:00
Ignacio Sica
b8d09a1dae tc with group/grouptop (#10903) 2025-06-23 09:58:41 -07:00
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
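What the rename above means at the Tensor level, as a small sketch (the attribute name comes from the commit title; the rest is standard tinygrad usage):

```python
from tinygrad import Tensor

t = Tensor.ones(4, 4).contiguous()
print(t.uop)  # the UOp graph backing the tensor; before this commit it was t.lazydata
```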
qazal
5b59728c75 refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541)
* changes to core tinygrad

* fixups pt1

TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule

* more tests

* green now

* images stay images
2025-05-30 14:27:58 +03:00
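A toy illustration of the reordering in the title above, using plain tuples as stand-ins for UOps (not tinygrad's real IR; it only shows which node owns the VIEW):

```python
st0 = "shapetracker-placeholder"  # stands in for a real ShapeTracker
# before: LOAD took the buffer and its view as two sibling sources
before = ("LOAD", ("DEFINE_GLOBAL", 0), ("VIEW", st0))
# after: the VIEW wraps the buffer, and LOAD takes a single source
after = ("LOAD", ("VIEW", ("DEFINE_GLOBAL", 0), st0))
```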
qazal
bbf05110a2 use kernelize in TestLinearizer.test_indexing_multireduce [pr] (#10571) 2025-05-30 11:27:09 +03:00
qazal
9169dcfb49 do not create kernels with more inputs than the backend allows (#10510)
* work

* no itertools + top down pass

* clean viz

* python can do that

* webgpu

* gbarrier of gbarrier is gbarrier

* device can be tuple

* bug in toposort

* failing test for gated toposort

* contiguous of gbarrier is gbarrier

* check for binops

* Revert "check for binops"

This reverts commit 53e3cdf720.

* viz + match on gbarrier, self exists by default

* alt

* green now

* cleanup
2025-05-26 18:02:03 +03:00
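Two bullets above name graph-rewrite rules. A toy sketch of those collapses on a nested-tuple IR (tinygrad's real pattern matcher works differently; this only demonstrates the idempotence the bullets describe):

```python
# hedged toy: apply the two collapse rules bottom-up over a nested-tuple op graph
def collapse(u):
  if not isinstance(u, tuple): return u
  op, src = u[0], tuple(collapse(s) for s in u[1:])
  # "gbarrier of gbarrier is gbarrier" / "contiguous of gbarrier is gbarrier"
  if op in ("GBARRIER", "CONTIGUOUS") and isinstance(src[0], tuple) and src[0][0] == "GBARRIER":
    return src[0]
  return (op, *src)

print(collapse(("GBARRIER", ("GBARRIER", ("BUF", 0)))))  # ('GBARRIER', ('BUF', 0))
```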
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
Ignacio Sica
8f79492c75 fix test_tensor_cores_codegen for ptx renderer (#10119) 2025-05-01 21:52:36 -03:00
Ignacio Sica
bf5fb97498 fix AMD_LLVM bf16 tc for gfx1100 (#10102)
* fix amd_llvm bf16 tc

* cleanup pattern
2025-04-30 20:06:38 -03:00
Ignacio Sica
bda116d773 fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
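One bullet above reads "get TC val from ContextVar in search". A hedged sketch of tinygrad's ContextVar mechanism using a hypothetical variable (tinygrad defines its real TC var in-tree, and redefining it here would collide):

```python
from tinygrad.helpers import ContextVar, Context

MY_TC = ContextVar("MY_TC", 1)  # hypothetical var; tinygrad's real TC lives in-tree
print(MY_TC.value)              # 1, or whatever MY_TC=... was in the environment
with Context(MY_TC=3):
  print(MY_TC.value)            # 3: search-time code reads the ContextVar, not a stale copy
```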
George Hotz
4c242b0483 hotfix: tests all pass on metal local 2025-04-28 12:09:00 -04:00
qazal
d13c100981 don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]

* 1 can exist with n

* put process_replay warn last

* assert shape is the same

* bring that back
2025-04-26 23:24:30 +08:00
Ignacio Sica
76a86735c0 hotfix amd bf16 is supported case (#10039)
* hotfix amd and amd_llvm

* bf16 not supported in ci

* hotfix amd_llvm is not a device

* remove default

* dont gate on ci and amd_llvm

* minor cleanup

* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe fix helper_tc_allclose (#9606)
* fix helper_tc_allclose

* cleanup

* hotfix

* cleanup

* cleanup

* check real buffer and add cast for bf16

* cleanup

* fix padded for ops_python

* avoid assert on amd emulated tc

* swap dimensions

* revert, should have nothing to do with padded

* revert fix, should not go in this pr

* remove skip
2025-04-24 18:36:40 -03:00
Ignacio Sica
51ca19d061 set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init

* add expected failure to correctly track progress

* hotfix

* skip for amd_llvm as well

* add skip

* add pr number

* move comment to amd test

* change reason
2025-04-24 17:11:40 -03:00
Ignacio Sica
373ca59b7f use is_dtype_supported to check dtype support in tc tests (#10035) 2025-04-24 14:59:14 -03:00
George Hotz
2ed3acd767 toposort is a function [pr] (#10004) 2025-04-23 16:25:03 +01:00
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
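The note above gives the new call shape. A hedged usage sketch, with module paths assumed from this log (heuristic.py per #9844 below; `ast` stands for an existing kernel AST):

```python
from tinygrad.codegen.kernel import Kernel
from tinygrad.codegen.heuristic import hand_coded_optimizations  # path assumed from #9844

k = Kernel(ast)                     # ast: a scheduled kernel AST, assumed available
opts = hand_coded_optimizations(k)  # now pure: returns list[Opt] without mutating k
k.apply_opts(opts)                  # i.e. k.apply_opts(hand_coded_optimizations(k))
```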
Ignacio Sica
023b1c28a2 test_tensor_cores_padded refactor (#9724)
* set pad to 3 for amd padded tc test

* change pad for amd regardless of CI

* test tc padded uops and correctness separately

* add test_tensor_cores_padded_uops test to ci

* remove redundant check for amd device

* cleanup
2025-04-18 17:05:54 -03:00
George Hotz
aa98aff4cd don't use ops name, just keep sink (#9922)
* don't use ops name, just keep sink

* fix test

* endif sink
2025-04-18 08:59:18 +01:00
chenyu
f5256e0020 Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]

updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations

* not you yet
2025-04-17 08:00:56 -04:00
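"updated all `for opt in`" means call sites collapsed a loop into one method. A runnable toy stand-in for the pattern (not the real Kernel/Opt classes):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Opt:  # toy stand-in mirroring tinygrad's Opt(op, axis, arg)
  op: str; axis: int; arg: int

@dataclass
class Kernel:  # toy stand-in: only the opt-application surface
  applied_opts: list = field(default_factory=list)
  def apply_opt(self, opt: Opt): self.applied_opts.append(opt)
  def apply_opts(self, opts: list):  # the new entry point: one call instead of `for opt in`
    for opt in opts: self.apply_opt(opt)

k = Kernel()
k.apply_opts([Opt("UPCAST", 0, 4), Opt("LOCAL", 1, 16)])
print(k.applied_opts)
```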
chenyu
8c6299bced move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]

also folded all long lines

* make a copy and rename self -> k

* fix test
2025-04-10 23:40:16 -04:00
George Hotz
78caf55154 Revert "FP8 support on NVIDIA (#8631)"
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
pkotzbach
2c8e4ea865 FP8 support on NVIDIA (#8631)
* squashed fp8 commits

* tensorcore start

* minor changes

* pre-commit

* pylint

* Delete fp8mul.cu

* clean

* small bugfix

* fix test_dtype

* fix test_dtype_alu

* add EMULATE_CUDA_SM89

* fix ci

* fix test_linearizer

* fix test_linearizer

* fix swizzle

* add debug to simple_matmul

* fixed swizzle

* python emulator

* refactor python emulator

* setup fix

* numpy setup

* ml_dtypes only in emulate_cuda_sm89

* fix pylint

* fix tests

* fix mypy

* fix mypy

* fix ruff

* done python emulator

* add acc type

* tests

* mypy

* clean code

* add cuda tensor core tests to CI

* minor fix

* clean test_dtype.py

* clean cstyle.py

* clean test_ops.py

* fix test

* fix test

* whitespaces

* pylint

* pylint

* amd?

* amd?

* amd

* reduce lines

* mockgpu remove

* fix

* ruff

* ruff

* fix mypy

* ruff

* test only for cuda

* fixed formatting

* small fixes

* small fix

* least_upper_dtype if fp8s not supported

* log and reciprocal are supported for fp8s

* ops python fixes

* dtypes.fp8s use

* e4m3 + e5m2 result dtype test

* truncate linter fix

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
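A minimal demonstration of the two fp8 formats this PR wires up, using the ml_dtypes package the commit adds for the python emulator (illustrative, not the PR's code):

```python
import numpy as np
import ml_dtypes  # numpy dtype definitions for fp8, used under EMULATE_CUDA_SM89

vals = np.array([0.1, 1.5, 448.0], dtype=np.float32)
# e4m3: more mantissa bits, max normal ~448; e5m2: more exponent range, max normal ~57344
print(vals.astype(ml_dtypes.float8_e4m3fn).astype(np.float32))
print(vals.astype(ml_dtypes.float8_e5m2).astype(np.float32))
```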
Ignacio Sica
58785181a8 AMD bf16xf32 TC (#9717)
* dont test bf16 for emulated amd tc

* skip bf16 tc test in ci

* skip bf16 for AMD in test_tensor_cores_codegen

* add simple bf16 gemm test to benchmark
2025-04-07 11:41:04 +08:00
George Hotz
cac8bcf8b5 use Ops.REDUCE (#9721)
* decrease bert python time [pr]

* order copies

* Revert "order copies"

This reverts commit 3f62c8693b.

* rewrite count

* Ops.REDUCE

* acc first in the add chain

* Fix tensor core acc

* arange patterns look good

* fix multireduce gate

* reduce rewrite rule

* bump that to 15 minutes

* multiwmma isn't fusing

* gep through wmma is gep pushing

* bump that timeout too, it's all env setup

* add failing test
2025-04-04 10:14:34 +08:00
Ignacio Sica
2d6d8b7355 add bf16 mfma support (#9695)
* add bf16 mfma support

* skip tc if emulated_amd and dtypes is bf16

* hotfix
2025-04-02 21:44:49 +08:00
George Hotz
e78e8722dc Revert "LDS noop and spec (#9669)" (#9691)
This reverts commit 870b545ace.

Co-authored-by: Ignacio Sica <mignacio.sica@gmail.com>
2025-04-02 15:31:32 +08:00
Ignacio Sica
870b545ace LDS noop and spec (#9669)
* init lds noop and lds_0 spec

* refactor lds helper test

* fix typo

* test all lds at the same time

* change comment

* comment

* start test_lds_full

* test_lds_tc

* add tc spec
2025-04-01 18:44:55 +08:00
b1tg
d9af4cfc1b AMD_LLVM: tensor cores support (#9613)
* tensor cores support

* test tensor cores codegen

* use rewrite rules

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-04-01 09:56:27 +08:00
Ignacio Sica
1444069c09 Uppercase K for dimension and lowercase k for kernel in linearizer tc helper test (#9649) 2025-03-31 19:05:36 +08:00
Ignacio Sica
baa67fd124 Uppercase N and M (standalone syntax change) (#9647) 2025-03-31 18:45:30 +08:00
chenyu
f8976dd2eb enable more webgpu tests (#9502)
OSX has larger buffer number limit, and it supports fp16 now
2025-03-18 23:03:54 -04:00
George Hotz
117b7a16ef VALIDATE_WITH_CPU [pr] (#9488)
* VALIDATE_WITH_CPU [pr]

* fix test
2025-03-18 15:15:04 +08:00
chenyu
01e8b60911 acc_dtype -> dtype (#9402)
matched numpy and torch
2025-03-10 16:05:30 -04:00
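The rename above makes reduction signatures line up with numpy and torch, roughly like this (hedged sketch; exact argument coverage assumed):

```python
import numpy as np
from tinygrad import Tensor, dtypes

# dtype= names the accumulator/result dtype, as in numpy's sum(dtype=...)
print(np.arange(4, dtype=np.float16).sum(dtype=np.float32))
print(Tensor.arange(4, dtype=dtypes.float16).sum(dtype=dtypes.float32).item())
```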
George Hotz
ece0a0f305 use empty for test instead of rand (#9332) 2025-03-03 16:19:06 +08:00
George Hotz
2cc4cb74f0 reorder binops (#9328)
* reorder binops

* test improvements + fix string tests

* ugh, okay this
2025-03-03 14:58:18 +08:00
qazal
2eab8021fb remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem

* fix test_linearizer

* fix test_conv_shapetracker

* fix test_schedule + lint

* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
George Hotz
a4dab3ec3f add name uop (#9149)
* add name uop, TODO: refactor renderer to use

* renderer uses name uop

* fix tests

* render

* ptx
2025-02-18 15:26:58 +08:00
Ahmed Harmouche
59fe45f947 Solve get_grouped_dims does not split issue (#9085)
* Solve dims too large errors on webgpu

* Simplify divisor find

* Test square root divisor

* Fix lint

* Refactor into group_dims and split_dims

* Refactor

* Fix lint

* Add back max check in _group_dims

* Prefer grouping over split

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
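A toy sketch of the splitting idea in this PR: when a launch dimension exceeds the backend's per-dim max, factor it into two dims, preferring a near-square divisor as the "Test square root divisor" bullet suggests (the real group_dims/split_dims logic differs):

```python
def split_dim(n: int, limit: int) -> tuple[int, int]:
  # search downward from sqrt(n) so both factors stay small
  for d in range(int(n ** 0.5), 0, -1):
    if n % d == 0 and n // d <= limit:
      return d, n // d
  raise ValueError(f"cannot split {n} under limit {limit}")

print(split_dim(1_000_000, 65535))  # (1000, 1000): fits a webgpu-style per-dim cap
```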
chenyu
f53b819648 UOps. -> Ops. [pr] (#9044)
updated the comments and doc except extra
2025-02-12 12:53:23 -05:00
Ignacio Sica
aaed315fee add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
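A numpy sketch of the primitive the AMX backend above targets: each fma instruction accumulates an outer product of an x register and a y register into the z tile (16 f32 lanes per register, per the corsix docs the commit references; register setup and the LLVM asm are omitted):

```python
import numpy as np

x = np.arange(16, dtype=np.float32)       # contents of an AMX x register
y = np.ones(16, dtype=np.float32)         # contents of an AMX y register
z = np.zeros((16, 16), dtype=np.float32)  # the z tile used as accumulator
z += np.outer(y, x)                       # one fma32: z[j][i] += y[j] * x[i]
print(z[0])
```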