Commit Graph

1088 Commits

pkotzbach
2c8e4ea865 FP8 support on NVIDIA (#8631)
* squashed fp8 commits

* tensorcore start

* minor changes

* pre-commit

* pylint

* Delete fp8mul.cu

* clean

* small bugfix

* fix test_dtype

* fix test_dtype_alu

* add EMULATE_CUDA_SM89

* fix ci

* fix test_linearizer

* fix test_linearizer

* fix swizzle

* add debug to simple_matmul

* fixed swizzle

* python emulator

* refactor python emulator

* setup fix

* numpy setup

* ml_dtypes only in emulate_cuda_sm89

* fix pylint

* fix tests

* fix mypy

* fix mypy

* fix ruff

* done python emulator

* add acc type

* tests

* mypy

* clean code

* add cuda tensor core tests to CI

* minor fix

* clean test_dtype.py

* clean cstyle.py

* clean test_ops.py

* fix test

* fix test

* whitespaces

* pylint

* pylint

* amd?

* amd?

* amd

* reduce lines

* mockgpu remove

* fix

* ruff

* ruff

* fix mypy

* ruff

* test only for cuda

* fixed formatting

* small fixes

* small fix

* least_upper_dtype if fp8s not supported

* log and reciprocal are supported for fp8s

* ops python fixes

* dtypes.fp8s use

* e4m3 + e5m2 result dtype test

* truncate linter fix

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
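The commit above adds the two OCP fp8 formats (e4m3 and e5m2) plus a Python emulator gated behind EMULATE_CUDA_SM89, with ml_dtypes supplying host-side fp8 numpy dtypes. A minimal sketch of that emulation idea, assuming only numpy and ml_dtypes (this is illustrative, not tinygrad's actual ops_python code):

```python
import numpy as np
import ml_dtypes  # provides numpy dtypes for the OCP fp8 formats

e4m3 = np.dtype(ml_dtypes.float8_e4m3fn)  # 4 exponent / 3 mantissa bits: more precision, max 448
e5m2 = np.dtype(ml_dtypes.float8_e5m2)    # 5 exponent / 2 mantissa bits: more range, max 57344

x = np.array([0.1, 1.5, 448.0], dtype=np.float32)
print(x.astype(e4m3), x.astype(e5m2))  # note the coarser rounding of 0.1 in e5m2

# emulated ALU op: compute at float32, then round back through the fp8 dtype,
# which is what a host-side emulator has to do since numpy has no native fp8 math
def fp8_mul(a: np.ndarray, b: np.ndarray, out_dtype=e4m3) -> np.ndarray:
  return (a.astype(np.float32) * b.astype(np.float32)).astype(out_dtype)
```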
Sieds Lykles
07d1aefaf4 fast idiv (#9755)
* fast idiv with tests and fuzzer

* Add todo comment

* Add env variable to toggle fast_idiv

* Move env check

* Add fuzz fast_idiv to ci

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-07 08:32:24 -04:00
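Fast idiv is the classic Granlund-Montgomery trick: division by a divisor that is known at compile time becomes a multiply by a precomputed "magic" constant plus a right shift, which is far cheaper than a hardware divide. A minimal sketch with a brute-force checker in the spirit of the fuzzer this commit adds (an illustration, not tinygrad's rewrite rule):

```python
import random

def magic(d: int, bits: int = 32) -> tuple[int, int]:
  # find (m, s) with x // d == (x * m) >> s for all 0 <= x < 2**bits:
  # m = ceil(2**s / d) has error e = m*d - 2**s, and the result is exact
  # whenever e * x_max < 2**s (the Granlund-Montgomery bound)
  x_max = (1 << bits) - 1
  for s in range(2 * bits + 1):
    m = ((1 << s) + d - 1) // d
    if (m * d - (1 << s)) * x_max < (1 << s):
      return m, s
  raise ValueError(f"no magic number for divisor {d}")

# fuzz it: random divisors and dividends must round-trip exactly
for _ in range(10_000):
  d = random.randint(1, 1 << 16)
  m, s = magic(d)
  x = random.randint(0, (1 << 32) - 1)
  assert (x * m) >> s == x // d, (x, d, m, s)
```

Note that m can exceed the register width, so on real hardware the multiply needs a widening form (e.g. 32x32 to 64-bit).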
Ignacio Sica
58785181a8 AMD bf16xf32 TC (#9717)
* dont test bf16 for emulated amd tc

* skip bf16 tc test in ci

* skip bf16 for AMD in test_tensor_cores_codegen

* add simple bf16 gemm test to benchmark
2025-04-07 11:41:04 +08:00
George Hotz
cac8bcf8b5 use Ops.REDUCE (#9721)
* decrease bert python time [pr]

* order copies

* Revert "order copies"

This reverts commit 3f62c8693b.

* rewrite count

* Ops.REDUCE

* acc first in the add chain

* Fix tensor core acc

* arange patterns look good

* fix multireduce gate

* reduce rewrite rule

* bump that to 15 minutes

* multiwmma isn't fusing

* gep through wmma is gep pushing

* bump that timeout too, it's all env setup

* add failing test
2025-04-04 10:14:34 +08:00
chenyu
1d25844d44 Revert "disable CI red llama 3 4 gpu beam (#9690)" (#9709)
This reverts commit 6a5eacba8b.
2025-04-03 02:34:39 -04:00
George Hotz
49dafe6d43 add gc tests [pr] (#9718)
* add gc tests [pr]

* del

* more gc tests

* add NullGraph
2025-04-03 14:08:32 +08:00
Ignacio Sica
bc91fffc5d fix gated store with index in python backend (#9703)
* add default gate in index

* assert store

* add TestRendererFailures

- move test_gated_store_with_alu to the new TestRenderFailures class for tests that fail on multiple renderers
- add test_renderer_failures.py, run on python CI

* add test for gated index in 2d

* test TestRenderFailures
2025-04-03 12:48:28 +08:00
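The bug class here: when a store is predicated by a gate, the index expression may still be evaluated for masked-off lanes, so it needs a safe default or an out-of-range address is formed before the gate can suppress the store. A minimal sketch of the semantics (a hypothetical helper, not the actual renderer code):

```python
import numpy as np

def gated_store(buf: np.ndarray, idxs, vals, gates):
  for idx, val, gate in zip(idxs, vals, gates):
    safe_idx = idx if gate else 0  # the "default gate in index": dead lanes address slot 0
    if gate:
      buf[safe_idx] = val          # the gate itself drops the store

buf = np.zeros(4, dtype=np.float32)
gated_store(buf, idxs=[0, 2, 9], vals=[1.0, 2.0, 3.0], gates=[True, True, False])
print(buf)  # [1. 0. 2. 0.] -- index 9 appears in the source but never touches memory
```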
chenyu
6a5eacba8b disable CI red llama 3 4 gpu beam (#9690)
the device hangs and CI would fail
2025-04-02 03:19:09 -04:00
George Hotz
6f812d3f2f fixes from the dsp branch + 12500 lines (#9683)
* fixes from the dsp branch

* more changes

* those are gep pushing
2025-04-02 13:07:17 +08:00
Anish Umale
a1ee4d587f Fix test_ops for tiny backend (#9302)
* fix some tests in test_ops for torch backend(171 failing)

* fix more tests (135 failures)

* fix tests (126 failing)

* handle transposed convs (109 tests failing)

* fix slice

* fix lshift & rshift and more tests (87 tests failing)

* revert accidental change

* remove unnecessary changes (82 failures)

* fix backward for avg_pool2d (78 failures)

* fix backward for avg_pool2d (78 failures)

* fix replication backpass

* fix reflection pad back pass (71 failures)

* cummax with indices, aten.mv and move out methods (67 failures)

* extract avg_pool2d and avg_pool3d to separate functions (62 failures)

* revert changes for cat_out

* rewrite avg_pool and pad without repetition

* remove duplicates from decomps

* slice rewrite and add slice_backward (59 failures)

* add dtype fixup from https://github.com/tinygrad/tinygrad/pull/9297

* fix linter error and remove Tensor.pad (48 failures)

* add select_backward and index_put (40 failures)

* fix some more tests (36 failures)

* fix more tests (12 failures)

* some cleanups and fix couple more tests (10 failures)

* cleaner way to write upsample

* some more upsample cleanups

* use lambda for upsample

* add autowrapper for upsample forward

* cumsum and max_dim without aten functions

* revert _log_softmax

* fix more tests (1 failure)

* make linter happy

* move import to appropriate func

* make linter happy

* add codes for noqa

* some more refactors

* remove comment

* remove dependency on aten function for conv backward

* some more refactors

* add returns

* revert a change from merge

* some cleanups

* remove whitespace

* remove ruff change

* revert upsample

* add masked_fill_.Tensor and scatter.src_out

* add todo

* fix test_biased_conv2d

* fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :(

* revert torch_debug

* revert torch_debug

* skip test_gather_failure for the tiny backend

* make padding registration more concise

* add nonzero

* remove scatter_add since we already have the out

* fix scatter

* remove some repetition

* make upsample backward registrations more concise

* remove select.int

* use Tensor.cumsum

* realize conv2d outputs before backward to fix test_biased_conv2d

* add a todo for realize (1 failure)

* add new_empty and new_empty_strided

* make test_pad_circular_mode forward only and remove redundant stuff

* fix linter errors

* remove expect failure

* just tb

* slice is a view_op

* contiguous only when lazydata.is_realized

* fix backward for test_pad_circular_mode

* revert torch.nn.functional.pad override

* add transpose.int and make constant_pad_nd contiguous

* slice_backwards has no kwargs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-31 21:13:09 -04:00
chenyu
60eb0c4ed7 exclude slow tests on PYTHON (#9634) 2025-03-30 22:55:05 -04:00
geohotstan
a08b07b4da Bump onnx==1.17.0 (#9618)
* bump

* remove resize tf_crop_and_resize

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
b1tg
f90001e1a6 amd llvm render (no_comgr prereq) (#9543)
* amd llvm render

* skip test_div_rounding_mode

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
quortus
bdd44d4255 Fix DSP transcendentals (#9542) 2025-03-22 11:08:18 +08:00
chenyu
ee3d313b34 Revert "update ruff to 0.11.2 (#9531)" (#9535)
This reverts commit d8d65e2747.
2025-03-21 14:52:25 -04:00
b1tg
58206fa8a9 add amd llvm compiler (#9519)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 23:13:27 +08:00
chenyu
d8d65e2747 update ruff to 0.11.2 (#9531)
0.11.2 fixed the false alert from 0.11.1. Also pinned the version in setup.py for now, to prevent future ruff upgrades from breaking CI.
2025-03-21 10:32:59 -04:00
chenyu
b9fab9b914 pin ruff to 0.11.0 in CI (#9520)
0.11.1 had a bug (https://github.com/astral-sh/ruff/issues/16874) that breaks CI
2025-03-20 13:12:50 -04:00
Ignacio Sica
5c56cac0a0 MI300 mfma support (#9417)
* add f16/f32 mfma support for MI300

- add 16x16 mfma shape support for f16 with f32 acc
- add ops_python mfma emulation
- add arch to AMDRenderer

* minor cleanup

* minor cleanup

* add mfma emulation task to ci

* add back todo

* hotfix: comment

* add tc=3 job to ci
2025-03-18 14:33:30 -03:00
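For reference, the 16x16 mfma shape added here performs a full 16x16x16 matrix multiply-accumulate per instruction, with f16 inputs feeding an f32 accumulator. A minimal numpy sketch of what the ops_python emulation has to reproduce (illustrative, not the actual emulator code):

```python
import numpy as np

def mfma_f32_16x16x16_f16(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
  # d = a @ b + c: f16 operands, products and accumulation at f32 precision
  assert a.shape == b.shape == c.shape == (16, 16)
  assert a.dtype == b.dtype == np.float16 and c.dtype == np.float32
  return a.astype(np.float32) @ b.astype(np.float32) + c

a = np.random.rand(16, 16).astype(np.float16)
b = np.random.rand(16, 16).astype(np.float16)
d = mfma_f32_16x16x16_f16(a, b, np.zeros((16, 16), dtype=np.float32))
```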
George Hotz
cb7a7f69c7 quantization preprocessor from DSP, should be universal (#9437)
* quantization preprocessor from DSP, should be universal

* touchups

* fix tests
2025-03-15 07:49:37 +08:00
qazal
4df2b6347d hotfix: bump tinybox red training CI timeout to 30 minutes (#9426) 2025-03-13 09:31:44 +01:00
George Hotz
931436204c hotfix: 12000 lines, for AMD stuff 2025-03-13 10:48:14 +08:00
Priyank Patel
4714c4f9ad torch backend multigpu - add devices and tests (#9414)
* add multi-device support and tests

* simplify
2025-03-12 11:33:11 +08:00
uuuvn
e85001b6ee SQTT profiling (#9278)
* sqtt

* docs

* multi-device

* ProfileSQTTEvent

* exec update

* 256mb default

* don't let people hang their gpus

* bitfields from autogen

* asic info from mesa

* more bitfields from autogen

* SQTT_ITRACE_SE_MASK

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-11 13:19:56 +08:00
Priyank Patel
796c3bbb23 torch: support in-place operations on views (#9371)
* add torch inplace tests

* first set of tests passing

* wrap all inplace funcs, add more tests

* fixes and wrap more functions

* fix all uint8 tests to avoid slow tests

* fix the one test

* another test, another fix

* and one more, works for ddp now

* something on contiguous, cleanup

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-03-10 23:29:00 +08:00
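The contract being fixed: in torch, an in-place op on a view must write through to the base tensor, so a backend that silently materializes views as copies breaks code like DDP. A small illustration of the semantics the backend has to honor (plain torch, not the backend internals):

```python
import torch

base = torch.zeros(4)
view = base[1:3]        # a view shares storage with its base
view.add_(1.0)          # an in-place op on the view...
assert base[1] == 1.0   # ...must be visible through the base tensor

# the same holds for views created by transpose, as_strided, etc.
t = torch.zeros(2, 2)
t.t()[0, 1] = 5.0
assert t[1, 0] == 5.0
```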
hooved
136cf7b8b1 hotfix: load >2 GiB from disk on macOS (#9361)
* enable loading >2 GiB buffer from disk on macOS

* handle None case raised by mypy

* add test

* revert fix to repro bug in CI

* tell CI to run a unit test for macOS

* reapply fix
2025-03-07 14:51:58 +08:00
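Background for the fix: a single read(2) on macOS fails for requests of 2 GiB or more, so loading a large buffer from disk has to be chunked. A minimal sketch of the workaround (illustrative; tinygrad's actual fix lives in its disk device code):

```python
import os

CHUNK = 1 << 30  # 1 GiB per syscall, safely under the macOS per-read limit

def read_exact(fd: int, size: int) -> bytes:
  out = bytearray()
  while len(out) < size:
    piece = os.read(fd, min(CHUNK, size - len(out)))
    if not piece: raise EOFError(f"short read: got {len(out)} of {size} bytes")
    out += piece
  return bytes(out)
```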
uuuvn
c6d76770e4 Increase timeout on macos tests (#9362)
Process replay timeouts: https://github.com/tinygrad/tinygrad/actions/runs/13682213444/job/38257133289?pr=9360
2025-03-05 13:04:16 -05:00
nimlgen
cd9d74f7ea use am in training benchmarks (#9357)
* am in training benchmarks

* fix

* not needed anymore
2025-03-05 19:13:47 +03:00
George Hotz
7576a1da23 hotfix: line count to 11500, lines for SQTT and AMDLLVM 2025-03-05 09:21:18 +08:00
chenyu
e301f21f63 CI ubuntu-20.04 -> ubuntu-22.04 (#9345)
the ubuntu-20.04 runner image is removed now
2025-03-04 11:39:12 -05:00
chenyu
019417743c ruff torch backend (#9341) 2025-03-03 15:15:23 -05:00
chenyu
40619a4bbc separate workflow for TINY_BACKEND=1 mnist (#9339)
* separate workflow for TINY_BACKEND=1 mnist

* rebalance
2025-03-03 13:05:24 -05:00
Eitan Turok
d657d5f754 [Bounty] Vectorize Transcendental (#9058)
* init

* cast everything right

* more casting

* install pillow in test

* quick tests

* simplify

* quick tests

* delete test

* tests

* fix import error

* add vec to ldexp3k

* vec for bitcast

* some helper tests

* high level tests

* clean tests

* change tolerance so cuda passes

* ruff passes

* remove tests for transcendental helpers

* ruff passes

* make exponent in power vectorized

* fix pow test

* add newline

* add vec dtype to ilogb2k

* comment + clean up

* ruff

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-28 15:47:25 +08:00
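The helpers named here (ldexp3k, ilogb2k) implement the standard transcendental recipe: split the input into an integer scale and a bounded remainder, run a polynomial on the remainder, then rescale. Vectorizing means every one of those steps has to accept vector dtypes. A rough numpy sketch of the pattern (illustrative coefficients, not tinygrad's rewrite rules):

```python
import numpy as np

def exp2_approx(x: np.ndarray) -> np.ndarray:
  n = np.rint(x)   # integer scale (the ilogb2k-style half of the split)
  f = x - n        # bounded remainder in [-0.5, 0.5]
  # degree-3 Taylor polynomial for 2**f; real kernels use higher-degree minimax fits
  p = 1.0 + f * (0.6931472 + f * (0.2402265 + f * 0.0555041))
  return np.ldexp(p, n.astype(np.int32))  # rescale by 2**n, lane-wise (the ldexp3k step)

x = np.linspace(-5, 5, 1024, dtype=np.float32)
assert np.max(np.abs(exp2_approx(x) / np.exp2(x) - 1)) < 2e-3  # coarse but uniform
```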
George Hotz
387ea41e99 increase speed of torch mnist: use gradient api (#9282) 2025-02-27 11:57:41 +08:00
Priyank Patel
a0764f0dc0 (bounty) Make mnist training run with torch backend (#9233)
* yml changes

* torch backend remove meta decomps and add test

* torch backend bump timeout for tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-27 11:32:25 +08:00
George Hotz
67ba073c55 hotfix: test accuracy in beautiful_mnist_torch 2025-02-27 11:18:59 +08:00
George Hotz
2158dc4849 full fix for as_strided in torch backend (#9257)
* fixes from chargpt for torch backend

* shrink support

* add stride support

* comment cleanup

* a few more

* work

* import the stream hack

* llvm multi auto
2025-02-26 22:34:05 +08:00
George Hotz
7780393460 rig up torch's testing framework [pr] (#9254)
* rig up torch's testing framework [pr]

* support more movement ops

* dec on expand

* fix tests

* work

* fix tests

* a few more

* decomps + opt hook

* installed pytest
2025-02-26 18:46:22 +08:00
George Hotz
b603af373e run some tests from torch [pr] (#9252)
* run some tests from torch [pr]

* yml

* wrap_out

* clean up for the new people

* a lil more
2025-02-26 15:42:22 +08:00
chenyu
731d14e718 hotfix bump testmetal2 timeout-minutes to 20 (#9235)
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306 bring cast before view back (#9230)
* bring cast before view back

* tune it to only trigger on expands

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
geohotstan
f0b24d230c add test_onnx_ops.py (#8569)
* boom

* fix webgpu

* use exact variable names in the test so that AI can read it more easily

* add tag for specific test name like test a specific dtype

* fix ruff

* astype everything

* dtype in array creation

* just arange

* is 67% considered fixed?

* move test up

* small cleanups

* share function

* add qgemm as well

* add qgemm too

* make sure qgemm comes out as int

* take out qgemm for now

* fixed test

* add correct qgemm

* addressing feedback here too, early naive fix for now

* simplify bias and c to be minimalistic enough to test correctness

* refactored qlinearops

* maybe these asserts aren't the best..

* fix test

* updated tests to cover new ops

* try to add to CI

* move test_onnx_ops into testextra/

* more attention tests

* qlinear_add atol=1

* attention still not fullllllly correct

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
George Hotz
fd731e740a hotfix: add note on backend2.py 2025-02-24 11:23:03 +08:00
chenyu
e0adb1fc76 really run test_ops with TINY_BACKEND in ci (#9206)
was failing with `line 1: pytest: command not found`
2025-02-22 15:51:24 -05:00
George Hotz
97bc723538 torch backend works for ResNet-18 (#9200)
* torch backend progress, a few more functions

* resnet works

* pillow

* tv
2025-02-22 22:16:23 +08:00
George Hotz
f92820d30d torch backend tests (#9198)
* torch backend tests

* pythonpath

* install ninja
2025-02-22 16:01:49 +08:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
chenyu
3e22747799 run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]

* testing_unit + windows
2025-02-20 14:40:41 -05:00
qazal
574a905291 Fix running VIZ=1 after package installation + test (#9183)
* test running viz from pip install

* add pkg

* do 10 connection attempts

* include assets in package_data

* quiet curl

* better print
2025-02-20 15:02:00 +01:00
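The packaging detail behind this fix: setuptools only installs non-Python files that are explicitly declared, so the VIZ server's bundled assets must be listed in package_data to survive a pip install. A minimal sketch of the mechanism (hypothetical names and paths, not tinygrad's actual setup.py):

```python
from setuptools import setup

setup(
  name="example-pkg",
  packages=["example_pkg", "example_pkg.viz"],
  # without this, `pip install` ships only the .py files and the viz
  # server cannot find its frontend after installation
  package_data={"example_pkg.viz": ["index.html", "assets/*"]},
)
```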
Ahmed Harmouche
0f94b98646 Force WebGPU backend type [pr] (#9164)
* Force webgpu backend type

* Mypy fix

* Rename to WEBGPU_BACKEND

* Add it to env_vars docs

* Remove link
2025-02-19 17:19:39 +08:00