723 Commits

Author SHA1 Message Date
b1tg
1d71436e6a use libllvm19 in ci (#9494)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-19 11:53:32 +08:00
Ignacio Sica
5c56cac0a0 MI300 mfma support (#9417)
* add f16/f32 mfma support for MI300

- add 16x16 mfma shape support for f16 with f32 acc
- add ops_python mfma emulation
- add arch to AMDRenderer

* minor cleanup

* minor cleanup

* add mfma emulation task to ci

* add back todo

* hotfix: comment

* add tc=3 job to ci
2025-03-18 14:33:30 -03:00
George Hotz
cb7a7f69c7 quantization preprocessor from DSP, should be universal (#9437)
* quantization preprocessor from DSP, should be universal

* touchups

* fix tests
2025-03-15 07:49:37 +08:00
qazal
4df2b6347d hotfix: bump tinybox red training CI timeout to 30 minutes (#9426) 2025-03-13 09:31:44 +01:00
George Hotz
931436204c hotfix: 12000 lines, for AMD stuff 2025-03-13 10:48:14 +08:00
Priyank Patel
4714c4f9ad torch backend multigpu - add devices and tests (#9414)
* add multi-device support and tests

* simplify
2025-03-12 11:33:11 +08:00
uuuvn
e85001b6ee SQTT profiling (#9278)
* sqtt

* docs

* multi-device

* ProfileSQTTEvent

* exec update

* 256mb default

* don't let people hang their gpus

* bitfields from autogen

* asic info from mesa

* more bitfields from autogen

* SQTT_ITRACE_SE_MASK

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-11 13:19:56 +08:00
Priyank Patel
796c3bbb23 torch: support in-place operations on views (#9371)
* add torch inplace tests

* first set of tests passing

* wrap all inplace funcs, add more tests

* fixes and wrap more functions

* fix all uint8 tests to avoid slow tests

* fix the one test

* another test, another fix

* and one more, works for ddp now

* something on contiguous, cleanup

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-03-10 23:29:00 +08:00
hooved
136cf7b8b1 hotfix: load >2 GiB from disk on macOS (#9361)
* enable loading >2 GiB buffer from disk on macOS

* handle None case raised by mypy

* add test

* revert fix to repro bug in CI

* tell CI to run a unit test for macOS

* reapply fix
2025-03-07 14:51:58 +08:00
uuuvn
c6d76770e4 Increase timeout on macos tests (#9362)
Process replay timeouts: https://github.com/tinygrad/tinygrad/actions/runs/13682213444/job/38257133289?pr=9360
2025-03-05 13:04:16 -05:00
nimlgen
cd9d74f7ea use am in training benchmarks (#9357)
* am in training benchmarks

* fix

* not needed anymore
2025-03-05 19:13:47 +03:00
George Hotz
7576a1da23 hotfix: line count to 11500, lines for SQTT and AMDLLVM 2025-03-05 09:21:18 +08:00
chenyu
e301f21f63 CI ubuntu-20.04 -> ubuntu-22.04 (#9345)
20.04 is removed now
2025-03-04 11:39:12 -05:00
chenyu
019417743c ruff torch backend (#9341) 2025-03-03 15:15:23 -05:00
chenyu
40619a4bbc separate workflow for TINY_BACKEND=1 mnist (#9339)
* separate workflow for TINY_BACKEND=1 mnist

* rebalance
2025-03-03 13:05:24 -05:00
Eitan Turok
d657d5f754 [Bounty] Vectorize Transcendental (#9058)
* init

* cast everything right

* more casting

* install pillow in test

* quick tests

* simplify

* quick tests

* delete test

* tests

* fix import error

* add vec to ldexp3k

* vec for bitcast

* some helper tests

* high level tests

* clean tests

* change tolerance so cuda passes

* ruff passes

* remove tests for transcendental helpers

* ruff passes

* make exponent in power vectorized

* fix pow test

* add newline

* add vec dtype to ilogb2k

* comment + clean up

* ruff

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-28 15:47:25 +08:00
George Hotz
387ea41e99 increase speed of torch mnist: use gradient api (#9282) 2025-02-27 11:57:41 +08:00
Priyank Patel
a0764f0dc0 (bounty) Make mnist training run with torch backend (#9233)
* yml changes

* torch backend remove meta decomps and add test

* torch backend bump timeout for tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-27 11:32:25 +08:00
George Hotz
67ba073c55 hotfix: test accuracy in beautiful_mnist_torch 2025-02-27 11:18:59 +08:00
George Hotz
2158dc4849 full fix for as_strided in torch backend (#9257)
* fixes from chatgpt for torch backend

* shrink support

* add stride support

* comment cleanup

* a few more

* work

* import the stream hack

* llvm multi auto
2025-02-26 22:34:05 +08:00
George Hotz
7780393460 rig up torch's testing framework [pr] (#9254)
* rig up torch's testing framework [pr]

* support more movement ops

* dec on expand

* fix tests

* work

* fix tests

* a few more

* decomps + opt hook

* installed pytest
2025-02-26 18:46:22 +08:00
George Hotz
b603af373e run some tests from torch [pr] (#9252)
* run some tests from torch [pr]

* yml

* wrap_out

* clean up for the new people

* a lil more
2025-02-26 15:42:22 +08:00
chenyu
731d14e718 hotfix bump testmetal2 timeout-minutes to 20 (#9235)
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306 bring cast before view back (#9230)
* bring cast before view back

* tune it to only trigger on expands

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
geohotstan
f0b24d230c add test_onnx_ops.py (#8569)
* boom

* fix webgpu

* use exact variable names in test so that AI can read easier

* add tag for specific test name like test a specific dtype

* fix ruff

* astype everything

* dtype in array creation

* just arange

* is 67% considered fixed?

* move test up

* small cleanups

* share function

* add qgemm as well

* add qgemm too

* make sure qgemm comes out as int

* take out qgemm for now

* fixed test

* add correct qgemm

* addressing feedback here too, early naive fix for now

* simplify bias and c to be minimalistic enough to test correctness

* refactored qlinearops

* maybe these asserts aren't the best..

* fix test

* updated tests to cover new ops

* try to add to CI

* move test_onnx_ops into testextra/

* more attention tests

* qlinear_add atol=1

* attention still not fullllllly correct

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
George Hotz
fd731e740a hotfix: add note on backend2.py 2025-02-24 11:23:03 +08:00
chenyu
e0adb1fc76 really run test_ops with TINY_BACKEND in ci (#9206)
was failing with `line 1: pytest: command not found`
2025-02-22 15:51:24 -05:00
George Hotz
97bc723538 torch backend works for ResNet-18 (#9200)
* torch backend progress, a few more functions

* resnet works

* pillow

* tv
2025-02-22 22:16:23 +08:00
George Hotz
f92820d30d torch backend tests (#9198)
* torch backend tests

* pythonpath

* install ninja
2025-02-22 16:01:49 +08:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
chenyu
3e22747799 run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]

* testing_unit + windows
2025-02-20 14:40:41 -05:00
qazal
574a905291 Fix running VIZ=1 after package installation + test (#9183)
* test running viz from pip install

* add pkg

* do 10 connection attempts

* include assets in package_data

* quiet curl

* better print
2025-02-20 15:02:00 +01:00
Ahmed Harmouche
0f94b98646 Force WebGPU backend type [pr] (#9164)
* Force webgpu backend type

* Mypy fix

* Rename to WEBGPU_BACKEND

* Add it to env_vars docs

* Remove link
2025-02-19 17:19:39 +08:00
George Hotz
af9d8d39d2 dsp matchers + bump line count to 11300 (#9130) 2025-02-17 17:31:54 +08:00
Ahmed Harmouche
59fe45f947 Solve get_grouped_dims does not split issue (#9085)
* Solve dims too large errors on webgpu

* Simplify divisor find

* Test square root divisor

* Fix lint

* Refactor into group_dims and split_dims

* Refactor

* Fix lint

* Add back max check in _group_dims

* Prefer grouping over split

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
George Hotz
7e09057afa fixup clang devectorize (#9099)
* fixup clang devectorize

* __builtin_convertvector is some casts

* dsp fixups
2025-02-15 09:29:47 +08:00
JaSpa99
d2ff55e9c6 OSX GPUOcelot (#8209)
* add patches

* add osx test in ci

* macos specific uvm, gpfifo mask

* only do that for now

* Revert "add patches"

This reverts commit 80d3112a57.

* use fork for now

* workflow only one worker

* merge osxtests with tests

* Revert "merge osxtests with tests"

This reverts commit 3461c8f46c.

* macos pagesize 16384

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:24:29 +08:00
rmtew
b3eab03055 Three things to get Windows CI working correctly: (#9047)
- Ensure that the set backend environment variable is persisted to the next step via $GITHUB_ENV
- It doesn't actually persist for Windows unless shell is explicitly set to bash.
- Add the assertion to ensure the selected backend is actually used.
2025-02-12 14:41:00 -05:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
Ignacio Sica
aaed315fee add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
George Hotz
45aae8a6bc hotfix: add External Benchmark Schedule to CI 2025-02-11 22:06:17 +08:00
chenyu
6c39aa4a6b adjust cuda ci test targets (#9014) 2025-02-10 15:29:59 -05:00
chenyu
f9898f7554 update gpuocelot commit (#9011) 2025-02-10 12:18:44 -05:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove test_const_arg test, variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
nimlgen
52a69dd5e9 Revert "use am in training benchmarks (#8965)" (#8981)
This reverts commit 107e616857.
2025-02-09 15:43:45 +03:00
George Hotz
208097d488 try reducing testing deps [pr] (#8976)
* reduce testing deps

* break out test models

* add PR to models, add models to metal

* okay, not that

* mac cleanup

* mac typo

* other typo
2025-02-09 15:22:32 +08:00
nimlgen
107e616857 use am in training benchmarks (#8965)
* am in training benchmarks

* fix

* not needed anymore
2025-02-08 20:20:47 +03:00
qazal
e7182bbb2c fix "fatal bad object" log in process replay [pr] (#8966) 2025-02-08 11:57:38 +01:00