518 Commits

Author SHA1 Message Date
George Hotz
2158dc4849 full fix for as_strided in torch backend (#9257)
* fixes from chatgpt for torch backend

* shrink support

* add stride support

* comment cleanup

* a few more

* work

* import the stream hack

* llvm multi auto
2025-02-26 22:34:05 +08:00
George Hotz
7780393460 rig up torch's testing framework [pr] (#9254)
* rig up torch's testing framework [pr]

* support more movement ops

* dec on expand

* fix tests

* work

* fix tests

* a few more

* decomps + opt hook

* installed pytest
2025-02-26 18:46:22 +08:00
George Hotz
b603af373e run some tests from torch [pr] (#9252)
* run some tests from torch [pr]

* yml

* wrap_out

* clean up for the new people

* a lil more
2025-02-26 15:42:22 +08:00
chenyu
731d14e718 hotfix bump testmetal2 timeout-minutes to 20 (#9235)
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306 bring cast before view back (#9230)
* bring cast before view back

* tune it to only trigger on expands

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
geohotstan
f0b24d230c add test_onnx_ops.py (#8569)
* boom

* fix webgpu

* use exact variable names in test so that AI can read easier

* add tag for specific test name like test a specific dtype

* fix ruff

* astype everything

* dtype in array creation

* just arange

* is 67% considered fixed?

* move test up

* small cleanups

* share function

* add qgemm as well

* add qgemm too

* make sure qgemm comes out as int

* take out qgemm for now

* fixed test

* add correct qgemm

* addressing feedback here too, early naive fix for now

* simplify bias and c to be minimalistic enough to test correctness

* refactored qlinearops

* maybe these asserts aren't the best..

* fix test

* updated tests to cover new ops

* try to add to CI

* move test_onnx_ops into testextra/

* more attention tests

* qlinear_add atol=1

* attention still not fullllllly correct

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
George Hotz
fd731e740a hotfix: add note on backend2.py
2025-02-24 11:23:03 +08:00
chenyu
e0adb1fc76 really run test_ops with TINY_BACKEND in ci (#9206)
was failing with `line 1: pytest: command not found`
2025-02-22 15:51:24 -05:00
George Hotz
97bc723538 torch backend works for ResNet-18 (#9200)
* torch backend progress, a few more functions

* resnet works

* pillow

* tv
2025-02-22 22:16:23 +08:00
George Hotz
f92820d30d torch backend tests (#9198)
* torch backend tests

* pythonpath

* install ninja
2025-02-22 16:01:49 +08:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
chenyu
3e22747799 run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]

* testing_unit + windows
2025-02-20 14:40:41 -05:00
qazal
574a905291 Fix running VIZ=1 after package installation + test (#9183)
* test running viz from pip install

* add pkg

* do 10 connection attempts

* include assets in package_data

* quiet curl

* better print
2025-02-20 15:02:00 +01:00
Ahmed Harmouche
0f94b98646 Force WebGPU backend type [pr] (#9164)
* Force webgpu backend type

* Mypy fix

* Rename to WEBGPU_BACKEND

* Add it to env_vars docs

* Remove link
2025-02-19 17:19:39 +08:00
George Hotz
af9d8d39d2 dsp matchers + bump line count to 11300 (#9130)
2025-02-17 17:31:54 +08:00
Ahmed Harmouche
59fe45f947 Solve get_grouped_dims does not split issue (#9085)
* Solve dims too large errors on webgpu

* Simplify divisor find

* Test square root divisor

* Fix lint

* Refactor into group_dims and split_dims

* Refactor

* Fix lint

* Add back max check in _group_dims

* Prefer grouping over split

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
George Hotz
7e09057afa fixup clang devectorize (#9099)
* fixup clang devectorize

* __builtin_convertvector is some casts

* dsp fixups
2025-02-15 09:29:47 +08:00
JaSpa99
d2ff55e9c6 OSX GPUOcelot (#8209)
* add patches

* add osx test in ci

* macos specific uvm, gpfifo mask

* only do that for now

* Revert "add patches"

This reverts commit 80d3112a57.

* use fork for now

* workflow only one worker

* merge osxtests with tests

* Revert "merge osxtests with tests"

This reverts commit 3461c8f46c.

* macos pagesize 16384

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:24:29 +08:00
rmtew
b3eab03055 Three things to get Windows CI working correctly: (#9047)
- Ensure that the set backend environment variable is persisted to the next step via $GITHUB_ENV
- It doesn't actually persist for Windows unless shell is explicitly set to bash.
- Add the assertion to ensure the selected backend is actually used.
2025-02-12 14:41:00 -05:00
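The first and third points of commit b3eab03055 above (persisting a variable through `$GITHUB_ENV`, then asserting that the selected backend is the one in use) can be sketched roughly as follows. This is a minimal illustration, not the workflow from the commit: GitHub Actions does read `NAME=value` lines appended to the file named by `GITHUB_ENV`, but the variable name `BACKEND` and the helpers `persist_env`, `read_env_file`, and `check_backend` are hypothetical names chosen here.

```python
import os
import tempfile


def persist_env(name: str, value: str, env_file: str) -> None:
    """Append NAME=value to env_file (in CI this is the path in $GITHUB_ENV)."""
    with open(env_file, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


def read_env_file(env_file: str) -> dict:
    """Parse simple single-line NAME=value entries, roughly as a later step sees them."""
    result = {}
    with open(env_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if "=" in line:
                # Split on the first '=' only, so values may contain '='.
                key, _, val = line.partition("=")
                result[key] = val
    return result


def check_backend(expected: str, actual: str) -> None:
    """Fail loudly if the backend in use is not the one that was selected."""
    assert actual == expected, f"expected backend {expected!r}, got {actual!r}"


if __name__ == "__main__":
    # Outside CI there is no GITHUB_ENV, so fall back to a temporary file.
    env_file = os.environ.get("GITHUB_ENV")
    if not env_file:
        fd, env_file = tempfile.mkstemp()
        os.close(fd)
    persist_env("BACKEND", "LLVM", env_file)  # hypothetical variable name
    check_backend("LLVM", read_env_file(env_file)["BACKEND"])
```

The second point in the commit (the value not persisting on Windows unless the step's shell is explicitly bash) is a workflow setting and is not modelled by this sketch.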
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
George Hotz
45aae8a6bc hotfix: add External Benchmark Schedule to CI
2025-02-11 22:06:17 +08:00
chenyu
6c39aa4a6b adjust cuda ci test targets (#9014)
2025-02-10 15:29:59 -05:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove test_const_arg test, variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
George Hotz
208097d488 try reducing testing deps [pr] (#8976)
* reduce testing deps

* break out test models

* add PR to models, add models to metal

* okay, not that

* mac cleanup

* mac typo

* other typo
2025-02-09 15:22:32 +08:00
George Hotz
5bdd6a1cc4 increase CI speed with more runners [pr] (#8961)
* increase CI speed with more runners [pr]

* splits + cleanups [pr]

* more runners

* need that dep

* split that too

* can't be minimal

* move test readme

* bugfix + naming

* one more split

* bump to 22.04
2025-02-08 09:04:36 +08:00
George Hotz
4de084a835 cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952)
* cleanup ci [pr]

* testing_minimal

* add hypothesis to minimal

* fail tiktoken import okay

* add LLVM speed test

* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
George Hotz
9ed2d0dfa2 refactor into subactions (#8946)
* refactor into subactions

* this work?

* add shell

* move install opencl

* valid?

* support mac os x

* refactor other osx

* fix linux/osx

* fixes

* cleanups

* used everywhere

* no quotes

* quotes on true

* bugfixes

* this run?

* hardcode

* that

* process replay action

* fix checkout

* restore to branch

* fix caching

* fix osx python cache

* does replace function exist

* Revert "does replace function exist"

This reverts commit 622177c5a0.

* Revert "fix osx python cache"

This reverts commit e70d55cd93.

* user on osx to fix untar issue

* that
2025-02-07 18:06:44 +08:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
George Hotz
dbda72f91d hotfix: raise line limit to 11200 for new webgpu backend
2025-02-07 14:29:20 +08:00
George Hotz
b1e1319972 ci speed on the enterprise plan [pr] (#8942)
2025-02-07 11:18:12 +08:00
uuuvn
a51c688f39 Cleanup llvm cleanup (and some clang things too) (#8871)
* Cleanup llvm cleanup (and some clang things too)

* Tests

* Tests 2

* forgot mockgpu

* more print some sources
2025-02-05 07:49:05 +08:00
George Hotz
56fa5c1191 dsp simulator (#8869)
* dsp simulator

* progress

* fix

* close on test tiny

* working

* less waste

* line savings

* Device DSP compiler

* mock DSP at the bottom

* DSP tests

* docker caching

* test update

* need load

* skip that test for CI DSP

* last touch

* ugh
2025-02-04 09:45:04 +08:00
uuuvn
6dadb60c93 LLVM JIT (+autogen llvm instead of llvmlite) (#8486)
* LLVM JIT

* Autogen LLVM

* Update autogen

* Move things around

* even more non-determinism

* windows

* more autogen weirdness

* more windows stuff

* blind windows development try 2

* more blind windows development

* even more blind windows development

* maybe i should just set up a windows vm...

* why can't everyone just use sysv abi?

* cleanup debugging stuff

* unused import

* icache flushing isn't required on x86

* merge jit_nt and jit_unix

* more

* Temporary hack to not segfault

* better error

* bad conflict resolution

* Attempt to simplify support/llvm.py

* More refactoring

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-02 19:52:42 +08:00
chenyu
7f606fbde4 remove DEBUG=5 in windows ci test [pr] (#8803)
DEBUG=5 prints a lot of info that's slow, and is not visible if test passed on CI.
also skip two tests that took 3 minutes in python backend
2025-01-29 14:18:17 -05:00
FICTURE7
ec120ce6b9 Fix allocator memory alignment (#8800)
* Fix allocator memory alignment

* Run `test_ops.py` using LLVM and CLANG on Windows
2025-01-29 21:03:17 +03:00
b1tg
da464d039f fix windows ci cache (#8787)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-01-28 13:22:15 +02:00
b1tg
5d62aa28dc Support CLANG backend on Windows (#8768)
* Support CLANG on Windows

* Put both backends in a windows ci

* remove coff loader

* use memmove

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:19:34 +09:00
b1tg
efc7971090 add windows test to ci (#8761)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-01-27 14:53:21 +09:00
George Hotz
1b4618e257 gradient cleanup (#8750)
* switch backward to use gradient [pr]

* set device correctly, dedup

* why does that fail?

* add noop cast

* simple backward

* fix beautiful_mnist

* touchups

* set in compute_gradient

* uop_count

* uop_count was wrong

* collections

* no note

* skip that test

* update sched kernel counts

* train mnist is 65

* fix metadata and gc

* fixes

* materialize_grads

* no pathlib stuff

* add contiguous_backward, fix bugs

* add some realize

* fix multi

* remove unused backward passes [pr]

* lower line count
2025-01-26 09:30:55 +09:00
George Hotz
e82ba1454b MultiLazyBuffer is UOp [pr] (#8662)
* MultiLazyBuffer is UOp [pr]

* this is new mlb

* this is the idea

* progress

* multitensor works

* more movement ops

* this

* MultiLazyBuffer is UOp

* cleanups

* multi axis

* fix more tests

* work

* not that

* add multi grad and move shard to ops

* mops not views

* no double contig

* sweet, all mt tests passing

* port old logic

* remove lbs

* fix realized

* whitespace

* assign tweak

* test_assign_kv_cache_multi passes

* fix is_realized

* fix JIT for multi

* just a few more lines i'll pay them back soon i swear please bro just a few more

* no split reduceop for multi
2025-01-24 13:28:55 +09:00
George Hotz
46a8c5e1e5 delete forced_realize (#8615)
* delete forced_realize

* put that back

* expectedFailures

* cleaner create_subbuffer

* more comments

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
nimlgen
9d3c40601f am: fast memory manager (#8654)
* start

* progress

* fixes

* smth

* mini fixes

* fix2

* ugh, need this for now

* faster

* cleanups

* tiny linters

* make mypy happier

* test & free pts

* ops

* linter

* cleanup vm

* fix

* remove map_from

* tiny fixes

* add test to ci
2025-01-20 16:58:22 +03:00
ignaciosica
d2234e308a tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
ignaciosica
d5a646d492 CUDA Turing TC (#8597)
* init turing tc

* reorder tc

* hotfix: remove some spaces

* revert var name to x

* consistent order of factors

* revert order of terms to match old stuff

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
qazal
2f71a00236 remove PYTHONPATH=. from mypy ci [pr] (#8578)
2025-01-12 09:52:03 -08:00
qazal
98c9e23560 remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568)
* remove global PYTHONPATH setting in CI [pr]

* only run mypy in tinygrad/

* still needed for benchmarks
2025-01-11 12:47:50 -05:00
qazal
60503c8621 use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564)
2025-01-11 06:03:48 -05:00