Commit Graph

39 Commits

Author SHA1 Message Date
George Hotz
67a1c92fc0 remove del spam from CI (#10699)
* remove del spam from CI

* more

* preconstruct default buffer spec

* ignore those errors

* check exception

* more exception check

* skip stuff
2025-06-08 10:14:30 -07:00
uuuvn
7bc4864bc4 Make dev a property of Allocator (#10286)
* Make `dev` a property of `Allocator`

(this is a prereq refactor for #10285)

At least `BufferXfer.copy` accesses it assuming it's always present,
currently most devices just add this property on their own repeating
the same code over and over again.

This is also a bit footguny, see `RemoteAllocator` that named this
property `device` instead of `dev`, i could obviously just change that
in one place but doing it globally seems like a better solution (and it
reduces code duplication too).

`MallocAllocator` is a bit special, but passing `None` works just fine.

* typing

* ignore type instead of cast
2025-05-13 17:01:01 -07:00
hooved
5d85765327 types for WebGPU runtime (#9791)
* add type annotations to ops_webgpu

* rerun CI

* add types to some _run params
2025-04-08 22:52:17 +03:00
Ahmed Harmouche
7ce7fe0574 Refactor webgpu_dawn lib finding (#9547)
* Refactor webgpu_dawn lib finding

* Fix ruff
2025-03-23 08:23:29 -04:00
Ahmed Harmouche
0f94b98646 Force WebGPU backend type [pr] (#9164)
* Force webgpu backend type

* Mypy fix

* Rename to WEBGPU_BACKEND

* Add it to env_vars docs

* Remove link
2025-02-19 17:19:39 +08:00
Ahmed Harmouche
84dc331dd1 Refactor async (#9126) 2025-02-16 17:47:15 -05:00
Ahmed Harmouche
2dc8f1867c Synchronize webgpu (#9093) 2025-02-15 00:52:10 +03:00
Ahmed Harmouche
e83905696e Show install instructions when dawn library is missing (#9059)
* Show install instructions when dawn library is missing

* Handle missing dawn in ops_webgpu

* Simplify

* Solve f-string backlash error
2025-02-14 00:30:20 +03:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendetal support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
chenyu
f05fd118a2 few minor code cleanups [pr] (#8267) 2024-12-15 23:44:51 -05:00
Ahmed Harmouche
2f2b1e792c wgsl and ops_webgpu simplifications [pr] (#8182)
Simplify wgsl and ops_webgpu
2024-12-12 14:21:58 +01:00
Ahmed Harmouche
1b94cc095a Bump back wgpu to latest (#8179) 2024-12-12 09:40:52 +01:00
Ahmed Harmouche
146e1caea3 Downgrade wgpu to prevent sd segfault (#7969) 2024-12-02 15:48:44 +01:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnis now that we have uint8

* Fix pyling

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
George Hotz
8cbcd1b342 Remove webgpu, back to 5k lines (#3040)
* remove webgpu

* max 5000 lines
2024-01-08 09:10:07 -08:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
Ahmed Harmouche
4b01839774 support vals on WebGPU, run more tests (#2668)
* Vals on webgpu, run more tests

* Skip slow tests, run symbolic ops tests

* Balance out tests
2023-12-07 16:45:21 -08:00
George Hotz
664475f247 vals is an argument (#2599)
* vals is an argument

* don't even know how that's legal python
2023-12-03 21:50:43 -08:00
George Hotz
171543fc8d cleanups to save lines and files (#2577)
* runtime/graph -> features/graph

* put all the cstyle renderers in cstyle

* same line for those

* how did that pass mypy
2023-12-02 16:29:56 -08:00
George Hotz
d6b404ac11 No dtype alloc (#2570)
* fix all allocs

* improve docs

* ugh fix fake alloc
2023-12-02 13:29:40 -08:00
George Hotz
5068e99d18 refactor to remove extra kernel params (#2563)
* refactor to have compiled kernel

* bugfixes

* docs/beautiful.py

* revert that

* fix tests
2023-12-02 00:32:25 -08:00
George Hotz
2c363b5f0b new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
George Hotz
6707f2588e use copyin (#2500)
* it's always copyin

* all RawBuffer are RawBufferCopyIn

* cleanups

* this fixes it

* requirements='C'

* more correct
2023-11-29 09:34:00 -08:00
George Hotz
756b01f46f why were these ever called buffer (#2483) 2023-11-27 21:02:07 -08:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
chenyu
163b2bc26a wgpu.utils._device -> wgpu.utils.device (#2330)
* wgpu.utils._device -> wgpu.utils.device

* can i do this?

* no need to specify metal
2023-11-16 12:52:13 -05:00
valar
123ea051e6 refactor/ci: delete many # type: ignore (#2281)
* refactor/ci: delete many `# type: ignore`

* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag

refs #2240

* ci: move `--warn-unused-ignores` flag to mypy config

refs #2240
2023-11-12 11:04:20 -08:00
Ahmed Harmouche
265304e7fd Stable diffusion WebGPU port (#1370)
* WIP: Stable diffusion WebGPU port

* Load whole model: split safetensor to avoid Chrome allocation limit

* Gitignore .DS_Store, remove debug print

* Clip tokenizer in JS

* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS

* e2e stable diffusion flow

* Create initial random latent tensor in JS

* SD working e2e

* Log if some weights were not loaded properly

* Remove latent_tensor.npy used for debugging

* Cleanup, remove useless logs

* Improve UI

* Add progress bar

* Remove .npy files used for debugging

* Add clip tokenizer as external dependency

* Remove alphas_cumprod.js and load it from safetensors

* Refactor

* Simplify a lot

* Dedup base when limiting elementwise merge (webgpu)

* Add return type to safe_load_metadata

* Do not allow run when webgpu is not supported

* Add progress bar, refactor, fix special names

* Add option to chose from local vs huggingface weights

* lowercase tinygrad :)

* fp16 model dl, decompression client side

* Cache f16 model in browser, better progress

* Cache miss recovery

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-03 18:29:16 -07:00
George Hotz
f17bc16f46 simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz
03cf0afa4f move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
George Hotz
453e437598 move stuff in the linearizer (#1726)
* move stuff in linearizer

* move stuff in linearizer

* minor

* fix opts import
2023-08-31 14:42:09 -07:00
nimlgen
bd111411bf init allocator for compiled backends (#1467)
* init allocator for compiled backends

* Update ops_webgpu.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-17 10:33:32 -07:00
Diogo
d17ecccd78 Torch/LLVM/arm F64 support (#1551) 2023-08-15 21:21:08 -04:00
George Hotz
f8109b830c promote assembly to the main codebase (#1544)
* promote assembly to the main codebase

* not namedtuple
2023-08-14 22:47:45 -07:00
Diogo
e2af95c2f8 moved global_max and local_max to LinearizerOptions also added assert for max bufs (#1446) 2023-08-05 18:23:18 -07:00
George Hotz
84c430355e fix backends for new style (#1443)
* fix backends for new style

* fix method cache

* fix fakeless

* llvm blacklist

* fix kernel optimizer
2023-08-05 11:07:04 -07:00
George Hotz
ab3d281a6e Refactor MemOps (#1256)
* metal tests pass locally

* define global

* refactor DEFINE_GLOBAL

* move assembly out. it isn't tested

* fix llvm
2023-07-17 16:36:33 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* FIx efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and paralel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_trasnpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00