903 Commits

Author SHA1 Message Date
nimlgen
0a139b1436 amd iface abstraction (#8413)
* start on amd iface

* t

* unused import

* fixes

* internal api
2024-12-27 15:53:53 +03:00
nimlgen
90f1f0c9d5 eh (#8309)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-26 13:16:34 -05:00
nimlgen
a562ee2c6e BumpAllocator rename start -> base (#8415) 2024-12-25 23:12:55 +03:00
nimlgen
9ed064710a hcq remove old profiler lines (#8414) 2024-12-25 23:12:28 +03:00
chenyu
3f46425f1e typos found by gemini [pr] (#8400)
not very effective... maybe due to tokenizer
2024-12-24 22:32:25 -05:00
nimlgen
a647f3dd2c move mockgpu to tests [pr] (#8396)
* move mockgpu to tests

* linter

* i'm so sorry

* sorry, python

* path
2024-12-24 23:48:02 +03:00
chenyu
7ea633f94f remove from __future__ import annotations from runtimes [pr] (#8373)
it's not needed if we move the Device before Program and Allocator, which need Device.

not updating hcq because it has a lot more stuff, and CLDevice requires CLDevice
2024-12-21 23:46:07 -05:00
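A minimal sketch of the ordering argument in the commit above, using hypothetical stand-in classes (these are not tinygrad's real `Device`, `Program`, or `Allocator`). When the class named in an annotation is defined first, the annotation resolves to the class itself and no `from __future__ import annotations` is needed:

```python
# Hypothetical stand-ins for illustration; not tinygrad's real classes.
class Device:
    def __init__(self, name: str):
        self.name = name

# Device already exists at this point, so the annotation below resolves
# to the class itself, with no postponed evaluation or quoted name.
class Program:
    def __init__(self, dev: Device, name: str):
        self.dev, self.name = dev, name

class Allocator:
    def __init__(self, dev: Device):
        self.dev = dev

prog = Program(Device("DEMO"), "kernel")
```

A class that names itself in its own method annotations does not fit this pattern, which may be what the note about CLDevice refers to.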
chenyu
1ce9851ba6 import and type cleanups [pr] (#8359)
Dict and DefaultDict and some imports
2024-12-20 21:52:02 -05:00
chenyu
e63c7818dc few type cleanups [pr] (#8347) 2024-12-20 01:56:01 -05:00
George Hotz
82833f1b3c a little more typing [pr] (#8346)
* a little more typing [pr]

* few more
2024-12-19 22:09:52 -08:00
George Hotz
62e5d96446 more typing work [pr] (#8345) 2024-12-19 21:46:35 -08:00
George Hotz
9c77e9f9b7 replace Tuple with tuple [pr] (#8344)
* replace Tuple with tuple [pr]

* replace List with list [pr]

* replace Dict with dict [pr]

* replace Set with set [pr]
2024-12-19 21:27:56 -08:00
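For readers unfamiliar with the change above: since Python 3.9 the built-in containers can be subscripted directly in annotations, so the `typing` aliases are no longer required. The functions below are hypothetical examples, not code from the repository:

```python
# Built-in generics (Python 3.9+); no `from typing import Tuple, List, Dict, Set`.
def describe(shape: tuple[int, ...], names: list[str]) -> dict[str, int]:
    # Pair each axis name with its size.
    return dict(zip(names, shape))

def unique_dims(shape: tuple[int, ...]) -> set[int]:
    # Collect the distinct axis sizes.
    return set(shape)
```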
George Hotz
adcdc583a2 small cleanups [pr] (#8343)
* small cleanups [pr]

* GPU suppress
2024-12-19 21:20:46 -08:00
George Hotz
3a9ca62b9e get_single_element [pr] (#8328) 2024-12-18 22:23:45 -08:00
nimlgen
777d2aec05 metal profiler + cpu_profile (#8291)
* metal + cpu_profile

* gpt example

* linter + revert gpt2 for now

* a bit of readme

* linter

* unrelated

* tests

* linter

* b
2024-12-18 00:06:56 +03:00
nimlgen
af87e4b53c viz profiler (#8287)
* only hcq

* fix get_metadata

* linter

* oops

* tiny

* linter

* time

* print pm

* hmm

* nits
2024-12-17 20:00:53 +03:00
George Hotz
cda34ccadf hotfix: time.time -> time.perf_counter 2024-12-16 11:32:49 -08:00
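A short sketch of why such a swap is usually made, with a hypothetical `timed` helper that is not part of the repository: `time.perf_counter` is a monotonic, high-resolution clock meant for measuring intervals, while `time.time` follows the wall clock and can jump if the system time is adjusted.

```python
import time

def timed(fn, *args):
    # Measure elapsed seconds with a monotonic, high-resolution clock.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

total, elapsed = timed(sum, range(1000))
```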
nimlgen
a2a4ff30dc hcq better timeout handling (#8269) 2024-12-16 13:44:55 +03:00
chenyu
f05fd118a2 few minor code cleanups [pr] (#8267) 2024-12-15 23:44:51 -05:00
chenyu
2e4c7d4cfb add "tinygrad" to be part of cache_dir [pr] (#8188)
instead of having sqlite / http download / metal compile to add "tinygrad" separately. also make it non-private since it's used in metal
2024-12-12 12:09:44 -05:00
nimlgen
bf7d1fcd2c tiny import fixes in hcq graph (#8184) 2024-12-12 16:30:06 +03:00
Ahmed Harmouche
2f2b1e792c wgsl and ops_webgpu simplifications [pr] (#8182)
Simplify wgsl and ops_webgpu
2024-12-12 14:21:58 +01:00
Ahmed Harmouche
1b94cc095a Bump back wgpu to latest (#8179) 2024-12-12 09:40:52 +01:00
chenyu
aaa3cc235d unused from __future__ import annotations (#8171) 2024-12-11 19:05:04 -05:00
George Hotz
8f4299fcc8 hotfix: suppress shutdown errors in CLProgram 2024-12-11 08:08:32 -08:00
nimlgen
3a7d64b96c hcq remove update from args state (#8104)
* hcq remove update from args state

fix amd

ugh

qcom?

qcom ops

ops

qcom fix

qcom texture info

fx

qcom fix

qcom

qcom, sry

minor

works

* remove old code

* unrelated+sint

* qcom

* typing

* rm comments
2024-12-08 15:22:05 +03:00
nimlgen
d6e66095fd hcq buffer is a class (#8106)
* hcq buffer is a class

* qcom

* no from_mv in qcom

* remove qcombuffer

* useless cast

* mypy

* qcom fix

* _md -> meta
2024-12-08 13:29:43 +03:00
nimlgen
8b1fa9cb7d nv hcq queue touchups (#8102) 2024-12-07 14:09:38 +03:00
nimlgen
e180a31c5e tiny metal cleanup (#8089)
* tiny metal cleanup

* cast

* sry
2024-12-06 21:44:32 +03:00
nimlgen
d1282da7e8 hcq bump alloc (#8078)
* hcq bump alloc

* hm

* nv

* typo
2024-12-06 19:19:04 +03:00
nimlgen
c0240855b9 qcom has not transfer (#8075)
* qcom alloc is not hcq alloc

* maybe base?

* test
2024-12-06 14:45:01 +03:00
JaSpa99
3c5d5f9414 mypy==1.13.0 (#7990)
* explicit instantiation and narrowing asserts

* explicit cast

* bump

* one line assert

* handle case for no copy_queue_t

* Revert "handle case for no copy_queue_t"

This reverts commit 38347806ca.

* more readable control flow

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-06 12:09:14 +08:00
nimlgen
78c01a5c2b amd general _gpu_alloc (#8056)
* amd general _gpu_alloc

* hmm

* ops
2024-12-05 15:50:23 +03:00
nimlgen
8071600897 nv one _gpu_alloc (#8055) 2024-12-05 15:22:03 +03:00
uuuvn
e9c5b23ba1 Use MTLCompiler directly (v2) (#7920)
* Use MTLCompiler directly (v2)

* to_block_literal and REQUEST_TYPE_COMPILE

* Rewrite command encoding

* Revert to_block_literal

* Maybe that's more readable to some people?

* Typo and comment about stdlib caching

* Update ops_metal.py

* Update ops_metal.py

* Update ops_metal.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
nimlgen
7fda464b08 hcq c-like args state (#8020)
* hcq c-like args state

* ugh

* Dfix

* rename

* i
2024-12-03 23:53:35 +03:00
George Hotz
32675a8a77 sacrifice ClangGraph on the altar of lines [pr] (#8009) 2024-12-03 21:11:15 +08:00
Ahmed Harmouche
146e1caea3 Downgrade wgpu to prevent sd segfault (#7969) 2024-12-02 15:48:44 +01:00
wozeparrot
077e7e8ed2 fix: private segment sgpr on gfx103x (#7987)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 20:54:50 +08:00
nimlgen
10f431b96d hcq replace update with sint (#7899)
* try sym hcq

* start with amd

* move to nv

* nv works

* cache and qcom

* fixes

* signals

* fix nv

* qcom fixes

* linter

* linter

* cache + typings

* fixes

* tiny fixes

* linter

* linter

* lntr

* ugh

* comments
2024-11-29 20:08:13 +03:00
nimlgen
d3660ccc51 prereqs for hcq updates removal (#7959)
* hcq signals touch ups

* hcq compiled has device id

* helpers

* prreq hcq api

* oops
2024-11-29 18:20:07 +03:00
nimlgen
309dcb1044 hcq signal add sleep (#7955)
* hcqsignal sleep

* fixes

* typing

* time ms is int
2024-11-29 14:04:45 +03:00
nimlgen
81d415be03 amd pkt3 refactor (#7923)
* amd pkt3 refactor

* replace this

* linter

* fix

* cmt

* fast

* simpler

* linter

* smth

* missing
2024-11-28 11:06:37 +03:00
JaSpa99
38f34ca0cb prepare mypy==1.13.0: legacy cast (#7866)
* use helper to narrow literal type

* narrow with asserts instead of cast

* remove parentheses

* tensor.item() calls tensor.data()

* no copy

* proper indexing
2024-11-27 10:33:35 -05:00
nimlgen
84f96e48a1 hcq signal tiny refactor (#7913)
* hcq signal tiny refactor

* no mv

* fix

* fix2

* fix3
2024-11-26 21:48:38 +03:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnis now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
chenyu
04bee97d2a hotfix ctypes.c_ulong(size) for metal _alloc (#7902)
fix `Tensor.ones(1000, 1000, 1000).contiguous().realize()` on METAL
2024-11-25 18:25:33 -05:00
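The commit does not state the root cause, so the following is only an illustration under two assumptions: that the tensor holds 4-byte elements, and that the size was previously converted as a 32-bit signed integer. Under those assumptions a 1000 x 1000 x 1000 buffer needs 4,000,000,000 bytes, which does not fit in 32 signed bits but does fit in `ctypes.c_ulong`:

```python
import ctypes

# Assumed element size of 4 bytes (e.g. float32); not stated in the commit.
size = 1000 * 1000 * 1000 * 4

# ctypes integer types do no overflow checking, so a 32-bit signed
# conversion silently wraps around to a negative value.
wrapped = ctypes.c_int32(size).value

# c_ulong is unsigned and at least 32 bits wide, so the size survives intact.
preserved = ctypes.c_ulong(size).value
```

Here `wrapped` is `size - 2**32`, a negative number, while `preserved` equals `size`.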
George Hotz
1d6d842887 move DSP to extra (room for webgpu) [pr] (#7836) 2024-11-22 11:32:57 +08:00
George Hotz
6fc7013463 put all DSP in dsp file [pr] (#7833) 2024-11-22 11:22:59 +08:00
George Hotz
e39af63156 no loop assert in ops_python [pr] (#7834) 2024-11-22 11:17:36 +08:00