Commit Graph

152 Commits

Author SHA1 Message Date
nimlgen
fb96394ff5 auto-select available compilers (#12094)
* device: auto select compilers

* fix

* metal+opencl

* nv/cuda

* test without ptx

* ptx

* fix tests

* fix

* fix test

* rename

* test + cleaner

* xx

* ops

* better test

* win?

* um?

* types

* debug

* win??

* sep rung

* wtf?

* debug

* skip win

* revert this

* types
2025-09-10 19:52:01 +03:00
nimlgen
8a7be0a747 metal: workaround for transfers sync issue (#11622)
* metal: workaround for transfers sync issue

* metal transfer sync is broken

* hm

* rm it?

* keep it
2025-08-12 16:16:34 +03:00
nimlgen
a5371f514b cpu: copies in profile (#11392)
* cpu: copies in profile

* fix

* rename to tiny?
2025-07-27 20:56:27 +03:00
qazal
3466a220de viz: disassembly viewer (#11393)
* test

* CPU=1 disasm works

* METAL=1 disasm works

* fix that

* work

* can unwrap

* work p2

* don't crash
2025-07-27 18:44:28 +03:00
chenyu
54924f9969 type remove Union and Optional [pr] (#11283)
use `|` for consistency
2025-07-19 14:05:52 -04:00
qazal
bde80c0cdf record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
Pyry Kovanen
32117402dd metal: fix incorrect _free on interpreter exit (#11158) 2025-07-10 14:01:30 +03:00
qazal
3dfc0ff887 move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
simone-pietro
58252e3c49 Change type hint for init_c_struct_t and to_struct [pr] (#10878)
* Change type hint for init_c_struct_t

* Change type hint for to_struct
2025-06-19 13:22:44 +03:00
George Hotz
4b1f1a47bb hotfix: allow ModuleNotFoundError in metal llvm import 2025-05-18 20:46:31 -07:00
uuuvn
7bc4864bc4 Make dev a property of Allocator (#10286)
* Make `dev` a property of `Allocator`

(this is a prereq refactor for #10285)

At least `BufferXfer.copy` accesses it assuming it's always present,
currently most devices just add this property on their own repeating
the same code over and over again.

This is also a bit footguny, see `RemoteAllocator` that named this
property `device` instead of `dev`, i could obviously just change that
in one place but doing it globally seems like a better solution (and it
reduces code duplication too).

`MallocAllocator` is a bit special, but passing `None` works just fine.

* typing

* ignore type instead of cast
2025-05-13 17:01:01 -07:00
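A rough sketch of the refactor the message above describes: the base allocator stores `dev` once, so subclasses stop redefining it under different names. All class names here (`ToyAllocator`, `ToyDeviceAllocator`, `ToyHostAllocator`) are hypothetical stand-ins, not the project's actual classes.

```python
class ToyAllocator:
    """Base allocator: stores the owning device once, for every subclass."""
    def __init__(self, dev):
        self.dev = dev

    def copy_between(self, other: "ToyAllocator") -> str:
        # Cross-allocator code can rely on `.dev` always being present.
        return f"{self.dev} -> {other.dev}"

class ToyDeviceAllocator(ToyAllocator):
    def __init__(self, dev: str):
        super().__init__(dev)

class ToyHostAllocator(ToyAllocator):
    def __init__(self):
        # A host-side allocator has no owning device, so it passes None.
        super().__init__(None)

gpu = ToyDeviceAllocator("METAL")
host = ToyHostAllocator()
route = gpu.copy_between(host)
```

Because the attribute name lives in one place, a subclass cannot accidentally call it `device` instead of `dev`.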
uuuvn
82a6160ff7 Detect metal paravirtualization bug via device name instead of CI (#10225) 2025-05-08 19:31:47 -07:00
uuuvn
dba073e5c0 Less messy broken graph on paravirtualized metal workaround (#10182)
* Less messy broken graph on paravirtualized metal workaround

GitHub CI macOS runners use paravirtualized metal which is broken with
graph (some comments say that ICB in particular is broken but in my
testing it was fine sometimes, but other times hitting an assert inside
metal's code related to resources, so not sure).

> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.

This can be reproduced locally with any virtualization software (like utm)
that can create macOS VMs with apple's own virtualization framework.

* unused import
2025-05-06 20:41:02 +03:00
George Hotz
5c7b549eab use functools.cache instead of lru_cache(None) [pr] (#9714)
* use functools.cache instead of lru_cache(None) [pr]

* more cache
2025-04-03 11:47:13 +08:00
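For illustration, the two decorators named above are equivalent: `functools.cache` (Python 3.9+) is the standard-library shorthand for `functools.lru_cache(maxsize=None)`. The function names and the call counter below are hypothetical.

```python
import functools

calls = {"old": 0, "new": 0}

@functools.lru_cache(maxsize=None)   # older spelling: unbounded cache
def square_old(n: int) -> int:
    calls["old"] += 1
    return n * n

@functools.cache                     # newer spelling, same unbounded cache
def square_new(n: int) -> int:
    calls["new"] += 1
    return n * n

# The second call with the same argument is served from the cache.
for _ in range(2):
    square_old(3)
    square_new(3)
```

Both wrappers expose `cache_info()` and `cache_clear()`, so swapping one for the other should not affect callers.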
nimlgen
56288243e6 metal PyTorch interop (#9229)
* add from_blob support to mps cuda

* objc_id

* metal pytorch interop

* fix comments

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
nimlgen
f986e12f91 metal: choose compile spec based on macos (#9188)
* metal: choose compile spec based on macos

* correction
2025-02-21 00:43:39 +03:00
uuuvn
9b9c1e14da Late MTLCompiler load (#8963)
Moved loading MTLCompiler (and trying to load normal llvm before it)
to MetalCompiler, like in CPUProgram with helper
2025-02-08 17:29:23 +08:00
uuuvn
6090cbe3be Try to open llvm first when opening metal (#8949)
* Try to open llvm first when opening metal

* Use more specific FileNotFoundError
2025-02-07 18:58:37 +08:00
uuuvn
67b70e4f6c Fix incorrect __del__ (#8950)
CPython doesn't make any guarantees about order in which globals like
`msg` or `libobjc` are destroyed when the interpreter shuts down

https://github.com/tinygrad/tinygrad/pull/8949 triggered the
unlucky ordering which led to a bunch of errors at exit

There are also a bunch of other places where similar problems exist
2025-02-07 18:21:44 +08:00
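One common way to make a `__del__` robust against the shutdown ordering described above is to keep a reference to whatever it needs on the instance, rather than looking up a module global at destruction time. This is a generic sketch of that pattern, not necessarily the fix applied in this commit; `release` and `Handle` are hypothetical names.

```python
released = []

def release(value):
    """Stand-in for a native 'free' call that lives in a module global."""
    released.append(value)

class Handle:
    def __init__(self, value):
        self.value = value
        # Bind the function now: the instance holds its own reference,
        # so __del__ does not depend on the module global still existing.
        self._release = release

    def __del__(self):
        try:
            self._release(self.value)
        except Exception:
            # Errors during teardown cannot be handled meaningfully here.
            pass

h = Handle(7)
del h  # CPython's reference counting runs __del__ immediately here
```

The `try`/`except` is a second line of defence for anything the bound function itself depends on.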
George Hotz
1249e8dd3b objc fast msg, try 2 [pr] (#8927) 2025-02-06 19:06:21 +08:00
George Hotz
1c53e8bf27 Revert "objc fast msg (#8922)" (#8926)
This reverts commit c3f99a727e.
2025-02-06 17:50:49 +08:00
George Hotz
c3f99a727e objc fast msg (#8922)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* new objc message style [pr]

* without sync

* no div 0

* lru cache that

* no sync in the profile

* fix

* update all to new style

* remove comment

* graph one kernel

* fix graph one kernel

* remove that sync
2025-02-06 17:49:06 +08:00
George Hotz
a8e54df363 benchmark single kernel launch (#8921)
* benchmark kernel launch

* don't realize unneeded

* faster

* faster metal

* fix mypy

* without sync

* no div 0

* lru cache that

* no sync in the profile
2025-02-06 13:35:34 +08:00
nimlgen
5afb0a4a81 metal: fix transfer profiling (#8659) 2025-01-17 23:47:01 +03:00
uuuvn
615d5276b1 Suppress 'X warnings generated.' in MTLCompiler (#8489)
'-fno-caret-diagnostics' is what clang-tidy uses when user passes --quiet
2025-01-04 10:22:37 -05:00
chenyu
7ea633f94f remove from __future__ import annotations from runtimes [pr] (#8373)
it's not needed if we move the Device before Program and Allocator, which need Device.

not updating hcq because it has a lot more stuff, and CLDevice requires CLDevice
2024-12-21 23:46:07 -05:00
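A small sketch of why ordering matters here, assuming a Python version that evaluates annotations eagerly when a function is defined: a class used in an annotation must already exist, so defining the device class first removes the need for `from __future__ import annotations`. `ToyDevice` and `ToyProgram` are hypothetical names.

```python
# Defined first, so later annotations can refer to it directly.
class ToyDevice:
    def __init__(self, name: str):
        self.name = name

# The annotation `dev: ToyDevice` resolves because ToyDevice already exists.
class ToyProgram:
    def __init__(self, dev: ToyDevice, name: str):
        self.dev, self.name = dev, name

prog = ToyProgram(ToyDevice("METAL"), "kernel0")
dev_hint = ToyProgram.__init__.__annotations__["dev"]
```

Reordering cannot help when a class has to name itself in its own annotations, which appears to be the situation the message mentions for `CLDevice`.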
George Hotz
9c77e9f9b7 replace Tuple with tuple [pr] (#8344)
* replace Tuple with tuple [pr]

* replace List with list [pr]

* replace Dict with dict [pr]

* replace Set with set [pr]
2024-12-19 21:27:56 -08:00
nimlgen
777d2aec05 metal profiler + cpu_profile (#8291)
* metal + cpu_profile

* gpt example

* linter + revert gpt2 for now

* a bit of readme

* linter

* unrelated

* tests

* linter

* b
2024-12-18 00:06:56 +03:00
chenyu
2e4c7d4cfb add "tinygrad" to be part of cache_dir [pr] (#8188)
instead of having sqlite / http download / metal compile to add "tinygrad" separately. also make it non-private since it's used in metal
2024-12-12 12:09:44 -05:00
nimlgen
e180a31c5e tiny metal cleanup (#8089)
* tiny metal cleanup

* cast

* sry
2024-12-06 21:44:32 +03:00
uuuvn
e9c5b23ba1 Use MTLCompiler directly (v2) (#7920)
* Use MTLCompiler directly (v2)

* to_block_literal and REQUEST_TYPE_COMPILE

* Rewrite command encoding

* Revert to_block_literal

* Maybe that's more readable to some people?

* Typo and comment about stdlib caching

* Update ops_metal.py

* Update ops_metal.py

* Update ops_metal.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
chenyu
04bee97d2a hotfix ctypes.c_ulong(size) for metal _alloc (#7902)
fix `Tensor.ones(1000, 1000, 1000).contiguous().realize()` on METAL
2024-11-25 18:25:33 -05:00
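To illustrate why an explicit `ctypes.c_ulong` matters for large sizes: ctypes converts a bare Python int to a C `int` by default and masks it to fit, so a byte count above 2**31 is silently corrupted. The sketch below assumes 4-byte elements for the tensor in the message and a platform where `c_ulong` is 64 bits wide (as on macOS); it does not call Metal.

```python
import ctypes

# 1000 * 1000 * 1000 elements, assuming 4 bytes per element.
size = 1000 * 1000 * 1000 * 4

# ctypes integer types do no overflow checking: the value is masked to fit.
as_int = ctypes.c_int(size).value

# Wrapping explicitly in c_ulong keeps the full value where c_ulong is 64-bit.
as_ulong = ctypes.c_ulong(size).value

print(size, as_int, as_ulong)
```

On platforms where `c_ulong` is only 32 bits, `ctypes.c_uint64` would be the safer choice for this kind of argument.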
George Hotz
eb0bb7dc0b final dname to device [pr] (#7806)
* final dname to device [pr]

* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
6688539bc9 rename device to dev so Buffer can be Allocator [pr] (#7799)
* rename device to dev so Buffer can be Allocator [pr]

* missed those

* update the Program classes also

* more renames

* oops
2024-11-20 15:47:26 +08:00
George Hotz
913a27ee27 from_buffer on metal was never called [pr] (#7791) 2024-11-20 00:35:17 +08:00
George Hotz
d71fe7faa5 rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]

* forgot those

* transfer + offset
2024-11-20 00:10:29 +08:00
chenyu
573f145dcf METAL raise RuntimeError with no compiler and bad src (#7603)
fixed BEAM if src is invalid on METAL. it currently only accepts RuntimeError in `_time_program`
2024-11-08 17:09:12 -05:00
George Hotz
6bb230287b pass the src into Metal [pr] (#7518)
* pass the src into Metal [pr]

* put that comment back

* keep old functionality

* move all to disassembler

* metal supports parallel beam

* touchups

* comment in correct place
2024-11-04 12:35:30 +08:00
chenyu
6021bf87f4 unify T = TypeVar("T") (#7342) 2024-10-28 18:43:44 -04:00
George Hotz
9f32a6f496 Revert "move metal tc check to renderer [pr] (#7248)" (#7251)
This reverts commit 72ddcdb4d1.
2024-10-24 10:57:09 +08:00
George Hotz
72ddcdb4d1 move metal tc check to renderer [pr] (#7248) 2024-10-24 10:38:57 +08:00
mesozoic-egg
0e8bcda07e get readable error from wait_check (#6965)
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-09 17:28:58 +03:00
mesozoic-egg
d2e02b47e1 Construct c_ulong in blitCommandEncoder copy method (#6793)
* Construct c_ulong in blitCommandEncoder copy method

* line too long

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-02 11:09:37 +08:00
mesozoic-egg
992cde05d7 Metal with CDLL instead of py-objc (#6545)
* Add CDLL interface for metal

* remove two unused functions

* Cover most of the API methods

* switch to cdll

* directly call objc message in ops_metal

* keep only obj interface

* Use direct message sending for graph

* may have found a solution to the memoryview on ctypes pointer

* buf indexing bug fixed

* fix c_int

* fix c int to bytes

* fix gpu time bug

* line savings for cdll metal core

* wip

* c int bug

* fix buf casting

* dedup for c_void_p

* dedup for c_void_p

* linter fix

* remove unused stuff

* my py fix

* more mypy error fix

* line savings

* line savings

* rename send_message to msg; add __hash__ and __eq__ for dedup

* wip

* refactor

* refactor

* remove named import from ctypes

* forgot to change variable name

* file reorg, put support.py to ops_metal

* refactor

* hash error

* remove to_ns_array

* test oom exception, fix exception change

* typevar for msg

* add back dedup

* test for compile error

* move constant to graph

* move header constant around

* get label for icb buffer

* check icb label using "in"

* wip fixing mypy reported error

* fixed mypy error

* code formatting

* all_resources dedup match previous

* code formatting

* code formatting; buffer set to objc_id

* revert changes on buf for the manual release, seems like _free is not always called

* skip unless on metal, for test_metal

* fix premature mem release causing seg fault

* test_metal check for device before importing

* Buffer should only be released under _free explicitly

* mypy fixes

* change object ownership

* test compile success

* lint fixes

* remove load_library

* wrap sel_register in cache

* simplify to_struct

* swap lines

* fix type error in to_struct

* bump line to 9800

* remove pyobjc from setup.py

* command buffer should be objc_instance and get released

* stringWithUTF8String: returns objc_instance

* Use constant for MTLPipelineOptionNone

* better explanation for [MTLBuffer contents:] return

* Use dyld_find in case the path differs

* trailing whitespace

* handle exception for methods that take error:

* load /System/Library instead of /Library

* Init c_void_p with None instead of zero for error objects

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-25 17:43:01 +08:00
George Hotz
638b4843da fix for metal ICB issue on M1/M2 [run_process_replay] (#6313)
* this is a working fix

* better comment

* repro
2024-08-28 21:31:14 -07:00
nimlgen
e9024c691f metal raise when command queue is not created (#6044)
* metal raise when command queue is not created

* dont do that
2024-08-12 18:30:37 +03:00
nimlgen
98df648a79 metal sync queues in transfer (#5308)
* metal sync queues

* cleaner

* need this

* oops
2024-08-05 18:43:22 +03:00
nimlgen
8a548b0b6e metal support offset (#5293) 2024-07-05 16:13:05 +03:00
gip
04ef0fd328 fix: message when applegpu tools missing (#5236) 2024-07-03 09:07:09 -07:00
chenyu
a8e9307e0b pylint runtime/ and shape/ (#5044)
as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime
2024-06-18 19:48:18 -04:00