* Make `dev` a property of `Allocator`
(this is a prereq refactor for #10285)
At least `BufferXfer.copy` accesses it assuming it's always present;
currently most devices just add this property on their own, repeating
the same code over and over again.
This is also a bit footguny: see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could obviously just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).
`MallocAllocator` is a bit special, but passing `None` works just fine.
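A minimal sketch of the idea, assuming a simplified allocator interface. The base class stores the owning device once, so callers can rely on `.dev` existing. `FakeDevice`, `FakeAllocator` and `describe_copy` are hypothetical stand-ins, and this is not tinygrad's actual `Allocator` code.

```python
from typing import Any, Optional

class Allocator:
  # The owning device is stored once in the base class, so every subclass has `.dev`.
  def __init__(self, dev: Optional[Any]):
    self.dev = dev
  def alloc(self, size: int) -> Any: raise NotImplementedError

class MallocAllocator(Allocator):
  # Host memory has no owning device, so None is passed.
  def __init__(self): super().__init__(None)
  def alloc(self, size: int) -> bytearray: return bytearray(size)

class FakeAllocator(Allocator):  # hypothetical device allocator
  def alloc(self, size: int) -> bytearray: return bytearray(size)

class FakeDevice:  # hypothetical device that hands itself to its allocator
  def __init__(self, name: str):
    self.name = name
    self.allocator = FakeAllocator(self)

def describe_copy(src: Allocator, dst: Allocator) -> str:
  # Hypothetical helper: works for any Allocator because the base class guarantees `.dev`.
  return f"{getattr(src.dev, 'name', 'host')} -> {getattr(dst.dev, 'name', 'host')}"
```

A subclass can no longer end up with a differently named attribute (like `device`), since `.dev` is always set by `Allocator.__init__`.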
* typing
* ignore type instead of cast
* Less messy broken graph on paravirtualized metal workaround
GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICB in particular is broken, but in my
testing it sometimes worked fine and other times hit an assert inside
Metal's code related to resources, so I'm not sure).
> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.
This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own Virtualization framework.
* unused import
CPython doesn't make any guarantees about the order in which globals like
`msg` or `libobjc` are destroyed when the interpreter shuts down.
https://github.com/tinygrad/tinygrad/pull/8949 triggered an
unlucky ordering, which led to a bunch of errors at exit.
There are also a number of other places where similar problems exist.
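One common mitigation, shown as a hedged sketch rather than what this fix actually does: keep a reference to whatever `__del__` needs on the object itself, instead of looking up a module global at finalization time. `Handle` and the `release` callable are hypothetical names.

```python
class Handle:
  """Hypothetical wrapper that frees a native resource when it is finalized."""
  def __init__(self, value: int, release):
    self.value = value
    # Store the release callable on the instance: __del__ then reaches it through
    # `self` rather than a module-level global, which may already be torn down
    # if this object is finalized during interpreter shutdown.
    self._release = release
  def __del__(self):
    self._release(self.value)
```

Binding the helper as a default argument of `__del__` is a similar trick; either way the finalizer no longer depends on global teardown order.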
* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* new objc message style [pr]
* without sync
* no div 0
* lru cache that
* no sync in the profile
* fix
* update all to new style
* remove comment
* graph one kernel
* fix graph one kernel
* remove that sync
It's not needed if we move the Device before Program and Allocator, which need Device.
Not updating hcq because it has a lot more stuff, and CLDevice requires CLDevice.
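A hedged sketch of a kernel-launch micro-benchmark in the spirit of the commits above: it warms up first, times only the launch calls (no sync, so it measures dispatch overhead rather than kernel execution), and guards against dividing by zero. `launch` is a hypothetical callable standing in for a real kernel launch.

```python
import time
from typing import Callable

def bench_launch(launch: Callable[[], None], iters: int = 100, warmup: int = 10) -> float:
  """Return mean wall-clock seconds per call of `launch` (hypothetical enqueue-only callable)."""
  for _ in range(warmup): launch()  # warm caches/compilation before timing
  if iters <= 0: return 0.0  # no div 0
  st = time.perf_counter()
  for _ in range(iters): launch()  # no synchronize: only launch overhead is timed
  return (time.perf_counter() - st) / iters
```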
* pass the src into Metal [pr]
* put that comment back
* keep old functionality
* move all to disassembler
* metal supports parallel beam
* touchups
* comment in correct place
* Add CDLL interface for metal
* remove two unused functions
* Cover most of the API methods
* switch to cdll
* directly call objc message in ops_metal
* keep only obj interface
* Use direct message sending for graph
* may have found a solution to the memoryview on ctypes pointer
* buf indexing bug fixed
* fix c_int
* fix c int to bytes
* fix gpu time bug
* line savings for cdll metal core
* wip
* c int bug
* fix buf casting
* dedup for c_void_p
* linter fix
* remove unused stuff
* mypy fix
* more mypy error fix
* line savings
* rename send_message to msg; add __hash__ and __eq__ for dedup
* wip
* refactor
* remove named import from ctypes
* forgot to change variable name
* file reorg, put support.py to ops_metal
* refactor
* hash error
* remove to_ns_array
* test oom exception, fix exception change
* typevar for msg
* add back dedup
* test for compile error
* move constant to graph
* move header constant around
* get label for icb buffer
* check icb label using "in"
* wip: fixing mypy-reported errors
* fixed mypy error
* code formatting
* all_resources dedup match previous
* code formatting
* code formatting; buffer set to objc_id
* revert changes on buf for the manual release, seems like _free is not always called
* skip unless on metal, for test_metal
* fix premature mem release causing seg fault
* test_metal check for device before importing
* Buffer should only be released under _free explicitly
* mypy fixes
* change object ownership
* test compile success
* lint fixes
* remove load_library
* wrap sel_register in cache
* simplify to_struct
* swap lines
* fix type error in to_struct
* bump line limit to 9800
* remove pyobjc from setup.py
* command buffer should be objc_instance and get released
* stringWithUTF8String: returns objc_instance
* Use constant for MTLPipelineOptionNone
* better explanation for [MTLBuffer contents] return
* Use dyld_find in case the path differs
* trailing whitespace
* handle exception for methods that take error:
* load /System/Library instead of /Library
* Init c_void_p with None instead of zero for error objects
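A minimal sketch of calling Objective-C from Python via ctypes, in the spirit of the `msg` helper and cached selector registration above. It is not the actual `ops_metal` code: `sel` and `to_mv` are illustrative names, and the Objective-C parts only run on macOS.

```python
import ctypes, ctypes.util, functools, sys

# The Objective-C runtime only exists on macOS; elsewhere libobjc stays None.
libobjc = ctypes.CDLL(ctypes.util.find_library("objc")) if sys.platform == "darwin" else None
if libobjc is not None:
  libobjc.objc_getClass.restype, libobjc.objc_getClass.argtypes = ctypes.c_void_p, [ctypes.c_char_p]
  libobjc.sel_registerName.restype, libobjc.sel_registerName.argtypes = ctypes.c_void_p, [ctypes.c_char_p]
  # NSString lives in Foundation, which libobjc alone does not load
  ctypes.CDLL("/System/Library/Frameworks/Foundation.framework/Foundation")

@functools.lru_cache(None)
def sel(name: str) -> int:
  # Cache selector lookups: the same name always maps to the same SEL
  return libobjc.sel_registerName(name.encode())

def msg(obj, name: str, *args, restype=ctypes.c_void_p):
  # objc_msgSend is variadic in C, but it must be called through a prototype that
  # matches the target method (variadic args are passed differently on arm64).
  # Declaring fixed argtypes makes ctypes call it as a non-variadic function.
  f = libobjc.objc_msgSend
  f.restype, f.argtypes = restype, [ctypes.c_void_p, ctypes.c_void_p] + [type(a) for a in args]
  return f(obj, sel(name), *args)

def to_mv(ptr: int, size: int) -> memoryview:
  # Zero-copy view of `size` bytes at a raw address (e.g. what [MTLBuffer contents] returns)
  return memoryview((ctypes.c_char * size).from_address(ptr)).cast("B")
```

On macOS, `msg(libobjc.objc_getClass(b"NSString"), "stringWithUTF8String:", ctypes.c_char_p(b"hi"))` returns an object pointer as an `int`, and passing `restype=ctypes.c_char_p` to a `UTF8String` message turns the result back into `bytes`.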
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
As pointed out by #4877, an `__init__.py` needs to be added for pylint to pick these files up. Fixed some errors, except in ops_python (will do in a separate PR; it has a lot of errors) and in the subfolders of runtime.
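A small hedged helper for spotting the problem above: it lists directories that contain `.py` files but no `__init__.py`, i.e. directories that are not regular packages and may therefore be skipped when linting a package. `missing_init_dirs` is a hypothetical name, not part of the repo.

```python
from pathlib import Path

def missing_init_dirs(root: str) -> list[Path]:
  """Directories under `root` that hold .py files but have no __init__.py."""
  dirs = {p.parent for p in Path(root).rglob("*.py")}
  return sorted(d for d in dirs if not (d / "__init__.py").exists())
```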