CPython doesn't make any guarantees about the order in which globals like
`msg` or `libobjc` are destroyed when the interpreter shuts down.
https://github.com/tinygrad/tinygrad/pull/8949 triggered an
unlucky ordering, which led to a bunch of errors at exit.
There are also several other places where similar problems exist.
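A common way to make a finalizer robust against this unspecified teardown order is to have the object hold its own references to whatever it needs, instead of looking up module globals inside `__del__`. Below is a minimal sketch. It assumes a hypothetical `ObjcHandle` wrapper and a `release` callable that stands in for the real Objective-C release call. Neither is tinygrad's actual code.

```python
class ObjcHandle:
    """Hypothetical wrapper around a native pointer that must be released once."""

    def __init__(self, ptr, release):
        self.ptr = ptr
        # Keep a strong reference to the release function on the instance, so
        # __del__ never has to look up a module global (such as `msg` or
        # `libobjc`) that may already be gone at interpreter shutdown.
        self._release = release

    def __del__(self):
        # Only touch state owned by the instance; getattr guards against an
        # object whose __init__ failed before _release was set.
        release = getattr(self, "_release", None)
        if release is not None and self.ptr is not None:
            release(self.ptr)
            self.ptr = None  # make a second call a no-op


released = []
h = ObjcHandle(42, released.append)
del h  # in CPython the refcount hits zero here, so __del__ runs immediately
print(released)  # [42]
```

The standard library's `weakref.finalize` is an alternative: by default its callbacks are also run at exit through `atexit`, which happens before module globals are torn down.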
* refactor into subactions
* this work?
* add shell
* move install opencl
* valid?
* support mac os x
* refactor other osx
* fix linux/osx
* fixes
* cleanups
* used everywhere
* no quotes
* quotes on true
* bugfixes
* this run?
* hardcode
* that
* process replay action
* fix checkout
* restore to branch
* fix caching
* fix osx python cache
* does replace function exist
* Revert "does replace function exist"
This reverts commit 622177c5a0.
* Revert "fix osx python cache"
This reverts commit e70d55cd93.
* user on osx to fix untar issue
* that
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* new objc message style [pr]
* without sync
* no div 0
* lru cache that
* no sync in the profile
* fix
* update all to new style
* remove comment
* graph one kernel
* fix graph one kernel
* remove that sync
* remove Tensor._to_const_val
added a TODO for advanced indexing on const, which was the last place in Tensor that checked for const
* that is not folding now
* one more
* Pass host CPU features to LLVM target
This gets `test_gemm_fp16` to pass on Windows. It used to fail because the
generated machine code called compiler-rt functions to perform the
truncation. With the host CPU features passed through, LLVM gets access to
more instructions, so the test passes on hardware that supports them.
Essentially this is similar to `-march=native`.
Unless this was intentionally left as is, to be re-implemented fully in
LLVM IR or something similar.
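For illustration, the host CPU features are handed to LLVM as a comma-separated string of `+name` (enabled) and `-name` (disabled) entries. In llvmlite, for example, `get_host_cpu_features()` returns such a mapping, its `flatten()` method produces the string, and the string goes into the `features=` argument of `Target.create_target_machine`, next to `cpu=get_host_cpu_name()`. The minimal sketch below only reproduces that string format. The helper name `to_llvm_feature_string` and the sample feature values are hypothetical, and the sketch does not claim to show how tinygrad builds its target.

```python
def to_llvm_feature_string(features: dict[str, bool]) -> str:
    """Render a {feature_name: enabled} mapping in LLVM's "+feat,-feat" format."""
    # Sort by name so the resulting string is deterministic across runs.
    return ",".join(("+" if enabled else "-") + name
                    for name, enabled in sorted(features.items()))


# Made-up sample of host features; on a real machine these would come from
# host CPU detection. On x86, `f16c` adds hardware float <-> half conversion.
host_features = {"f16c": True, "avx2": True, "sse4a": False}
print(to_llvm_feature_string(host_features))  # +avx2,+f16c,-sse4a
```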
* Fix linter complaints
* ptx and nv rendering refactor to work with half acc
* ptx fix!
* use same reg for acc and out
* fix comment
* another fix
* minor change in comment
* fix
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>