* float uop in sym_infer
* break line :(
* rerun mypy
* update GlobalCounters types
* revert type change and cast assignments to mem and ops
* cast inferred value to UOp in reshape
* cast hcq, update view reshape to handle inferred float
* rm extra space
* update error
* no type updates
* add patches
* add osx test in ci
* macos specific uvm, gpfifo mask
* only do that for now
* Revert "add patches"
This reverts commit 80d3112a57.
* use fork for now
* workflow only one worker
* merge osxtests with tests
* Revert "merge osxtests with tests"
This reverts commit 3461c8f46c.
* macos pagesize 16384
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendental support
* log2 NaN location mismatch on Vulkan
* NaN skips
* fix tensor realization bug in #8975
* that's a reshape now
* work
* works
* give those tests better names
* test when multiple mops result in the same ShapeTracker
* test_become_existing_buf_complex is enough
* that too
* add some docs about speed [pr]
* better torch gemm
* enable locals on llvm/clang
* disable locals for beam speed on LLVM/CLANG
* 0x20 alignment in llvm allows ymm use
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* new objc message style [pr]
* without sync
* no div 0
* lru cache that
* no sync in the profile
* fix
* update all to new style
* remove comment
* graph one kernel
* fix graph one kernel
* remove that sync
* benchmark kernel launch
* don't realize unneeded
* faster
* faster metal
* fix mypy
* without sync
* no div 0
* lru cache that
* no sync in the profile
* remove Tensor._to_const_val
added a TODO for advanced indexing on const, which was the last place that checked const in Tensor
* that is not folding now
* one more