* fixes from chargpt for torch backend
* shrink support
* add stride support
* comment cleanup
* a few more
* work
* import the stream hack
* llvm multi auto
* rig up torch's testing framework [pr]
* support more movement ops
* dec on expand
* fix tests
* work
* fix tests
* a few more
* decomps + opt hook
* installed pytest
* boom
* fix webgpu
* use exact variable names in test so that AI can read easier
* add tag for specific test name like test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* Solve dims too large errors on webgpu
* Simplify divisor find
* Test square root divisor
* Fix lint
* Refactor into group_dims and split_dims
* Refactor
* Fix lint
* Add back max check in _group_dims
* Prefer grouping over split
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* add patches
* add osx test in ci
* macos specific uvm, gpfifo mask
* only do that for now
* Revert "add patches"
This reverts commit 80d3112a57.
* use fork for now
* workflow only one worker
* merge osxtests with tests
* Revert "merge osxtests with tests"
This reverts commit 3461c8f46c.
* macos pagesize 16384
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
- Ensure that the set backend environment variable is persisted to the next step via $GITHUB_ENV
- It doesn't actually persist for Windows unless shell is explicitly set to bash.
- Add the assertion to ensure the selected backend is actually used.
* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendetal support
* log2 nan location mismatch on Vulkan
* Nan skips
* increase CI speed with more runners [pr]
* splits + cleanups [pr]
* more runners
* need that dep
* split that too
* can't be minimal
* move test readme
* bugfix + naming
* one more split
* bump to 22.04
* refactor into subactions
* this work?
* add shell
* move install opencl
* valid?
* support mac os x
* refactor other osx
* fix linux/osx
* fixes
* cleanups
* used everywhere
* no quotes
* quotes on true
* bugfixes
* this run?
* hardcode
* that
* process replay action
* fix checkout
* restore to branch
* fix caching
* fix osx python cache
* does replace function exist
* Revert "does replace function exist"
This reverts commit 622177c5a0.
* Revert "fix osx python cache"
This reverts commit e70d55cd93.
* user on osx to fix untar issue
* that
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* dsp simulator
* progress
* fix
* close on test tiny
* working
* less waste
* line savings
* Device DSP compiler
* mock DSP at the bottom
* DSP tests
* docker caching
* test update
* need load
* skip that test for CI DSP
* last touch
* ugh
* LLVM JIT
* Autogen LLVM
* Update autogen
* Move things around
* even more non-determinism
* windows
* more autogen weirdness
* more windows stuff
* blind windows development try 2
* more blind windows development
* even more blind windows development
* maybe i should just set up a windows vm...
* why can't everyone just use sysv abi?
* cleanup debugging stuff
* unused import
* icache flushing isn't required on x86
* merge jit_nt and jit_unix
* more
* Temporary hack to not segfault
* better error
* bad conflict resolution
* Attempt to simplify support/llvm.py
* More refactoring
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
* start
* progress
* fixes
* smth
* mini fixes
* fix2
* ugh, need this for now
* faster
* cleanups
* tiny linters
* make mypy happier
* test & free pts
* ops
* linter
* cleanup vm
* fix
* remove map_from
* tiny fixes
* add test to ci
* init turing tc
* reorder tc
* hotfix: remove some spaces
* revert var name to x
* consistent order of factors
* revert order of terms to match old stuff
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>