* Less messy workaround for broken graph on paravirtualized Metal
GitHub CI macOS runners use paravirtualized Metal, on which graph is
broken (some comments say that ICB in particular is broken, but in my
testing it sometimes worked and other times hit an assert inside
Metal's code related to resources, so I'm not sure).
> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.
This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own Virtualization framework.
* unused import
* insert Ops.FUSE for arange
* reshape does not collapse
* do not fuse reshapes
* add children
* fixups
* work
* add Ops.WHERE support to z3
* fix fuse for cast
* diff
* ugh
* don't need this anymore
* contiguous
* add always_contiguous
* there too
* start gpu
* progress
* fixes
* read correct
* libusb
* libusb works
* support asm24
* hmm
* one access file
* fix extra
* start AMBar
* works on am
* back to usb
* patch fw
* full fast write into a bar
* ugh, minus one gpus, next please
* mute libusb for now
* usb for asm24
* 63
* hmm
* ops
* rescan
* and gpu should be there
* enumerate them?
* usbgpu bus 4, 100% reliable (draft)
* lil
* works
* comments
* add DEBUG
* cleaner
* simplest
* Revert "simplest"
This reverts commit 1d00354c16.
* Revert "cleaner"
This reverts commit c5662de956.
* assert we find gpu
* that's simpler
* this back
* simpler?
* correct
* work
* nonsense
* works with more checks
* this works
* the 6s in the right place
* reliable now
* fix after reboot
* set config
* 1s timeouts
* close to fw loading
* streams
* usbhub works
* endpoints
* fix
* want to test tiny10
* move to tiny 10
* fix gpu
* ugly speed
* smth
* mostly broken, but signals and dmas
* do not reset gpu every time
* changes to run kernels
* ugh, not working
* t10
* pg and sc files
* some prog
* um?
* somehow it works
* patched for 24
* some tries
* minimal
* moving
* back to working
* so sloooooow
* move to controller
* usb.py rewrite
* rework
* cleaner 1
* cleaner 2
* cleaner 3
* new abstractions
* aft merge
* init controller
* cleaner 4
* cleaner 5
* patcher + tiny changes
* ignore that
* cleaner 6
* after rebase
* cleaner 7
* bring it back
* start linter war
* linter 2
* autogen was missing
* fix autogen
* typing
* better?
* mypy
* extra/legacy rename and cleaner
* shuffle
* better printing
* tiny changes and tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* range has only one op now
* fix z3 checker
* ci fix
* needs shell
* try pip ensure update
* that ensurepip is useless
* upgrade pip before cache
* windows happy?
* propagate use_tensor_cores
* add use_tensor_core to arg in test and search
* bugfix
* get TC val from ContextVar in search
* revert minor space change
* add tc emulation test to ci and benchmark
* revert
* revert whitespace change
* remove test for ptx
* add comment and remove llvm test run
* bug in div range folding
* simpler
* oh, this is right for indexing, but the div mod folding needs to be fixed
* reenable
* Passing test_complexity_w_unroll2 (#10068)
* Passing
* remove non_folded_divs
* Add check for negative term in div folding
* Add test
* bump that limit
* fix casted
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow