* implemented in tensor
* apply onnx tests to asymmetrical pads
* better onnx op ordering
* correct ceil_mode asymmetrical
* fix onnx_ops comments
* a few more TODOs and fix some stupidity
* fix some typing
* fix test
* mypy still a little messed up
* refactor out pad struct transformation
* add simple docs for now
* add whatever tests possible
* add tests for _resolve_pool_pads
* better err msg
* whoops didn't mean to include this
* retry CI
* enable asymmetric pads onnx tests
* better docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
First iteration of the AMX fix was using symbol lookup + trampoline
approach which required this, however later i replaced it by marking
amx function `static` and assumed that relocation was still used when
callee wasn't inlined, however this turned out not to be the case
because the callee can't be moved around by linker at link-time and
can't be overloaded by other symbols (`static` means priority + local
visibility)
* init mockcuda
* run gpu ocelot
* fix
* sfixes
* disable broken tests
* linter
* these fails as well
* pylint
* myypy
* this fails on real platforms as well
* mypy please
* _padding2d -> _resolve_pool_pads
* rephrase err msg
* even better error msg
* check asymmetric first os people don't hit error twice
* test against torch
This test assumes that float4 is the max upcast and tests that 8 float
loads are upcasted to 2 float4 loads, however on CLANG+AMX upcasts
can be up to float16 and in this test we get one float8 load instead.
The @unittest.skipIf line is copied from test_linearizer.py where
a bunch of tests make similar assumptions about upcasts.
* hljs.highlightElement target code not pre
* createPre
* no style change
* real no style change
* remove unnecessary scroll bar
* horizontal scrollbar appears only when scrolled all the way to the bottom
* misc
* connect to gpu
* rlc init?
* gfx comp start init
* early init is hardoded, some progress with fw
* gart
* progress, next mqd
* ring setup, still does not execute anything
* ugh write correct reg
* pci2: vm
* pci2: start psp
* vm seems to work
* pci2: gfx start
* pci2: fix psp ring resp
* pci2: try ring
* pci2: mes and some fixes
* pci2: some progress
* pci2: progress
* pci2: mm
* pci2: discovery
* pci2: correct apertures
* pci2: b
* pci2: i
* pci2: l
* pci2: o
* pci2: cmu
* pci2: mes_kiq works
* pci2: mes
* pci2: kcq does not work(
* pci2: unhalt gfx
* ops_am
* minor
* check if amdgpu is there, or we will crash
* bring back graph, it just works
* less prints
* do not init mes (not used)
* remove unused files
* ops_am: start move into core
* ops_am: works
* clcks, but still slower
* faster + no mes_kiq
* vm frags + remove mes
* cleanup fw
* gmc tiny cleanup
* move to ops_amd
* comment out what we dont really need
* driverless
* close in speed
* am clean most of ips
* gmc to ips
* cleaner
* new vm walker
* comment old one
* remove unsued autogens
* last write ups
* remove psp hardcoded values
* more
* add logs
* ih
* p2p and sdma
* vfio hal and interrupts
* smth
* amd dev iface
* minor after rebase
* bind for sdma
* Revert "bind for sdma"
This reverts commit a90766514d.
* tmp
* debug new mm
* ugh, allreduce hangs fixed
* p1
* works
* no pci.py
* cleaner a bit
* smth
* tiny cleanups
* cleaner a bit
* pciiface
* linter
* linter 2
* linter 3
* linter
* pylint
* reverted unrelated changes
* unrelated
* cmp tool
* ugh wrong fw
* clockgating
* unrelated
* alloc smaller chunks
* this
* opt sigs
* collect stat
* ops
* upd
* proclogs
* proclogs2
* vfio
* ruff
* linter pylint
* oops
* mypy p1
* mem fix
* mypy p2
* mypy p3
* mypy p4
* correct
* minor
* more tests
* linter in tests
* pci_regs header
* minor write up
* setup
* do not require libs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
it's a python style mod. possibily can be cleaner with a floor div
relaxed the vmin for MOD slightly for cstyle negatives mod, it's more correct and might fix other bugs
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* proper spidering for uops
* cleanups
* all tensors
* all tensors
* slow but correct
* fast
* no WeakSet
* faster
* no need for list
* revert that