Commit Graph

75 Commits

Author SHA1 Message Date
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
qazal
cd6aeebfee sqtt: osx decoder installer (#12637) 2025-10-13 17:26:12 +08:00
nimlgen
89be3590aa amd: sqtt on gfx12 (#12564)
* amd: sqtt on gfx12

* cleaner

* thi

* and this

* ops

* ugh

* back

* rm this

* rm
2025-10-10 17:54:14 +08:00
nimlgen
1309cea247 rocprof parser in extra (#12569)
* rocprof parser

* viewer

* vw

* skip
2025-10-10 14:56:42 +08:00
nimlgen
9c9e337c78 amd: parse soc enums (#11727)
* amd: parse soc enums

* remove from mock

* fix

* minimal amd_gpu
2025-08-19 15:06:09 +03:00
uuuvn
052191eae4 Remote multihost (p2p with infiniband verbs) (#9746)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-27 14:44:32 -07:00
nimlgen
df3ba0a7c0 autogen: fix imports in libusb (#11294) 2025-07-21 13:04:27 +03:00
nimlgen
f9e4c4e57a nv: nvpci blackwell support (#11127)
* nv: start 5090

* gsp init 5090

* mmu

* works

* after merge

* clenaer

* rwk

* x

* fx

* finish?

* fix

* unrelated

* fix

* commenbt
2025-07-11 17:02:09 +03:00
nimlgen
6067568087 nv: remove hardcoded CTRL_CMD_VASPACE_COPY_SERVER_RESERVED_PDES (#11057) 2025-07-02 20:41:10 +03:00
nimlgen
1c45b9f7fb start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core bootted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck aft lockdown(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems workin, but switch to 512mb page for simplicity

* working state

* not cleaned

* claned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
nimlgen
b6e574fcdf am: smu 14.0.3 is smu 14.0.2 (#10714) 2025-06-13 23:07:56 +03:00
George Hotz
0fbf3f5554 Revert "Revert "Update autogen ci runner to ubuntu 24.04 (#10736)" (#10757)" (#10758)
This reverts commit a6dba9b9d9.
2025-06-10 09:32:27 -07:00
George Hotz
a6dba9b9d9 Revert "Update autogen ci runner to ubuntu 24.04 (#10736)" (#10757)
This reverts commit 1d15374c7a.
2025-06-10 09:31:51 -07:00
uuuvn
1d15374c7a Update autogen ci runner to ubuntu 24.04 (#10736)
For `kfd.AMDKFD_IOC_EXPORT_DMABUF`
2025-06-10 08:33:02 -07:00
George Hotz
7ff175c022 cache a venv to avoid pip usage (#10689)
* try built in pip caching

* try venv

* export venv

* set VIRTUAL_ENV

* revert that

* venv key

* fix

* ci cache hit?

* fix windows
2025-06-07 20:13:41 -07:00
nimlgen
85cea23557 nv: original bw qmd (#10672)
* nv: original bw qmd

* forgot
2025-06-07 01:43:22 +03:00
nimlgen
883bb4541c am: reserve address space (#10564)
* am: reserve address space

* f

* cc

* errno

* fix

* always has cpu mapping
2025-05-30 19:31:03 +03:00
nimlgen
d90ddcc365 nv: blackwell support (#10487)
* nv: blackwell support

* fixes

* hm

* h

* fixes

* mypy

* xx

* yy

* arr

* revert

* oops

* unrelated
2025-05-24 18:23:53 +03:00
nimlgen
2895198c36 am: download regs (#10419)
* am: download regs

* x

* linter

* mypy

* after merge

* raise

* fixed name

* fix

* xx

* remove

* missing reg

* missing reg

* move to online

* ops
2025-05-20 18:59:56 +03:00
nimlgen
30bd6a619f usb gpu (#8766)
* start gpu

* progress

* fixes

* read correct

* libusb

* libusb works

* support asm24

* hmm

* one access file

* fix extra

* start AMBar

* works on am

* back to usb

* patch fw

* full fast write into a bar

* ugh, minus one gpus, next please

* mute libusb for now

* usb for asm24

* 63

* hmm

* ops

* rescan

* and gpu shoudl be there

* enumerate them?

* usbgpu bus 4, 100% reliable (draft)

* lil

* works

* comments

* add DEBUG

* cleaner

* simplest

* Revert "simplest"

This reverts commit 1d00354c16.

* Revert "cleaner"

This reverts commit c5662de956.

* assert we find gpu

* that's simpler

* this back

* simpler?

* correcT

* work

* nonsense

* works with more checks

* this works

* the 6s in the right place

* reliable now

* fix after reboot

* set config

* 1s timeouts

* close to fw loading

* streams

* usbhub works

* endpoints

* fix

* want to test tiny10

* move to tiny 10

* fix gpu

* ugly speed

* smth

* mostly broken, but signals and dmas

* do not reset gpu every time

* changes to run kernels

* ugh, not working

* t10

* pg and sc files

* some prog

* um?

* somehow it works

* patched for 24

* some tries

* minimal

* moving

* back to working

* so sloooooow

* move to controller

* usb.py rewrite

* rework

* cleaner 1

* cleaner 2

* cleaner 3

* new abstractions

* aft merge

* init controller

* cleaner 4

* cleaner 5

* patcher + tiny changes

* ignore that

* cleaner 6

* after rebase

* cleaner 7

* bring it back

* start linter war

* linter 2

* autogen was missing

* fix autogen

* typing

* better?

* mypy

* extra/legacy rename and cleaner

* shuffle

* better printing

* tiny changes and tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
nimlgen
9caceda79a amd: comgr is not required (#10128) 2025-05-01 13:41:44 +03:00
nimlgen
db51133537 rename HWInterface -> FileIOInterface (#9989)
* rename HWInterface -> FileIOInterface

* ugh
2025-04-22 22:18:57 +03:00
deftdawg
32bbff942c amd: add nbio 7.2.0 for some rdna2 (#9964)
* - Updated of #9700 which fixes #9665 but for the Steam Deck which was erroring on NBIO 7.2.0

* unrelated change

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-04-22 12:10:48 +03:00
nimlgen
a9430b4118 am: fix metrics table for smu14_0_2 (#9863) 2025-04-12 19:07:22 +03:00
uuuvn
3ee317ffed Fix kfd autogen and verify it in ci (#9818)
Had to autogen newer uapi headers for #9746 (dmabuf export ioctl missing),
submitting just the fix without updating to newer headers as they are only
needed for infiniband stuff
2025-04-10 09:53:42 +08:00
nimlgen
3e2f42c2e8 autogen: remove am headers from extra (#9666) 2025-04-01 14:45:30 +07:00
uuuvn
962c0f65f8 Fix generate_am (#9626)
This should be a comment
2025-03-31 01:15:44 +08:00
uuuvn
2a4247b8c2 RDNA 3.5 support (#9627) 2025-03-31 01:15:20 +08:00
nimlgen
54e1e59b44 am: rdna 4 support (#9621)
* hm

* fix

* return this

* fine

* g

* ruff

* fix
2025-03-29 23:16:27 +07:00
uuuvn
5908b89f71 MI300X support (WIP) (#9585) 2025-03-29 19:46:42 +08:00
uuuvn
dd9aae02c3 Refactor ops_amd.py (MI300X prereq) (#9428) 2025-03-29 00:17:20 +07:00
nimlgen
edf9e1bf8d am: move out soc21 to a sep module (#9551)
* am: soc module is not part of am

* am: soc module is not part of am
2025-03-24 14:17:42 +07:00
Ahmed Harmouche
7ce7fe0574 Refactor webgpu_dawn lib finding (#9547)
* Refactor webgpu_dawn lib finding

* Fix ruff
2025-03-23 08:23:29 -04:00
uuuvn
e85001b6ee SQTT profiling (#9278)
* sqtt

* docs

* multi-device

* ProfileSQTTEvent

* exec update

* 256mb default

* don't let people hang their gpus

* bitfields from autogen

* asic info from mesa

* more bitfields from autogen

* SQTT_ITRACE_SE_MASK

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-11 13:19:56 +08:00
uuuvn
b75f307234 amd: autogen ip bases (#9360) 2025-03-05 22:30:38 +03:00
nimlgen
993ef42bd5 am: hdp cg (#9346) 2025-03-04 20:44:09 +03:00
George Hotz
bf36967883 cuda hooking (#9180)
* cuda hooking

* progress

* more hook cuda

* fix params

* compile + cuMemHostAlloc hook

* work

* revert that
2025-02-20 19:20:01 +08:00
nimlgen
c6c2373bc0 replace libpciaccess autogen with just pci regs (#8983)
* replace libpciaccess autogen with just pci regs

* add pci.py
2025-02-09 18:40:45 +03:00
nimlgen
79de980565 am: do not fork pci bars (#8969) 2025-02-08 19:03:17 +03:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
nimlgen
86feb98dcd am: add support for 7600 (#8910)
* am: start to add support for 7600

* test_tiny passes

* mmhub 3 0 2

* cleaner
2025-02-06 14:04:07 +03:00
uuuvn
6dadb60c93 LLVM JIT (+autogen llvm instead of llvmlite) (#8486)
* LLVM JIT

* Autogen LLVM

* Update autogen

* Move things around

* even more non-determinism

* windows

* more autogen weirdness

* more windows stuff

* blind windows development try 2

* more blind windows development

* even more blind windows development

* maybe i should just set up a windows vm...

* why can't everyone just use sysv abi?

* cleanup debugging stuff

* unused import

* icache flushing isn't required on x86

* merge jit_nt and jit_unix

* more

* Temporary hack to not segfault

* better error

* bad conflict resolution

* Attempt to simplify support/llvm.py

* More refactoring

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-02 19:52:42 +08:00
nimlgen
2f0856c1e2 qcom: use hwinterface (#8565)
* qcom: use hwinterface

* ops

* not needed anymore
2025-01-11 17:11:23 +03:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
patrini32
afef69a37d MOCKGPU on mac os (#8520)
* tweaks for macos

* fix

* fix

* typo

* remove nvidia changes

* remove nv related changes

* change address back
2025-01-07 20:27:43 +03:00
nimlgen
ab3ac2b58d hw interface abstraction (#8524)
* use HWInterface in autogen

* mockgpu

* HWInterface

* more HWInterface

* fix

* fix

* old code

* fix

* implicit field definition

* add offset check to mockgpu too

* refactor

* forgot to pass flags + read rewrite

* test

* play with vfio

* nv: this should be kept

* try this

* vfio

* rm overwrite=True

* linetr

* do not reinit kfd

* minor

* mypy

* mock

* init them once

---------

Co-authored-by: patrini32 <patrini23@proton.me>
2025-01-07 18:18:28 +03:00
nimlgen
c18307e749 AM driver (#6923)
* connect to gpu

* rlc init?

* gfx comp start init

* early init is hardoded, some progress with fw

* gart

* progress, next mqd

* ring setup, still does not execute anything

* ugh write correct reg

* pci2: vm

* pci2: start psp

* vm seems to work

* pci2: gfx start

* pci2: fix psp ring resp

* pci2: try ring

* pci2: mes and some fixes

* pci2: some progress

* pci2: progress

* pci2: mm

* pci2: discovery

* pci2: correct apertures

* pci2: b

* pci2: i

* pci2: l

* pci2: o

* pci2: cmu

* pci2: mes_kiq works

* pci2: mes

* pci2: kcq does not work(

* pci2: unhalt gfx

* ops_am

* minor

* check if amdgpu is there, or we will crash

* bring back graph, it just works

* less prints

* do not init mes (not used)

* remove unused files

* ops_am: start move into core

* ops_am: works

* clcks, but still slower

* faster + no mes_kiq

* vm frags + remove mes

* cleanup fw

* gmc tiny cleanup

* move to ops_amd

* comment out what we dont really need

* driverless

* close in speed

* am clean most of ips

* gmc to ips

* cleaner

* new vm walker

* comment old one

* remove unsued autogens

* last write ups

* remove psp hardcoded values

* more

* add logs

* ih

* p2p and sdma

* vfio hal and interrupts

* smth

* amd dev iface

* minor after rebase

* bind for sdma

* Revert "bind for sdma"

This reverts commit a90766514d.

* tmp

* debug new mm

* ugh, allreduce hangs fixed

* p1

* works

* no pci.py

* cleaner a bit

* smth

* tiny cleanups

* cleaner a bit

* pciiface

* linter

* linter 2

* linter 3

* linter

* pylint

* reverted unrelated changes

* unrelated

* cmp tool

* ugh wrong fw

* clockgating

* unrelated

* alloc smaller chunks

* this

* opt sigs

* collect stat

* ops

* upd

* proclogs

* proclogs2

* vfio

* ruff

* linter pylint

* oops

* mypy p1

* mem fix

* mypy p2

* mypy p3

* mypy p4

* correct

* minor

* more tests

* linter in tests

* pci_regs header

* minor write up

* setup

* do not require libs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-31 23:06:17 +03:00
nimlgen
81d415be03 amd pkt3 refactor (#7923)
* amd pkt3 refactor

* replace this

* linter

* fix

* cmt

* fast

* simpler

* linter

* smth

* missing
2024-11-28 11:06:37 +03:00
Jacky Lee
c8b59416d0 fix: find_library can be None (#7145) 2024-10-18 20:50:52 +03:00
nimlgen
8094340221 nv print info about faults (#7057)
* nv print info about faults

* unrelated changes

* nv_gpu.GT200_DEBUGGER in mockgpu

* regen with ocrrect version

* spacing
2024-10-14 21:49:38 +03:00