Commit Graph

71 Commits

Author SHA1 Message Date
wozeparrot
059cf2a90d feat: autogen from kernel register offset headers (#6056) 2024-08-12 14:08:35 -07:00
nimlgen
5d53fa491b amd autogened kfd ioctls (#5757)
* amd autogened kio

* unused import

* linter
2024-07-27 22:49:48 +03:00
nimlgen
dcd462860f elf loader (#5508)
* elf loader

* cleanup

* cleaner

* cleaner

* fixes

* revert this

* fix div 0

* fix nv

* amd fix

* fix mockgpu

* amd better?

* restore relocs for <12.4

* linter

* this is fixed now

* revert this

* process cdefines as function

* cleaner

* align

* save lines

* revert this change
2024-07-17 17:09:34 +03:00
nimlgen
dd7eef7d71 libc defs to autogen (#5217)
* libc defs to autogen

* amd import libc

* linter

* better a bit

* remove comment, check this

* not hardcoded path
2024-06-29 14:37:33 +03:00
nimlgen
ee02dcb98e nv supports PTX=1 (#5222)
* nv supports PTX=1

* not needed

* split nv compiler into nvrtc autogen

* remove to_c_array

* test

* Revert "test"

This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
nimlgen
fb1bf48cfe io_uring for copies from disk (#5035)
* exp uring

* fixes and old version

* nv

* cleaner

* cmp vs aio

* fix

* no lib

* fix nv

* linter

* disk_speed_test now runs default

* fixes

* uring -> io_uring

* linter happy

* get_temp_buf comment added

* tiny nits

* put wait back

* test runs everywhere

* remove consts

* remove mmap consts

* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
wozeparrot
62dc36d371 autogen _try_dlopen (#4949) 2024-06-14 12:12:18 -07:00
nimlgen
5bf1f7d4d3 nv better error messages for ioctls (#4899) 2024-06-10 16:01:50 +03:00
nimlgen
7384ee08a0 amd cleanup sdma (#4796)
* amd cleanup sdma

* faster enqueue for sdma

* typo

* remove commnted lines

* fix overrun check

* flushhdp better command
2024-06-01 17:06:44 +03:00
nimlgen
bd2e7c8b31 amd registers from file (#4778)
* amd registers from file

* remove commentes

* linetr

* no off
2024-05-31 18:48:57 +03:00
Yury Zhuravlev
af56f0e68a fix HSA/KFD load for system-wide installation (#4218)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
George Hotz
2ae4f45272 WIP PM4 Support (#4110)
* pm4 kernel launch works

* disable USE_THREAD_DIMENSIONS

* add kernel code

* work on real pm4

* pm4 signal

* same

* gate pm4

* hcq tests pass

* ops passes

* pm4 is closer

* pm4 debug (#4165)

* start debug tests passing

* prg

* smth

* hdp flush

* cleaner 1

* do not need this

* logs not need

* small things

* linter

* remove AQL

* test hcq

* fix tests

* it's subtracting, it shouldn't be -1

* pm4 changes (#4251)

* not need this anymore

* sdma signal with non atomic

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-04-23 08:31:27 +04:00
nimlgen
e6227bdb15 nv driver (#4044)
* start

* fix err 93

* gpu

* ioctl mappings

* alloc like cuda

* semaphores

* wait for semaphores value

* start ops_nv

* very simple kernels work

* init several gpus

* qmd dumper

* dirty, but most of kernels work

* always all test_ops

* progress, more tests, stable

* test_ops passes, gpt2 works

but wth big fifo, wrap of fifo doesn't work, i think it's something coherency releated

* need better sync

* fix sync

* alloc2

* all tests pass!

* cleanup 1

* cleanup

* multigpu, simple transfer

* fix sync

* correct init

* nv_gpu autogen + sync bug fix

* clean extra/nv_gpu_driver

* p2p

* clean up

* remove old gen

* small fixes

* cleanup

* cleanup 2

* small fixes

* bigger queue size

* cleanups

* wait

* fixed signals for devs

* fix hang + parallel beam

* small fixes

* detect when local memory is big in kernel

* correct assert

* small fixes

* correct tls size est

* one va space

* less lines

* shorter

* save 2 lines

* save some lines

* remove type ignores

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-22 19:50:20 +04:00
nimlgen
d6ba44bc1e kfd free buffers (#4027)
* kfd free buffers

* unmap

* all test passes

* better pm4

* forgot these

* invalidate only range

* better cache

* forgot

* comments

* fixes
2024-04-01 15:50:58 -07:00
George Hotz
2abb474d43 kfd driver wip (#3912)
* kfd driver wip

* cleanups

* kfd almost ready to ring doorbell

* ding dong?

* issues with signals

* something

* works

* ops kfd

* add amd_signal_t

* works...sometimes

* program runs

* _gpu_alloc cleanup

* cleanups

* work

* header + enable profiling (#3959)

* header + enable profiling

* just cleaner

* measure

* only local time domain

* remove old comments

* fix with master

* elf parsing (#3965)

* elf parsing

* fix kernels with private

* not used

* clean up

* clean up 2

* add flags

* kfd sdma (#3970)

* working sdma

* remove driver, shorter

* all commands we might need

* svm

* kfd remove hardcoded values (#4007)

* remove hardcoded values

* match above line

* 7k lines + revert hsa

* update that from origin

* fix sdma reg gen

* not the updated SDMA

* compiler_opts

* don't require kfd_ioctl

* get ioctls from python

* get ioctls from python

* remove build_sdma_command

* merge into 64-bit fields

* shorter

* fix property spelling and off by one

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-03-30 15:08:12 -07:00
George Hotz
f4055439dc don't include hip common (#3851)
* don't install hip common

* only that

* Revert "only that"

This reverts commit 85f22015d9.

* less

* needed

* sep comgr

* header file

* 6.0.2

* update hsa

* hsakmt

* Revert "hsakmt"

This reverts commit d3a118078e.
2024-03-22 08:50:50 -07:00
nimlgen
dd1a1c12df rocm path in autogen (#3697) 2024-03-12 14:06:43 +03:00
nimlgen
002bf380b0 hsa runtime (#3382)
* hsa init

* handles transfer

* linter

* clean up hwqueue

* fix sync freezes

* print errors
2024-02-15 14:14:34 +01:00
George Hotz
0aad8d238b rebuild ocelot (#3259)
* rebuild

* strip trailing whitespace
2024-01-26 18:46:36 -08:00
George Hotz
03a6bc59c1 move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46 move gpuctypes in tree (#3253)
* move gpuctypes in tree

* fix mypy

* regex exclude

* autogen sh

* mypy exclude

* does that fix it

* fix mypy

* add hip confirm

* verify all autogens

* build clang2py

* opencl headers

* gpu on 22.04
2024-01-26 12:25:03 -08:00