chenyu
9838c1a6ff
update import style in runtime ( #5735 )
2024-07-26 14:00:23 -04:00
George Hotz
5c688560bc
move CUDA/HIP compilers to their own files [run_process_replay] ( #5732 )
2024-07-26 10:00:15 -07:00
nimlgen
6ec9ea9ddd
hcq update_exec with optional params ( #5708 )
2024-07-26 00:04:57 +03:00
nimlgen
b026312a31
nv ptx print log ( #5691 )
2024-07-24 21:40:58 +03:00
nimlgen
2ea54176e2
docs: add more info on HCQProgram ( #5683 )
* docs: add more info on HCQProgram
* linter
* linter2
* one more type
2024-07-24 17:20:18 +03:00
nimlgen
baface413a
nv better nvdisasm fail message ( #5682 )
* nv better nvdisasm message
* cuda
2024-07-24 16:19:26 +03:00
nimlgen
a93982ef42
hcq move out program call to base class ( #5638 )
* hcq move out program call to base class
* fix
2024-07-23 14:25:38 +03:00
nimlgen
ee633c1988
hcq move out synchronize to base class ( #5634 )
2024-07-22 20:36:04 +03:00
Vyacheslav Pachkov
edc58e6b6e
hcq: remove duplicate allocation of kernel args by abstracting ( #5633 )
2024-07-22 18:29:41 +03:00
nimlgen
08a9c0ae5e
hcq cache invalidation for beam ( #5630 )
* nv full cache invalidation
* the same command on amd
* linter
* fix amd
* nv no hardcoded consts
* beam default
2024-07-22 18:13:17 +03:00
Vyacheslav Pachkov
583829ab44
helpers: remove duplicate data64 helpers in amd/nv ( #5627 )
2024-07-21 16:50:59 -07:00
nimlgen
0de5812032
hcq move map to allocator ( #5610 )
* hcq move map to allocator
* fix
2024-07-20 19:02:45 +03:00
nimlgen
b1782e3fef
hcq refactor signal into class ( #5575 )
* hcq refactor signal into class
* fix amd
* amd do not use amd_signal_t
* cleanup
* signal setter
* fix linter
* docs
* more docs + types
* fix types
2024-07-19 23:23:05 +03:00
nimlgen
9d7edc9269
hcq rename HCQCompat -> HCQ ( #5577 )
2024-07-19 11:34:17 +03:00
nimlgen
4e9d2b1615
nv memory_barrier command ( #5548 )
2024-07-18 16:23:11 +03:00
nimlgen
dcd462860f
elf loader ( #5508 )
* elf loader
* cleanup
* cleaner
* cleaner
* fixes
* revert this
* fix div 0
* fix nv
* amd fix
* fix mockgpu
* amd better?
* restore relocs for <12.4
* linter
* this is fixed now
* revert this
* process cdefines as function
* cleaner
* align
* save lines
* revert this change
2024-07-17 17:09:34 +03:00
nimlgen
661da32aff
nv do not map regions twice ( #5521 )
2024-07-17 11:20:02 +03:00
nimlgen
8dfd11c1d8
docs: hcq add types ( #5495 )
* docs: hcq add types
* linter
2024-07-15 22:14:48 +03:00
nimlgen
c9ec7ce070
start hcq docs ( #5411 )
* start hcq docs
* more hcq docs
* docs
* docs
* linter
* correct args
* linter
* ts returns int
2024-07-15 21:31:11 +03:00
chenyu
eef43c9f49
include dims in kernel/nv invalid err msg ( #5487 )
2024-07-14 22:51:30 -04:00
nimlgen
61822d1a14
nv fix timeline signal rollover on copy queue ( #5473 )
* hotfix: nv rollover to 32bits
* test both queues
2024-07-14 16:06:12 +03:00
nimlgen
8835d6c49a
cleanup nv/amd program ( #5449 )
* cleanup nv/amd program
* fix amd
* a bit cleaner
* ugh, typo
* linter
* fix nv
* tiny thing
2024-07-14 14:08:35 +03:00
nimlgen
6943ea5f29
nv remove copy_from_cpu command ( #5459 )
2024-07-13 23:08:49 +03:00
nimlgen
6604d2b2c3
amd/nv respect visible devs ( #5409 )
* nv/amd respect visible devices
* linter
* sort amd gpus
* env docs
2024-07-12 20:02:12 +03:00
nimlgen
b3790b759b
nv cleanup gpfifo setup ( #5382 )
* nv cleanup gpfifo setup
* save lines
2024-07-11 17:50:52 +03:00
nimlgen
2ba96d4c29
nv use mv_address ( #5381 )
* nv use mv_address
* unsued import
2024-07-11 16:45:03 +03:00
nimlgen
bd77efda2f
add HWCommandQueue base class for hcq devices ( #5303 )
* add HWCommandQueue as base queue for hcq devices
* try this
* fixes
* comments
* linter
* linetr2
* linter
* linter
* fixed
* revert this
2024-07-11 16:19:13 +03:00
nimlgen
1678199b15
add update_copy to hcq spec ( #5348 )
* add update_copy to hcq spec
* fix amd
2024-07-09 20:44:44 +03:00
nimlgen
e815c57039
use hcq_profile in nv/amd program ( #5344 )
2024-07-09 15:56:06 +03:00
nimlgen
a2a9bfd2ec
nv correct error messages with ptx ( #5341 )
* nv correct error messages with ptx
* return compile error
2024-07-09 10:39:39 +03:00
nimlgen
51d6f372e4
nv get classes based on device ( #5325 )
* nv get classes
* support in mockgpu
* choose sm based on gpu
* fix
* fix
* fix arch
2024-07-08 18:25:05 +03:00
nimlgen
b0c5c58833
nv rm_control to rmctrl type ( #5327 )
* nv rm_control to rmctrl type
* fix
2024-07-08 17:24:33 +03:00
nimlgen
778d1cdbee
nv allocate local memory dynamically ( #5277 )
* nv allocate local memory dynamically
* fix
* linter
* linter 2
* linter
* fixes
2024-07-07 17:34:49 +03:00
nimlgen
2778b6046c
new memory scheduler ( #5278 )
* new memory schedule algo
* works
* fix
* fix
* linter
* tiny fixes
* do not optimize copy buffers
* mpre comments
* tiny cleanups
2024-07-04 18:06:04 +03:00
nimlgen
84b3e3bb6f
hcq exec no embedded signal ( #5142 )
2024-07-04 13:29:21 +03:00
nimlgen
21d41f06a2
nv follows HCQCompatAllocRes protocol ( #5275 )
* nv follows HCQCompatAllocRes protocol
* fix amd
2024-07-03 11:34:10 +03:00
Vyacheslav Pachkov
d3e4e21759
add return type for HCQCompatAllocator _alloc ( #5267 )
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-07-03 10:25:44 +03:00
nimlgen
7be776f9af
add _alloc_signal/_free_signal to hcq ( #5264 )
* add _alloc_signal/_free_signal api
* oops, revert this
* linter
2024-07-02 23:35:39 +03:00
nimlgen
e050603b4b
nv close fds after mapping ( #5246 )
2024-07-02 13:57:46 +03:00
nimlgen
dd7eef7d71
libc defs to autogen ( #5217 )
* libc defs to autogen
* amd import libc
* linter
* better a bit
* remove comment, check this
* not hardcoded path
2024-06-29 14:37:33 +03:00
nimlgen
ee02dcb98e
nv supports PTX=1 ( #5222 )
* nv supports PTX=1
* not needed
* split nv compiler into nvrtc autogen
* remove to_c_array
* test
* Revert "test"
This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
nimlgen
ac748cccdb
nv apply relocs ( #5165 )
* nv do reloc
* a bit cleaner
2024-06-27 23:54:16 +03:00
Roelof van Dijk
01e8838b65
ruff: suppressible-exception ( #5182 )
* fix: use contextlib to suppress errors
* enable rule SIM105
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-27 08:23:44 -07:00
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension ( #5174 )
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
nimlgen
69f116a7e1
nv/amd profiler ( #4718 )
* nv/amd profiler
* fix
* fix
* profile copies
* profile logger
* fixes
* more fixes
* less lines and fixes
* fixes
* some linter
* back sync, no related change
* fix gpu2cpu time def
* simpler
* linter
* linter
* docs
* add add_event api
2024-06-23 17:10:12 +03:00
nimlgen
2dcef5a0d7
hcq spec ( #5081 )
* hcq spec
* small change
* not used import
* fixes
* fix
* signals into base class
* more into base class
* remove imports
* fix wrap timeline
* raise when not implemented
* simpler
2024-06-22 15:32:12 +03:00
nimlgen
fb1bf48cfe
io_uring for copies from disk ( #5035 )
* exp uring
* fixes and old version
* nv
* cleaner
* cmp vs aio
* fix
* no lib
* fix nv
* linter
* disk_speed_test now runs default
* fixes
* uring -> io_uring
* linter happy
* get_temp_buf comment added
* tiny nits
* put wait back
* test runs everywhere
* remove consts
* remove mmap consts
* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
chenyu
a8e9307e0b
pylint runtime/ and shape/ ( #5044 )
as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime
2024-06-18 19:48:18 -04:00
nimlgen
194a168630
hcq signal scheduler ( #5016 )
* faster hcq
* fix nv
* linter
* cleaner
* fix sync
* cleaner
* a bit cleaner
2024-06-18 14:02:21 +03:00
nimlgen
794acefbf3
hcq update waits and signals in place ( #4984 )
* hcq update waits and signals in place
* start amd
* amd works
* prettier
* test
* normal messages
* linetr
* linter 2
2024-06-17 17:19:07 +03:00