Ignacio Sica
459d0cd14f
add arch to AMDRenderer and HIPRenderer ( #9431 )
2025-03-13 13:06:27 -03:00
chenyu
7ea633f94f
remove from __future__ import annotations from runtimes [pr] ( #8373 )
...
it's not needed if we move the Device before Program and Allocator, which need Device.
not updating hcq because it has a lot more stuff, and CLDevice requires CLDevice
2024-12-21 23:46:07 -05:00
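A minimal sketch of the ordering point in the commit body above, assuming hypothetical stand-in classes (`ExampleDevice` and `ExampleProgram` are illustrative names, not the actual runtime classes): once the device class is defined first, the annotations that mention it resolve without `from __future__ import annotations`.

```python
# Hypothetical stand-ins for illustration only, not the real runtime classes.
class ExampleDevice:
  def __init__(self, name: str):
    self.name = name

class ExampleProgram:
  # ExampleDevice is already defined above, so this annotation resolves
  # to the class itself without the __future__ import.
  def __init__(self, dev: ExampleDevice, name: str):
    self.dev, self.name = dev, name

def describe(prg: ExampleProgram) -> str:
  # Format a program as "<program name>@<device name>".
  return f"{prg.name}@{prg.dev.name}"
```

If `ExampleProgram` were defined before `ExampleDevice`, the same annotation would need either a string forward reference or the `__future__` import.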
George Hotz
9c77e9f9b7
replace Tuple with tuple [pr] ( #8344 )
...
* replace Tuple with tuple [pr]
* replace List with list [pr]
* replace Dict with dict [pr]
* replace Set with set [pr]
2024-12-19 21:27:56 -08:00
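For context, the replacements listed above correspond to the builtin generic syntax available since Python 3.9, which needs no `typing` import. A small sketch, assuming a hypothetical helper `group_by_device` that is not part of the repository:

```python
# Hypothetical helper for illustration; uses builtin generics (list, tuple, dict, set)
# in place of typing.List, typing.Tuple, typing.Dict and typing.Set.
def group_by_device(bufs: list[tuple[str, int]]) -> dict[str, set[int]]:
  out: dict[str, set[int]] = {}
  for dev, size in bufs:
    # Collect the distinct buffer sizes seen for each device name.
    out.setdefault(dev, set()).add(size)
  return out
```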
George Hotz
c5d458ce02
BufferSpec and ProgramSpec [pr] ( #7814 )
...
* BufferSpec and ProgramSpec [pr]
* delete preallocate, it's unused
* Revert "delete preallocate, it's unused"
This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
6688539bc9
rename device to dev so Buffer can be Allocator [pr] ( #7799 )
...
* rename device to dev so Buffer can be Allocator [pr]
* missed those
* update the Program classes also
* more renames
* oops
2024-11-20 15:47:26 +08:00
George Hotz
d71fe7faa5
rename allocator methods to not conflict [pr] ( #7788 )
...
* rename allocator methods to not conflict [pr]
* forgot those
* transfer + offset
2024-11-20 00:10:29 +08:00
George Hotz
6bb230287b
pass the src into Metal [pr] ( #7518 )
...
* pass the src into Metal [pr]
* put that comment back
* keep old functionality
* move all to disassembler
* metal supports parallel beam
* touchups
* comment in correct place
2024-11-04 12:35:30 +08:00
nimlgen
137ad5519f
amd fix cwsr for gfx11 ( #6950 )
...
* amd cwsr
* ()
2024-10-08 17:44:29 +03:00
chenyu
471b188d79
fix mypy errors in latest mypy ( #5794 )
...
* fix mypy errors in latest mypy
mypy has stricter partial and api arg checks now
* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
chenyu
9838c1a6ff
update import style in runtime ( #5735 )
2024-07-26 14:00:23 -04:00
George Hotz
5c688560bc
move CUDA/HIP compilers to their own files [run_process_replay] ( #5732 )
2024-07-26 10:00:15 -07:00
nimlgen
7b7b751513
simple hip backend for debugging ( #5201 )
...
* hip backend
* fix mypy
* shorter
* fixes
* tiny changes
2024-06-30 23:00:11 +03:00
George Hotz
53adcb34f5
remove hip backend ( #3783 )
...
* remove hip backend
* remove unused
* rhip
* more RHIP
2024-03-17 10:12:16 -07:00
Francis Lam
b6e2495fdd
kernel: limit shared memory usage when adding opts ( #3705 )
...
* kernel: limit shared memory usage when adding opts
* search: remove unnecessary limit on search space
apply_opt will do the more correct check
2024-03-12 17:06:21 -04:00
George Hotz
fe97a85014
the compiler is a driver ( #3427 )
2024-02-16 10:18:09 +01:00
nimlgen
002bf380b0
hsa runtime ( #3382 )
...
* hsa init
* handles transfer
* linter
* clean up hwqueue
* fix sync freezes
* print errors
2024-02-15 14:14:34 +01:00
George Hotz
d1fb1e0ba4
full sync to fix HIP memory leak ( #3364 )
2024-02-10 11:50:27 +01:00
George Hotz
c32ea95d7d
Python uop emulator ( #3327 )
...
* start uop emu
* tiny_add passes
* more ops
* emulate the whole warp
* test_gemm passes
* metal gemm test pass
* works on big gemm
* works on big gemm
* more tests pass
* touch ups
* fix mypy
* cleanups
* exp2 mypy
* arch is where it belongs
* actually emulate tensor cores
* fix test
* new style
2024-02-08 19:24:55 +01:00
qazal
5b46b0ff3d
Simple RDNA3 emulator ( #2974 )
...
* mockhip->hipcpu
* allocate buffers
* launch a kernel
read_asm api
* run remu in CI
* remu 0.0.2, real test ops
* simple driver
* 0.0.3, all test_ops
* run the latest emulator
* 9 minutes is way too long, drop backprop in CI
* bring back the backward pass
* Revert "bring back the backward pass"
This reverts commit 3781e1bc56.
* Print slowest tests
* emulated device directly in ops_hip
* fix ruff, override mypy for specific rules
* test in the same code path
- hip backend env variables
- install packages and verify autogen
- run certain tests
- remove the other hip tests path
- verify Device.DEFAULT
* remove the emulated hip in extra
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-30 10:39:28 -08:00
George Hotz
3c728d1082
compiler support ( #3260 )
...
* compiler support
* revert that
* fix tests
2024-01-26 23:36:40 -08:00
George Hotz
473935125a
use comgr to compile ( #3248 )
...
* use comgr to compile
* fast
* bfloat16
* move comgr to its own file
* cleaner style
* comgr in new place
* comgr free + dtype cleanup
2024-01-26 18:27:49 -08:00
George Hotz
c4d870db0d
fix jit realize issue ( #3258 )
2024-01-26 18:27:35 -08:00
George Hotz
03a6bc59c1
move autogen to runtime/autogen ( #3254 )
2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46
move gpuctypes in tree ( #3253 )
...
* move gpuctypes in tree
* fix mypy
* regex exclude
* autogen sh
* mypy exclude
* does that fix it
* fix mypy
* add hip confirm
* verify all autogens
* build clang2py
* opencl headers
* gpu on 22.04
2024-01-26 12:25:03 -08:00
George Hotz
7feeb118e6
hip launch speed ( #3246 )
...
* faster HIP kernel launch
* args
* expand compile_hip
2024-01-25 15:13:55 -08:00
George Hotz
cb372b053f
add device speed test ( #3244 )
2024-01-25 12:01:22 -08:00
George Hotz
a8fbb03438
minor hip cleanups ( #3237 )
2024-01-24 15:13:38 -08:00
George Hotz
ed8a32722a
hip mutex signal ( #3234 )
...
* hip mutex
* hip mutex 2
* sync
2024-01-24 13:23:09 -08:00
George Hotz
47f9887ce4
hip events work ( #3229 )
...
* hip events work
* event
2024-01-24 11:49:53 -08:00
George Hotz
e2e4632aea
LoadOps SYNC ( #3223 )
...
* LoadOps SYNC and WAIT
* no wait, only sync
* DEBUG >= 1
* track cross device
2024-01-23 21:59:18 -08:00
George Hotz
23b084e70a
add device name to device, all are constructed ( #3221 )
2024-01-23 20:34:56 -08:00
George Hotz
4a07ea355d
buffer options should work ( #3211 )
...
* buffer options should work
* minor
* fix dtype
2024-01-22 19:23:55 -08:00
George Hotz
c80884884e
event driven hip ( #3160 )
...
* event driven hip
* simpler, src makes copy
* pass mypy
2024-01-18 14:35:18 -08:00
George Hotz
9cc2577a08
use hip events ( #3157 )
...
* use hip events
* cleanup
2024-01-17 10:39:57 -08:00
nimlgen
81ae4ea179
compile cache for several devices ( #3148 )
...
* compile cache for several devices
* ops_gpu uses hash to not care about sql
* hip rdna test with device
* linter happy
* no device passed where possible
* arch is optional to compile_{hip|cuda}
2024-01-16 11:45:26 -08:00
George Hotz
120c8b1841
update llvm api + add cache key ( #3140 )
...
* update llvm api + add cache key
* use_xcode is a different function
* types
2024-01-15 17:25:32 -08:00
nimlgen
cf1d0a6704
no exceptions in __del__ when module creation fails in hip/cuda ( #3107 )
2024-01-13 12:03:55 -05:00
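The general pattern behind a fix like this can be sketched as follows. This is an illustration under assumptions, not the code from the commit: `ExampleProgram` and its `module` and `released` attributes are hypothetical. If `__init__` raises before the handle is stored, `__del__` still runs on the partially built object, so it checks for the attribute before releasing anything.

```python
# Hypothetical class for illustration; not the actual hip/cuda program class.
class ExampleProgram:
  def __init__(self, lib: bytes):
    # An empty binary stands in for a failed module load.
    if not lib: raise ValueError("empty binary")
    self.module = object()  # stand-in for a loaded module handle
    self.released = False

  def __del__(self):
    # self.module is absent if __init__ raised early, so only release when it exists.
    if hasattr(self, "module"): self.released = True
```

Without the `hasattr` guard, a failed construction would produce a second, confusing error from `__del__` on top of the original one.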
chenyu
0fe6904351
use device from LinearizerOptions in kernel search ( #3090 )
...
* use device from LinearizerOptions in kernel search
removed all Device.DEFAULT in search.py
* pass device string for parallel pickle
* device for interpreted backends in LinearizerOptions
2024-01-11 14:46:03 -05:00
George Hotz
60abc62a3f
fast hip read ( #3014 )
...
* fast hip read
* hip read faster
* fix tests
* to_mv
* simplify
* bump to 6k lines
2024-01-05 10:33:13 -08:00
George Hotz
c2a044ed83
disk_read_speed example
2024-01-04 13:59:43 -08:00
George Hotz
65dc3700b7
hip device is default on supported platforms ( #2993 )
2024-01-03 13:42:13 -08:00
George Hotz
753a7ecc05
Hip driver ( #2992 )
...
* start hip driver
* fix hip llama
* make HIP default if we can
* don't change those
2024-01-03 12:53:47 -08:00
George Hotz
56f44bd10e
move the compiler cache to be global ( #2957 )
...
* move the compiler cache to be global
* remove non robust test
* remove dead code
2024-01-01 10:59:56 -08:00
George Hotz
6617dcf095
move graph to runtime, check line count with sz.py ( #2842 )
...
* move graph to runtime, check line count with sz.py
* oops, didn't save
* dtype aliases
* restore comment, REALCOUNT
2023-12-18 20:30:06 -08:00
George Hotz
6d6eb9302d
ruff checks the max line length is 150 ( #2734 )
...
* ruff checks the max line length is 150
* fix tensor.py
* a lot more
* done
2023-12-12 17:34:47 -08:00
George Hotz
0fd44259cd
bf16 fix + cleanups from mixtral ( #2698 )
...
* bf16 fix + cleanups from mixtral
* generic bf16 cast
2023-12-10 16:31:52 -08:00
George Hotz
4164d0ebbd
multitensor start ( #2676 )
...
* multitensor work
* early gen fixes the tests
* atol for flaky test
2023-12-07 17:07:05 -08:00
George Hotz
41d696145d
hotfix: forking works okay in HIP now
2023-12-04 21:59:18 +00:00
George Hotz
09b6e254a3
hip compile speed ( #2606 )
2023-12-04 13:47:40 -08:00
George Hotz
664475f247
vals is an argument ( #2599 )
...
* vals is an argument
* don't even know how that's legal python
2023-12-03 21:50:43 -08:00