101 Commits

Author SHA1 Message Date
Ignacio Sica
459d0cd14f add arch to AMDRenderer and HIPRenderer (#9431) 2025-03-13 13:06:27 -03:00
chenyu
7ea633f94f remove from __future__ import annotations from runtimes [pr] (#8373)
it's not needed if we move the Device before Program and Allocator, which need Device.

not updating hcq because it has a lot more stuff, and CLDevice requires CLDevice
2024-12-21 23:46:07 -05:00
George Hotz
9c77e9f9b7 replace Tuple with tuple [pr] (#8344)
* replace Tuple with tuple [pr]

* replace List with list [pr]

* replace Dict with dict [pr]

* replace Set with set [pr]
2024-12-19 21:27:56 -08:00
George Hotz
c5d458ce02 BufferSpec and ProgramSpec [pr] (#7814)
* BufferSpec and ProgramSpec [pr]

* delete preallocate, it's unused

* Revert "delete preallocate, it's unused"

This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
6688539bc9 rename device to dev so Buffer can be Allocator [pr] (#7799)
* rename device to dev so Buffer can be Allocator [pr]

* missed those

* update the Program classes also

* more renames

* oops
2024-11-20 15:47:26 +08:00
George Hotz
d71fe7faa5 rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]

* forgot those

* transfer + offset
2024-11-20 00:10:29 +08:00
George Hotz
6bb230287b pass the src into Metal [pr] (#7518)
* pass the src into Metal [pr]

* put that comment back

* keep old functionality

* move all to disassembler

* metal supports parallel beam

* touchups

* comment in correct place
2024-11-04 12:35:30 +08:00
nimlgen
137ad5519f amd fix cwsr for gfx11 (#6950)
* amd cwsr

* ()
2024-10-08 17:44:29 +03:00
chenyu
471b188d79 fix mypy errors in latest mypy (#5794)
* fix mypy errors in latest mypy

mypy has stricter partial and api arg checks now

* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
chenyu
9838c1a6ff update import style in runtime (#5735) 2024-07-26 14:00:23 -04:00
George Hotz
5c688560bc move CUDA/HIP compilers to their own files [run_process_replay] (#5732) 2024-07-26 10:00:15 -07:00
nimlgen
7b7b751513 simple hip backend for debugging (#5201)
* hip backend

* fix mypy

* shorter

* fixes

* tiny changes
2024-06-30 23:00:11 +03:00
George Hotz
53adcb34f5 remove hip backend (#3783)
* remove hip backend

* remove unused

* rhip

* more RHIP
2024-03-17 10:12:16 -07:00
Francis Lam
b6e2495fdd kernel: limit shared memory usage when adding opts (#3705)
* kernel: limit shared memory usage when adding opts

* search: remove unnecessary limit on search space

apply_opt will do the more correct check
2024-03-12 17:06:21 -04:00
George Hotz
fe97a85014 the compiler is a driver (#3427) 2024-02-16 10:18:09 +01:00
nimlgen
002bf380b0 hsa runtime (#3382)
* hsa init

* handles transfer

* linter

* clean up hwqueue

* fix sync freezes

* print errors
2024-02-15 14:14:34 +01:00
George Hotz
d1fb1e0ba4 full sync to fix HIP memory leak (#3364) 2024-02-10 11:50:27 +01:00
George Hotz
c32ea95d7d Python uop emulator (#3327)
* start uop emu

* tiny_add passes

* more ops

* emulate the whole warp

* test_gemm passes

* metal gemm test pass

* works on big gemm

* works on big gemm

* more tests pass

* touch ups

* fix mypy

* cleanups

* exp2 mypy

* arch is where it belongs

* actually emulate tensor cores

* fix test

* new style
2024-02-08 19:24:55 +01:00
qazal
5b46b0ff3d Simple RDNA3 emulator (#2974)
* mockhip->hipcpu

* allocate buffers

* launch a kernel

read_asm api

* run remu in CI

* remu 0.0.2, real test ops

* simple driver

* 0.0.3, all test_ops

* run the latest emulator

* 9 minutes is way too long, drop backprop in CI

* bring back the backward pass

* Revert "bring back the backward pass"

This reverts commit 3781e1bc56.

* Print slowest tests

* emulated device directly in ops_hip

* fix ruff, override mypy for specific rules

* test in the same code path

- hip backend env variables

- install packages and verify autogen

- run certain tests

- remove the other hip tests path

- verify Device.DEFAULT

* remove the emulated hip in extra

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-30 10:39:28 -08:00
George Hotz
3c728d1082 compiler support (#3260)
* compiler support

* revert that

* fix tests
2024-01-26 23:36:40 -08:00
George Hotz
473935125a use comgr to compile (#3248)
* use comgr to compile

* fast

* bfloat16

* move comgr to its own file

* cleaner style

* comgr in new place

* comgr free + dtype cleanup
2024-01-26 18:27:49 -08:00
George Hotz
c4d870db0d fix jit realize issue (#3258) 2024-01-26 18:27:35 -08:00
George Hotz
03a6bc59c1 move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46 move gpuctypes in tree (#3253)
* move gpuctypes in tree

* fix mypy

* regex exclude

* autogen sh

* mypy exclude

* does that fix it

* fix mypy

* add hip confirm

* verify all autogens

* build clang2py

* opencl headers

* gpu on 22.04
2024-01-26 12:25:03 -08:00
George Hotz
7feeb118e6 hip launch speed (#3246)
* faster HIP kernel launch

* args

* expand compile_hip
2024-01-25 15:13:55 -08:00
George Hotz
cb372b053f add device speed test (#3244) 2024-01-25 12:01:22 -08:00
George Hotz
a8fbb03438 minor hip cleanups (#3237) 2024-01-24 15:13:38 -08:00
George Hotz
ed8a32722a hip mutex signal (#3234)
* hip mutex

* hip mutex 2

* sync
2024-01-24 13:23:09 -08:00
George Hotz
47f9887ce4 hip events work (#3229)
* hip events work

* event
2024-01-24 11:49:53 -08:00
George Hotz
e2e4632aea LoadOps SYNC (#3223)
* LoadOps SYNC and WAIT

* no wait, only sync

* DEBUG >= 1

* track cross device
2024-01-23 21:59:18 -08:00
George Hotz
23b084e70a add device name to device, all are constructed (#3221) 2024-01-23 20:34:56 -08:00
George Hotz
4a07ea355d buffer options should work (#3211)
* buffer options should work

* minor

* fix dtype
2024-01-22 19:23:55 -08:00
George Hotz
c80884884e event driven hip (#3160)
* event driven hip

* simpler, src makes copy

* pass mypy
2024-01-18 14:35:18 -08:00
George Hotz
9cc2577a08 use hip events (#3157)
* use hip events

* cleanup
2024-01-17 10:39:57 -08:00
nimlgen
81ae4ea179 compile cache for several devices (#3148)
* compile cache for several devices

* ops_gpu uses hash to not care about sql

* hip rdna test with device

* linter happy

* no device passed where possible

* arch is optional to compile_{hip|cuda}
2024-01-16 11:45:26 -08:00
George Hotz
120c8b1841 update llvm api + add cache key (#3140)
* update llvm api + add cache key

* use_xcode is a different function

* types
2024-01-15 17:25:32 -08:00
nimlgen
cf1d0a6704 no exceptions in __del__ when module creation is failed in hip/cuda (#3107) 2024-01-13 12:03:55 -05:00
chenyu
0fe6904351 use device from LinearizerOptions in kernel search (#3090)
* use device from LinearizerOptions in kernel search

removed all Device.DEFAULT in search.py

* pass device string for parallel pickle

* device for interpreted backends in LinearizerOptions
2024-01-11 14:46:03 -05:00
George Hotz
60abc62a3f fast hip read (#3014)
* fast hip read

* hip read faster

* fix tests

* to_mv

* simplify

* bump to 6k lines
2024-01-05 10:33:13 -08:00
George Hotz
c2a044ed83 disk_read_speed example 2024-01-04 13:59:43 -08:00
George Hotz
65dc3700b7 hip device is default on supported platforms (#2993) 2024-01-03 13:42:13 -08:00
George Hotz
753a7ecc05 Hip driver (#2992)
* start hip driver

* fix hip llama

* make HIP default if we can

* don't change those
2024-01-03 12:53:47 -08:00
George Hotz
56f44bd10e move the compiler cache to be global (#2957)
* move the compiler cache to be global

* remove non robust test

* remove dead code
2024-01-01 10:59:56 -08:00
George Hotz
6617dcf095 move graph to runtime, check line count with sz.py (#2842)
* move graph to runtime, check line count with sz.py

* oops, didn't save

* dtype aliases

* restore comment, REALCOUNT
2023-12-18 20:30:06 -08:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
George Hotz
0fd44259cd bf16 fix + cleanups from mixtral (#2698)
* bf16 fix + cleanups from mixtral

* generic bf16 cast
2023-12-10 16:31:52 -08:00
George Hotz
4164d0ebbd multitensor start (#2676)
* multitensor work

* early gen fixes the tests

* atol for flaky test
2023-12-07 17:07:05 -08:00
George Hotz
41d696145d hotfix: forking works okay in HIP now 2023-12-04 21:59:18 +00:00
George Hotz
09b6e254a3 hip compile speed (#2606) 2023-12-04 13:47:40 -08:00
George Hotz
664475f247 vals is an argument (#2599)
* vals is an argument

* don't even know how that's legal python
2023-12-03 21:50:43 -08:00