Commit Graph

23 Commits

Author SHA1 Message Date
George Hotz
b9570d6100 clean up update stats (#4226)
* WIP: clean up update stats

* line savings now

* fix graphs

* fix tests

* tighter prints

* remove extra jit=false

* debug=2 means wait

* that won't update stats

* still wait
2024-04-19 15:41:30 +04:00
nimlgen
24a27a01a9 hotfix: CUDA_P2P works (#4155) 2024-04-12 18:20:12 +03:00
nimlgen
5a57b48134 cuda p2p enable when available (#4153) 2024-04-12 16:21:54 +03:00
George Hotz
bbda20c0db CompiledASTRunner -> CompiledRunner (#4148) 2024-04-11 08:49:52 -07:00
George Hotz
b7e281cf10 JitItem -> ExecItem (#4146)
* JitItem -> ExecItem

* execitem in realize

* cleaner

* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
George Hotz
081dd1573f hotfix: keep CUDA D2D copy behind the CUDA_P2P flag 2024-04-10 21:36:48 +00:00
George Hotz
af5984df43 cudagraph memcpy through host (#4137) 2024-04-10 13:17:17 -07:00
andresgit
7fd12aba85 graph remove input buffer references (#4100)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-08 16:49:16 -04:00
George Hotz
42b9d999ea Buffer isn't always allocated (#3974)
* buffer alloc

* allocate

* missing allocates

* last one
2024-03-28 13:33:47 -07:00
George Hotz
150ea2eb76 create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
nimlgen
16e31f7f0d init multidevice cuda graph (#3858)
* init multidevice cuda graph

* cuda just works!

* clean

* linter happier

* liners happy

* update transfer inputs

* do not change free

* useless check for cuda

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-22 13:49:48 -07:00
nimlgen
b78352b423 do not create structs every call in CUDAProgram (#3855)
* do not create structs in cuda

* fix graph

* linter

* do not exec twice

* fix graph
2024-03-21 17:51:40 +03:00
nimlgen
e5745c1a0d fix nan on multigpus cuda (#3854) 2024-03-21 15:21:55 +03:00
George Hotz
41efaa848c move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz
03a6bc59c1 move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46 move gpuctypes in tree (#3253)
* move gpuctypes in tree

* fix mypy

* regex exclude

* autogen sh

* mypy exclude

* does that fix it

* fix mypy

* add hip confirm

* verify all autogens

* build clang2py

* opencl headers

* gpu on 22.04
2024-01-26 12:25:03 -08:00
nimlgen
992067399e clean up exceptions in __del__ everywhere (#3165) 2024-01-18 08:34:09 -08:00
George Hotz
67bc2ddfd8 JIT cleanups (#3164)
* move GraphException

* factor out apply_graph_to_jit

* that return was wrong
2024-01-17 23:39:57 -08:00
George Hotz
228f30b96a multitensor jit (#3149)
* initial multitensor jit support and tests

* Added graphs to multitensor jit and updated tests

* update unbind api

* fix set device, add TinyJit to resnet

* update_stats includes device

---------

Co-authored-by: ramenguy99 <ramenguy99@gmail.com>
2024-01-16 09:09:15 -08:00
George Hotz
a464909d79 fast resnet eval (#3135)
* fast resnet eval

* fix HIP multidevice graph

* neater expression for devices

* lines

* add decorator test
2024-01-15 14:15:18 -08:00
George Hotz
6617dcf095 move graph to runtime, check line count with sz.py (#2842)
* move graph to runtime, check line count with sz.py

* oops, didn't save

* dtype aliases

* restore comment, REALCOUNT
2023-12-18 20:30:06 -08:00
George Hotz
171543fc8d cleanups to save lines and files (#2577)
* runtime/graph -> features/graph

* put all the cstyle renderers in cstyle

* same line for those

* how did that pass mypy
2023-12-02 16:29:56 -08:00
nimlgen
badc97f824 hip & cuda to gpuctypes (#2539)
* cuda with gpuctypes

* hip gpuctypes

* graphs

* rename + linter happy

* use cpu_time_execution

* no ji in build_kernel_node_params

* remove hip_wrapper

* hip fix

* no arc

* smalle changes

* no clean moduke in cudacpu
2023-12-01 09:25:27 -08:00