Commit Graph

128 Commits

Author SHA1 Message Date
nimlgen
1903542c2d nv/cuda compilers touchup (#5759)
* nv/cuda compilers touchup

* fix cuda check + move nv disasm

* remove includes

* fix nvrtc_check
2024-07-28 00:15:28 +03:00
chenyu
9838c1a6ff update import style in runtime (#5735) 2024-07-26 14:00:23 -04:00
George Hotz
5c688560bc move CUDA/HIP compilers to their own files [run_process_replay] (#5732) 2024-07-26 10:00:15 -07:00
nimlgen
baface413a nv better nvdisasm fail message (#5682)
* nv better nvdisasm message

* cuda
2024-07-24 16:19:26 +03:00
nimlgen
b4c49ae3fa remove cudacpu in favour of mockgpu (#5225)
* remove cudacpu in favour of mockgpu

* remove unused import

* not used as well
2024-06-29 11:05:16 +03:00
nimlgen
ee02dcb98e nv supports PTX=1 (#5222)
* nv supports PTX=1

* not needed

* split nv compiler into nvrtc autogen

* remove to_c_array

* test

* Revert "test"

This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
chenyu
a8e9307e0b pylint runtime/ and shape/ (#5044)
as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime
2024-06-18 19:48:18 -04:00
Roelof van Dijk
0eebb8e998 fix: _free should not return (#4880) 2024-06-08 14:45:06 +02:00
Roelof van Dijk
1785a70e77 fix: else-return on runtime (#4881)
* fix: add init file

* fix: no else-return

* fix: remove file again
2024-06-08 14:44:24 +02:00
Szymon Ożóg
f7201b6852 Remove deprecated code (#4724) 2024-05-25 03:02:12 -04:00
chenyu
286b4dbdf2 compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

renderer error with multiprocess should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00
George Hotz
347a3acb37 add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
George Hotz
d438d5698d bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
George Hotz
4eef1ee9bf move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
89e119bc58 move Allocator to buffer.py (#4502)
* move Allocator to buffer.py

* move those to realize

* memory file

* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
9fc4465557 subbuffer support (#4397)
* subbuffer support

* diskbuffer offset

* cuda subbuffer works

* use subbuffer

* more subbuffer tests

* consecutive

* cast

* consec

* offset

* view is a better name

* offset is in nbytes

* fix view + memory planner

* delete unused DiskRunner

* reverse order

* no subbuffers on unrealized consts

* only enabled for disk

* don't reverse memory

* view supported devices

* pickle buffer view

* ring jit

* support extra view inputs in jit

* fix JIT=2 issue

* test copy jit

* p2p isn't an option anymore

* fix dep tracking issue

* fix mypy

* fix pickle

* from_nv is contents now
2024-05-03 18:05:57 -07:00
George Hotz
60e3aa5cb1 more docs (#4271)
* more work on docs

* CompilerOptions is dataclass
2024-04-24 10:52:42 +08:00
Micah Zoltu
7bc862767c Improves error message when CUDA module fails to load. (#4243) 2024-04-21 11:10:14 -04:00
nimlgen
5a57b48134 cuda p2p enable when available (#4153) 2024-04-12 16:21:54 +03:00
George Hotz
af5984df43 cudagraph memcpy through host (#4137) 2024-04-10 13:17:17 -07:00
chenyu
1de9778949 import Buffer and BufferOption from tinygrad.buffer (#4076) 2024-04-04 22:12:23 -04:00
chenyu
b47f6cebb2 LinearizerOptions -> CompilerOptions (#3978) 2024-03-28 17:50:23 -04:00
nimlgen
e2d6f76723 _alloc and _free with options (#3934)
* _alloc has options

* linter

* fix hsa
2024-03-26 09:11:41 -07:00
nimlgen
739f47eb0f check on cuEventSynchronize (#3933) 2024-03-26 16:14:38 +03:00
nimlgen
f2a9ea4ea9 lru allocator for copyin host buffers (#3918)
* lru allocator for copyin host buffers

* linter happy
2024-03-25 15:57:18 +03:00
George Hotz
e0e234bf94 hotfix, str compare version for cuda 2024-03-24 20:35:24 -07:00
Arseny Kapoulkine
715850aef9 Fix sm89 PTX=1 compilation (#3915)
* Fix sm89 PTX=1 compilation

The minimum PTX version that supports sm89 is 7.8 (same version also
supports sm90); without this ptxas fails when running tinygrad with
PTX=1 on RTX 4090.

* Use int(arch[3:]) for forward compat with SM10.0 if that happens
2024-03-24 20:32:29 -07:00
sekstini
7c3632fd1e add --minimal flag to nvrtc (#3899) 2024-03-23 16:38:31 -07:00
George Hotz
46a3501cec nv ioctl sniffer (#3892)
* nv ioctl sniffer

* unused import

* Update __init__.py

* that work

* that fix it
2024-03-23 00:29:30 -07:00
chenyu
1c51d586ea replace raise Exception with specific errors (#3874) 2024-03-22 12:32:21 -04:00
nimlgen
8ef5490ec8 cuda tranfer + async copyin (#3873) 2024-03-22 09:01:37 -07:00
Szymon Ożóg
624bc89910 PTX - implement float 4, ptr arithmetics and other speed improvements (#3775)
* ptx float4 implementation

* remove from cache when trimming uops

* Gate for float4

* Linting fix

* disable test reasonable time for ptx

* import getenv

* Update uops.py

* linter

* Add div test for half

* upcast if op does not support operation

* fix offset

* Run only if dtype supported

* zero out registers when accessing by pred + cleanup

* Remove trailing whitespace

* revert

* spacing fix

* move cache clearing outside loop

* did this suddenly start working?

* unused import removed

* Remove cast

* Use pattern matching

* linting

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-22 08:54:02 -07:00
nimlgen
b78352b423 do not create structs every call in CUDAProgram (#3855)
* do not create structs in cuda

* fix graph

* linter

* do not exec twice

* fix graph
2024-03-21 17:51:40 +03:00
Francis Lam
b6e2495fdd kernel: limit shared memory usage when adding opts (#3705)
* kernel: limit shared memory usage when adding opts

* search: remove unnecessary limit on search space

apply_opt will do the more correct check
2024-03-12 17:06:21 -04:00
George Hotz
ac02e7347d ptx timing vs cuda timing (#3659) 2024-03-08 10:17:49 -08:00
George Hotz
6e50582e62 working to improve ptx (#3647)
* working to improve ptx

* fix compile fail
2024-03-07 12:39:31 -08:00
George Hotz
81baf3eed3 bring ptx back (#3623)
* bring ptx back

* ptx back

* fix define var

* fix a few bugs

* bugfixes

* fixes

* fix llvm bug

* fix test bug
2024-03-06 13:34:21 -08:00
Francis Lam
e17f1821a7 wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544) 2024-03-01 17:51:02 -08:00
nimlgen
94b7ac7a29 no cuda compile helper (#3512) 2024-02-28 01:50:10 +01:00
George Hotz
7698781389 Revert "wmma: add CUDA tensor core (#3464)" (#3474)
This reverts commit e9cef13f0b.
2024-02-22 11:58:16 +01:00
Francis Lam
e9cef13f0b wmma: add CUDA tensor core (#3464) 2024-02-22 11:57:08 +01:00
George Hotz
3c728d1082 compiler support (#3260)
* compiler support

* revert that

* fix tests
2024-01-26 23:36:40 -08:00
George Hotz
03a6bc59c1 move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46 move gpuctypes in tree (#3253)
* move gpuctypes in tree

* fix mypy

* regex exclude

* autogen sh

* mypy exclude

* does that fix it

* fix mypy

* add hip confirm

* verify all autogens

* build clang2py

* opencl headers

* gpu on 22.04
2024-01-26 12:25:03 -08:00
George Hotz
cb372b053f add device speed test (#3244) 2024-01-25 12:01:22 -08:00
nimlgen
3205fd8481 fix cuda device var rewrite (#3233) 2024-01-24 16:57:49 -05:00
George Hotz
83d614295e reduce lines (#3230) 2024-01-24 10:35:59 -08:00
George Hotz
23b084e70a add device name to device, all are constructed (#3221) 2024-01-23 20:34:56 -08:00
nimlgen
81ae4ea179 compile cache for several devices (#3148)
* compile cache for several devices

* ops_gpu uses hash to not care about sql

* hip rdna test with device

* linter happy

* no device passed where possible

* arch is optional to compile_{hip|cuda}
2024-01-16 11:45:26 -08:00
George Hotz
120c8b1841 update llvm api + add cache key (#3140)
* update llvm api + add cache key

* use_xcode is a different function

* types
2024-01-15 17:25:32 -08:00