Commit Graph

102 Commits

Author SHA1 Message Date
chenyu
a8e9307e0b pylint runtime/ and shape/ (#5044)
as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime
2024-06-18 19:48:18 -04:00
Roelof van Dijk
1785a70e77 fix: else-return on runtime (#4881)
* fix: add init file

* fix: no else-return

* fix: remove file again
2024-06-08 14:44:24 +02:00
chenyu
286b4dbdf2 compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

renderer error with multiprocess should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00
George Hotz
347a3acb37 add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
George Hotz
d438d5698d bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
George Hotz
4eef1ee9bf move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
89e119bc58 move Allocator to buffer.py (#4502)
* move Allocator to buffer.py

* move those to realize

* memory file

* cleanup
2024-05-09 19:45:56 -07:00
Sohaib
61c97d5305 refactor ops_gpu ctypes (#4331)
* refactor ops_gpu ctypes

- remove redundant byref as ctypes automatically handles passing `type` as
  `POINTER(type)`
- use walrus operator instead of init_c_var when possible

* clSetKernelArg argtype is POINTER(None)
2024-04-30 01:33:34 +08:00
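Note on 61c97d5305: the two points in that message can be sketched as follows. This is an illustration only, not code from ops_gpu. It assumes CPython and uses `ctypes.pythonapi.PyLong_AsLongAndOverflow` as a stand-in for an OpenCL entry point, since it takes an `int *` out-parameter and is available wherever ctypes is. The variable names are made up for the example.

```python
import ctypes

# Assumption: a CPython C-API function stands in for an OpenCL call here.
# Its C signature is: long PyLong_AsLongAndOverflow(PyObject *obj, int *overflow)
as_long = ctypes.pythonapi.PyLong_AsLongAndOverflow
as_long.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_int)]
as_long.restype = ctypes.c_long

# Explicit form: create the out-variable first, then pass it with byref().
overflow = ctypes.c_int()
as_long(2**100, ctypes.byref(overflow))
explicit = overflow.value

# Shorter form: because argtypes declares POINTER(c_int), ctypes accepts a
# plain c_int instance and passes it by reference itself. The walrus operator
# creates and names the out-variable inside the call.
as_long(2**100, (flag := ctypes.c_int()))
implicit = flag.value

# Both forms observe the same out-parameter value written by the C function.
print(explicit, implicit)
```

The automatic conversion only applies when `argtypes` is set on the function; without it, `byref()` is still needed.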
chenyu
1de9778949 import Buffer and BufferOption from tinygrad.buffer (#4076) 2024-04-04 22:12:23 -04:00
chenyu
b47f6cebb2 LinearizerOptions -> CompilerOptions (#3978) 2024-03-28 17:50:23 -04:00
nimlgen
e2d6f76723 _alloc and _free with options (#3934)
* _alloc has options

* linter

* fix hsa
2024-03-26 09:11:41 -07:00
qazal
27f4de2ce4 delete half_prekernel (#3388)
* generic rendering of half and bf16

hotfix

* fix uops + regression test

* fix the test for metal's half4

* uop.uop fixup

* mypy with --strict-equality, fix ops_gpu
2024-02-14 15:40:48 +01:00
George Hotz
3c728d1082 compiler support (#3260)
* compiler support

* revert that

* fix tests
2024-01-26 23:36:40 -08:00
George Hotz
03a6bc59c1 move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46 move gpuctypes in tree (#3253)
* move gpuctypes in tree

* fix mypy

* regex exclude

* autogen sh

* mypy exclude

* does that fix it

* fix mypy

* add hip confirm

* verify all autogens

* build clang2py

* opencl headers

* gpu on 22.04
2024-01-26 12:25:03 -08:00
George Hotz
cb372b053f add device speed test (#3244) 2024-01-25 12:01:22 -08:00
George Hotz
ed8a32722a hip mutex signal (#3234)
* hip mutex

* hip mutex 2

* sync
2024-01-24 13:23:09 -08:00
George Hotz
23b084e70a add device name to device, all are constructed (#3221) 2024-01-23 20:34:56 -08:00
George Hotz
4a07ea355d buffer options should work (#3211)
* buffer options should work

* minor

* fix dtype
2024-01-22 19:23:55 -08:00
nimlgen
992067399e clean up exceptions in __del__ everywhere (#3165) 2024-01-18 08:34:09 -08:00
nimlgen
81ae4ea179 compile cache for several devices (#3148)
* compile cache for several devices

* ops_gpu uses hash to not care about sql

* hip rdna test with device

* linter happy

* no device passed where possible

* arch is optional to compile_{hip|cuda}
2024-01-16 11:45:26 -08:00
George Hotz
120c8b1841 update llvm api + add cache key (#3140)
* update llvm api + add cache key

* use_xcode is a different function

* types
2024-01-15 17:25:32 -08:00
chenyu
0fe6904351 use device from LinearizerOptions in kernel search (#3090)
* use device from LinearizerOptions in kernel search

removed all Device.DEFAULT in search.py

* pass device string for parallel pickle

* device for interpreted backends in LinearizerOptions
2024-01-11 14:46:03 -05:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
56f44bd10e move the compiler cache to be global (#2957)
* move the compiler cache to be global

* remove non robust test

* remove dead code
2024-01-01 10:59:56 -08:00
Marcus Asteborg
1fa4f161fe Update CLProgram to use unsigned long long for event profiling (#2808)
On Windows, the unsigned long type is 32-bit, which is not compatible
with the required data size for event profiling.
2023-12-16 23:48:44 -08:00
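Note on 1fa4f161fe: the size mismatch it describes can be checked from Python. This is a minimal sketch, not the code from the commit. OpenCL profiling counters are `cl_ulong` values (64-bit unsigned), while `ctypes.c_ulong` follows the platform's C `unsigned long`. The helper name `fits_profiling_timestamp` is hypothetical.

```python
import ctypes

# OpenCL profiling counters are cl_ulong values, i.e. 64-bit unsigned integers.
TIMESTAMP_BYTES = 8

def fits_profiling_timestamp(ctype) -> bool:
  """Hypothetical helper: True if `ctype` can hold a 64-bit timestamp."""
  return ctypes.sizeof(ctype) >= TIMESTAMP_BYTES

# c_ulonglong (unsigned long long) is wide enough on mainstream platforms.
print("c_ulonglong ok:", fits_profiling_timestamp(ctypes.c_ulonglong))

# c_ulong depends on the platform: 32-bit on Windows, 64-bit on 64-bit
# Linux and macOS, so this line prints a different answer on Windows.
print("c_ulong ok:", fits_profiling_timestamp(ctypes.c_ulong))
```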
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
George Hotz
c53e854687 cast image doesn't work on nvidia (#2626)
* cast image doesn't work on nvidia

* hmm, interpreteds use buffer size 0

* fix type

* no lru
2023-12-05 12:48:19 -08:00
George Hotz
664475f247 vals is an argument (#2599)
* vals is an argument

* don't even know how that's legal python
2023-12-03 21:50:43 -08:00
George Hotz
fcd0b2ee6c fix multigpu on tinybox (#2595)
* fix multigpu on tinybox

* fixed multigpu
2023-12-03 16:48:07 -08:00
George Hotz
171543fc8d cleanups to save lines and files (#2577)
* runtime/graph -> features/graph

* put all the cstyle renderers in cstyle

* same line for those

* how did that pass mypy
2023-12-02 16:29:56 -08:00
nimlgen
065495e0c9 save a few lines in ops_gpu (#2564)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-02 15:05:22 -08:00
George Hotz
d6b404ac11 No dtype alloc (#2570)
* fix all allocs

* improve docs

* ugh fix fake alloc
2023-12-02 13:29:40 -08:00
George Hotz
5068e99d18 refactor to remove extra kernel params (#2563)
* refactor to have compiled kernel

* bugfixes

* docs/beautiful.py

* revert that

* fix tests
2023-12-02 00:32:25 -08:00
George Hotz
27481b9206 Switch ops_gpu -> gpuctypes (#2532)
* ops_gpu is go

* fix size 0

* fix image, and add more tests

* nerf openpilot test, doesn't test thneed

* run the schedule

* better

* oops, new inputs

* delete pyopencl

* Update ops_gpu.py
2023-12-01 22:30:21 -08:00
chenyu
67f4e03724 rewrite 0 size loadop into a CONST (#2556)
* rewrite 0 size loadop into a CONST

* check alloc size

* EMPTY is better

* Revert "EMPTY is better"

This reverts commit 574fe0f9ed28f1b97da5a81afdfd2cd5d9a94ff9.

* no ast is created

* fix test
2023-12-01 18:29:06 -05:00
George Hotz
2c363b5f0b new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
George Hotz
756b01f46f why were these ever called buffer (#2483) 2023-11-27 21:02:07 -08:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
andresgit
259a869fc1 Fix UnicodeDecodeError when debugging on Intel APU (#2421)
* test DEBUG=5

* print prg if NVIDIA, fixes error on Intel APU
2023-11-25 12:30:50 -08:00
George Hotz
cbb8486779 ResNet training changes (update benchmark) (#2390)
* default arg for chunk

* bring back to_

* good changes

* new set

* unused hash

* fix optim

* new torch loader

* fix test lr scheduler
2023-11-22 17:41:12 -08:00
valar
123ea051e6 refactor/ci: delete many # type: ignore (#2281)
* refactor/ci: delete many `# type: ignore`

* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag

refs #2240

* ci: move `--warn-unused-ignores` flag to mypy config

refs #2240
2023-11-12 11:04:20 -08:00
vish-pr
6051f0ce82 For cuda get current free space from device, and retry alloc failures (#2197)
* For cuda get current free space from device, and retry alloc failures

* type ignore for mypy

* add init to get free mem in cuda

* Move retry logic in common lib.

Fix typo in override _get_cur_free_space

* linter error fix in test file

* Not catch all, as it will catch KeyboardInterrupt

* fix unintended line changes
2023-11-09 15:53:50 -08:00
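Note on 6051f0ce82: the "Not catch all" bullet reflects a general Python rule. A bare `except:` also catches `KeyboardInterrupt`, which derives from `BaseException` rather than `Exception`, so a retry loop written that way can swallow Ctrl-C. The sketch below shows the narrow form. The names `alloc_with_retry`, `alloc` and `free_cache` are hypothetical and are not taken from the commit, and `MemoryError` stands in for whatever error the real allocator raises.

```python
def alloc_with_retry(alloc, free_cache, retries=1):
  """Hypothetical helper: call alloc(); on MemoryError free cached memory and retry."""
  for attempt in range(retries + 1):
    try:
      return alloc()
    except MemoryError:
      # Narrow on purpose: a bare `except:` here would also swallow
      # KeyboardInterrupt and SystemExit.
      if attempt == retries:
        raise
      free_cache()

# Usage: the first attempt fails, the cache is freed, the second succeeds.
state = {"calls": 0, "freed": 0}

def flaky_alloc():
  state["calls"] += 1
  if state["calls"] == 1:
    raise MemoryError("out of memory")
  return "buffer"

def free_cache():
  state["freed"] += 1

result = alloc_with_retry(flaky_alloc, free_cache)
```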
George Hotz
f17bc16f46 simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz
03cf0afa4f move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
George Hotz
8932816816 remove arm64, caching for cuda (#2201)
* remove arm64, caching for cuda

* caching in llvm

* switch cache_compiled to new cache

* fix clang

* caching for metal

* fix pylint

* cleanups

* perf_counter and binary
2023-11-01 18:44:00 -07:00
nimlgen
8c07c73a9b Fix cl map buffer (#2190)
* fix gpu enqueue_map_buffer out of space

* add test
2023-10-31 12:02:46 -07:00
imaolo
228b310478 align cpu buffer before copy into cl buffer (#2135) 2023-10-23 21:04:35 -04:00
George Hotz
5472a14544 openpilot compile2 (#1977)
* start compile2

* tweak

* why are there two more kernels?

* minor cleanups

* don't break onnx tests

* add __metadata__ support to safetensors

* no early realize in onnx

* cleanups

* bugfix

* clean up image type, add optimize

* opt to match old

* try that

* opt work

* run compile2

* optimizer

* prt more

* prerealize

* imp

* NOLOCALS works

* no locals means no locals

* support fractional globals

* all locals welcome

* int that

* cleanups

* show gemv regression

* clean up diff

* use idx for the cond

* nolocals

---------

Co-authored-by: Comma Device <device@comma.ai>
2023-10-15 20:39:46 -07:00
qazal
71d93ffd79 Refactor GPU and Metal languages in their own separate renderers (#2033)
* Refactor GPU and Metal languages in their own separate renderers

* remove CStyleLanguage imports

* move renderers too
2023-10-10 07:46:41 -07:00