Commit Graph

153 Commits

Author SHA1 Message Date
Christopher Milan
c6ba016da6 fix cuda check (#13726) 2025-12-16 18:00:09 -05:00
nimlgen
77a76d1b13 device: respect compiler ContextVars (#13523)
* device: envvars for cc

* fix

* fix

* x

* um

* fix

* remote

* em

* cleanup

* typing

* fix

* debug

* lvp?

* ugh

* singl

* rm

* lol

* fix

* ?

* this?

* why?

* rev

* mod test

* l
2025-12-02 14:42:04 +03:00
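A minimal sketch of the env-var-backed ContextVar idea behind this change (simplified, hypothetical names; not tinygrad's actual helpers): the compiler choice reads its default from the environment but can be overridden per context.

```python
import os

# hypothetical knob: a ContextVar-style flag whose default comes from an env var,
# so the device picks its compiler from the current context rather than a hardcoded choice
class ContextVar:
  def __init__(self, key:str, default:int): self.value = int(os.getenv(key, default))
  def __bool__(self): return bool(self.value)

NVCC = ContextVar("NVCC", 0)                       # hypothetical env var name
def pick_compiler() -> str: return "nvcc" if NVCC else "nvrtc"
print(pick_compiler())                             # "nvcc" only when NVCC=1 is set
```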
Christopher Milan
09f3aae169 In-tree autogen: all C libraries (#13220)
* checkout files from autogen branch

* ioctl with payload

* fix am generations

* properly fix generations

This reverts commit b2a54f4f41.

* revert discovery.h

* support pragma pack(1)

* typo

* better getter

* typo

* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE

* align support

* anon handling fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
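For context on what the in-tree autogen has to emit, here is a minimal hand-written ctypes sketch (not the actual generated code) of two cases called out above: `pragma pack(1)` and anonymous nested unions.

```python
import ctypes

# packed struct: the ctypes equivalent of `#pragma pack(1)`, no padding between fields
class packed_hdr(ctypes.Structure):
  _pack_ = 1
  _fields_ = [("magic", ctypes.c_uint32), ("flags", ctypes.c_uint8)]

# struct with an anonymous union: _anonymous_ exposes the union's fields on the struct itself
class ioctl_arg(ctypes.Structure):
  class _u(ctypes.Union):
    _fields_ = [("ptr", ctypes.c_uint64), ("handle", ctypes.c_uint32)]
  _anonymous_ = ("u",)
  _fields_ = [("op", ctypes.c_uint32), ("u", _u)]

assert ctypes.sizeof(packed_hdr) == 5    # 4 + 1 bytes, no alignment padding
arg = ioctl_arg(op=1); arg.ptr = 0xdead  # union members reachable directly on the struct
```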
wozeparrot
c3149c618a feat: nvcc compiler (#12852) 2025-10-21 11:31:23 -07:00
George Hotz
1ecf403294 cleanup long lines [pr] (#12623)
* cleanup long lines

* more

* a few more

* all noqa fixed

* fix amd + cuda

* clean that up
2025-10-12 20:18:05 +08:00
chenyu
585bd95b50 fix ruff 0.14.0 [pr] (#12547) 2025-10-09 01:52:30 -04:00
nimlgen
fb96394ff5 auto-select available compilers (#12094)
* device: auto select compilers

* fix

* metal+opencl

* nv/cuda

* test without ptx

* ptx

* fix tests

* fix

* fix test

* rename

* test + cleaner

* xx

* ops

* better test

* win?

* um?

* types

* debug

* win??

* sep rung

* wtf?

* debug

* skip win

* revert this

* types
2025-09-10 19:52:01 +03:00
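A minimal sketch of the auto-selection idea (hypothetical helper names, not the actual device code): probe which compilers are usable on the current machine and take the first hit.

```python
import shutil, ctypes.util

def has_nvrtc() -> bool:
  # usable if the NVRTC shared library can be located on this system
  return ctypes.util.find_library("nvrtc") is not None

def has_nvcc() -> bool:
  # usable if the nvcc binary is on PATH
  return shutil.which("nvcc") is not None

def select_cuda_compiler() -> str:
  # hypothetical priority order; a missing compiler is skipped instead of erroring at import time
  for name, probe in (("nvrtc", has_nvrtc), ("nvcc", has_nvcc)):
    if probe(): return name
  raise RuntimeError("no usable CUDA compiler found (tried nvrtc, nvcc)")
```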
nimlgen
75c2c42def suppress exceptions only during finalization (#11451)
* suppress exceptions only during finalization

* fix

* fix typing

* fix more warns

* fix

* better?

* Revert "better?"

This reverts commit a068aa5793.

* mm?

* no as e
2025-07-31 13:57:12 +03:00
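A minimal sketch of the finalization rule described above (simplified): a destructor only swallows cleanup errors while the interpreter is shutting down, and re-raises them otherwise.

```python
import sys

class Resource:
  def __init__(self): self.handle = object()   # stand-in for a real driver handle
  def _free(self): del self.handle             # cleanup that may fail once the runtime is torn down
  def __del__(self):
    try: self._free()
    except Exception:
      # during normal operation the error stays visible; only at interpreter
      # shutdown (when modules may already be gone) is it suppressed
      if not sys.is_finalizing(): raise
```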
nimlgen
188ed38315 replace from_mv with lightweight mv_address (#11280) 2025-07-19 13:50:51 +03:00
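One way to get a memoryview's base address with plain ctypes, as a minimal sketch (not necessarily the exact helper added here): wrap the buffer without copying and read the wrapper's address.

```python
import ctypes

def mv_address(mv: memoryview) -> int:
  # zero-copy: from_buffer shares the memoryview's storage, addressof reads where it lives
  return ctypes.addressof(ctypes.c_char.from_buffer(mv))

buf = bytearray(b"hello")
print(hex(mv_address(memoryview(buf))))   # address of the first byte of `buf`
```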
George Hotz
67a1c92fc0 remove del spam from CI (#10699)
* remove del spam from CI

* more

* preconstruct default buffer spec

* ignore those errors

* check exception

* more exception check

* skip stuff
2025-06-08 10:14:30 -07:00
Ignacio Sica
f69722dc2a refactor cuda disassemble (#10449) 2025-05-22 08:58:24 -07:00
uuuvn
7bc4864bc4 Make dev a property of Allocator (#10286)
* Make `dev` a property of `Allocator`

(this is a prereq refactor for #10285)

At least `BufferXfer.copy` accesses it assuming it's always present;
currently most devices just add this property on their own, repeating
the same code over and over again.

This is also a bit footgunny; see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could obviously just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).

`MallocAllocator` is a bit special, but passing `None` works just fine.

* typing

* ignore type instead of cast
2025-05-13 17:01:01 -07:00
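A minimal sketch of the shape of this refactor (simplified, hypothetical class names): the base `Allocator` owns `dev` once, so backends stop repeating the attribute and callers like `BufferXfer.copy` can rely on it being there.

```python
class Allocator:
  def __init__(self, dev=None): self.dev = dev          # MallocAllocator-style users can pass None
  def alloc(self, size:int): raise NotImplementedError

class FakeCUDAAllocator(Allocator):
  def __init__(self, dev): super().__init__(dev)        # no per-backend `self.dev = dev` boilerplate
  def alloc(self, size:int): return bytearray(size)     # placeholder for a real driver allocation

class Device:
  def __init__(self, name:str): self.name, self.allocator = name, FakeCUDAAllocator(self)

d = Device("CUDA")
assert d.allocator.dev is d   # transfer code can assume `dev` is always present
```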
Ignacio Sica
cfad139189 bump assembly debug to 7 (#9662) 2025-04-01 11:51:33 +08:00
nimlgen
1d06d61b16 from_blob for cuda (#9223)
* from_blob for cuda

* maybe docs?

* minor docs

* example

* waiting 9224

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
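A minimal host-memory analogy of the from_blob idea (hypothetical class, no CUDA required): wrap memory that something else allocated, without copying, and keep the owner alive so it isn't freed underneath the wrapper.

```python
import ctypes

class BlobBuffer:
  def __init__(self, address:int, nbytes:int, owner=None):
    self.owner = owner                                            # keeps the external allocation alive
    self.view = (ctypes.c_ubyte * nbytes).from_address(address)   # no copy, same memory

raw = (ctypes.c_ubyte * 4)(1, 2, 3, 4)                 # pretend an external library allocated this
blob = BlobBuffer(ctypes.addressof(raw), 4, owner=raw)
print(list(blob.view))                                 # [1, 2, 3, 4]
```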
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* sfixes

* disable broken tests

* linter

* these fails as well

* pylint

* mypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
nimlgen
90f1f0c9d5 eh (#8309)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-26 13:16:34 -05:00
George Hotz
62e5d96446 more typing work [pr] (#8345) 2024-12-19 21:46:35 -08:00
George Hotz
9c77e9f9b7 replace Tuple with tuple [pr] (#8344)
* replace Tuple with tuple [pr]

* replace List with list [pr]

* replace Dict with dict [pr]

* replace Set with set [pr]
2024-12-19 21:27:56 -08:00
George Hotz
c5d458ce02 BufferSpec and ProgramSpec [pr] (#7814)
* BufferSpec and ProgramSpec [pr]

* delete preallocate, it's unused

* Revert "delete preallocate, it's unused"

This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
6688539bc9 rename device to dev so Buffer can be Allocator [pr] (#7799)
* rename device to dev so Buffer can be Allocator [pr]

* missed those

* update the Program classes also

* more renames

* oops
2024-11-20 15:47:26 +08:00
George Hotz
d71fe7faa5 rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]

* forgot those

* transfer + offset
2024-11-20 00:10:29 +08:00
chenyu
348d37df46 a few more unused type ignore [pr] (#7568) 2024-11-06 10:17:19 -05:00
nimlgen
99fb115791 cuda correct pointer type (#7153) 2024-10-18 22:39:59 +03:00
George Hotz
ca0dca35f7 move ptx renderer [pr] (#7118) 2024-10-17 14:50:32 +08:00
Francis Lam
b0dd407cdd ops_cuda: add optional dynamic smem parameter (#6956)
* ops_cuda: add optional dynamic smem parameter

This is required to enable shared memory usage larger than 48 KB on
a per-kernel basis.

* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
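For reference, the driver-level opt-in this parameter maps to looks roughly like the sketch below (Python/ctypes; the enum value is from cuda.h, and `func` is assumed to be a `CUfunction` handle you already obtained):

```python
# CUfunction_attribute value from cuda.h for the per-kernel dynamic shared memory cap
CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES = 8

def enable_large_smem(libcuda, func, smem_bytes:int):
  # raise the cap once (e.g. at init); launches then pass `smem_bytes` as the
  # sharedMemBytes argument of cuLaunchKernel when they need more than 48 KB
  status = libcuda.cuFuncSetAttribute(func, CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, smem_bytes)
  if status != 0: raise RuntimeError(f"cuFuncSetAttribute failed: CUresult {status}")

# usage sketch (needs a CUDA driver and a loaded kernel):
#   libcuda = ctypes.CDLL("libcuda.so.1"); enable_large_smem(libcuda, func, 64 * 1024)
```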
nimlgen
1903542c2d nv/cuda compilers touchup (#5759)
* nv/cuda compilers touchup

* fix cuda check + move nv disasm

* remove includes

* fix nvrtc_check
2024-07-28 00:15:28 +03:00
chenyu
9838c1a6ff update import style in runtime (#5735) 2024-07-26 14:00:23 -04:00
George Hotz
5c688560bc move CUDA/HIP compilers to their own files [run_process_replay] (#5732) 2024-07-26 10:00:15 -07:00
nimlgen
baface413a nv better nvdisasm fail message (#5682)
* nv better nvdisasm message

* cuda
2024-07-24 16:19:26 +03:00
nimlgen
b4c49ae3fa remove cudacpu in favour of mockgpu (#5225)
* remove cudacpu in favour of mockgpu

* remove unused import

* not used as well
2024-06-29 11:05:16 +03:00
nimlgen
ee02dcb98e nv supports PTX=1 (#5222)
* nv supports PTX=1

* not needed

* split nv compiler into nvrtc autogen

* remove to_c_array

* test

* Revert "test"

This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
chenyu
a8e9307e0b pylint runtime/ and shape/ (#5044)
As pointed out by #4877, `__init__.py` needs to be added to trigger pylint. Fixed some errors except ops_python (will do in a separate PR, it has a lot of errors) and sub-folders in runtime.
2024-06-18 19:48:18 -04:00
Roelof van Dijk
0eebb8e998 fix: _free should not return (#4880) 2024-06-08 14:45:06 +02:00
Roelof van Dijk
1785a70e77 fix: else-return on runtime (#4881)
* fix: add init file

* fix: no else-return

* fix: remove file again
2024-06-08 14:44:24 +02:00
Szymon Ożóg
f7201b6852 Remove deprecated code (#4724) 2024-05-25 03:02:12 -04:00
chenyu
286b4dbdf2 compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

renderer error with multiprocess should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00
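A minimal sketch of the distinction this draws (hypothetical names, not the actual beam-search code): compile failures get their own exception type so the search skips only kernels that genuinely fail to compile, while renderer bugs still surface.

```python
class CompileError(RuntimeError): pass

def try_candidate(compile_fn, src:str):
  try:
    return compile_fn(src)
  except CompileError:
    return None   # this candidate just doesn't compile; skip it and keep searching
  # any other exception (e.g. a renderer error) propagates and fails the search loudly

def bad_compile(src:str): raise CompileError("kernel does not compile")
assert try_candidate(bad_compile, "src") is None
```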
George Hotz
347a3acb37 add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
George Hotz
d438d5698d bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
George Hotz
4eef1ee9bf move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
89e119bc58 move Allocator to buffer.py (#4502)
* move Allocator to buffer.py

* move those to realize

* memory file

* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
9fc4465557 subbuffer support (#4397)
* subbuffer support

* diskbuffer offset

* cuda subbuffer works

* use subbuffer

* more subbuffer tests

* consecutive

* cast

* consec

* offset

* view is a better name

* offset is in nbytes

* fix view + memory planner

* delete unused DiskRunner

* reverse order

* no subbuffers on unrealized consts

* only enabled for disk

* don't reverse memory

* view supported devices

* pickle buffer view

* ring jit

* support extra view inputs in jit

* fix JIT=2 issue

* test copy jit

* p2p isn't an option anymore

* fix dep tracking issue

* fix mypy

* fix pickle

* from_nv is contents now
2024-05-03 18:05:57 -07:00
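A minimal host-memory sketch of the view idea above (simplified): a sub-buffer is a zero-copy window into its parent at a byte offset, so writes through the view are visible in the parent and the parent owns the memory.

```python
class HostBuffer:
  def __init__(self, nbytes:int): self.nbytes, self.mem = nbytes, memoryview(bytearray(nbytes))
  def view(self, offset:int, nbytes:int):
    assert offset + nbytes <= self.nbytes     # the offset is in bytes, as noted above
    return HostBufferView(self, offset, nbytes)

class HostBufferView:
  def __init__(self, base, offset:int, nbytes:int):
    self.base, self.mem = base, base.mem[offset:offset + nbytes]   # no copy, shared storage

buf = HostBuffer(16)
v = buf.view(4, 4)
v.mem[0] = 0xff
assert buf.mem[4] == 0xff   # the view aliases the parent's memory
```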
George Hotz
60e3aa5cb1 more docs (#4271)
* more work on docs

* CompilerOptions is dataclass
2024-04-24 10:52:42 +08:00
Micah Zoltu
7bc862767c Improves error message when CUDA module fails to load. (#4243) 2024-04-21 11:10:14 -04:00
nimlgen
5a57b48134 cuda p2p enable when available (#4153) 2024-04-12 16:21:54 +03:00
George Hotz
af5984df43 cudagraph memcpy through host (#4137) 2024-04-10 13:17:17 -07:00
chenyu
1de9778949 import Buffer and BufferOption from tinygrad.buffer (#4076) 2024-04-04 22:12:23 -04:00
chenyu
b47f6cebb2 LinearizerOptions -> CompilerOptions (#3978) 2024-03-28 17:50:23 -04:00
nimlgen
e2d6f76723 _alloc and _free with options (#3934)
* _alloc has options

* linter

* fix hsa
2024-03-26 09:11:41 -07:00
nimlgen
739f47eb0f check on cuEventSynchronize (#3933) 2024-03-26 16:14:38 +03:00
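A minimal sketch of the check pattern this relies on (hypothetical wrapper): every driver call that returns a `CUresult` goes through one helper, so a failing `cuEventSynchronize` raises instead of being silently ignored.

```python
def check(status:int, name:str="cuda call"):
  # CUDA driver calls return 0 (CUDA_SUCCESS) on success; anything else is an error code
  if status != 0: raise RuntimeError(f"{name} failed with CUresult {status}")

# usage sketch: check(libcuda.cuEventSynchronize(event), "cuEventSynchronize")
```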
nimlgen
f2a9ea4ea9 lru allocator for copyin host buffers (#3918)
* lru allocator for copyin host buffers

* linter happy
2024-03-25 15:57:18 +03:00
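A minimal sketch of the LRU idea (simplified, plain host memory): freed copyin staging buffers go into per-size free lists and get handed back out, instead of re-allocating (and re-pinning) host memory on every copy.

```python
from collections import defaultdict

class LRUHostAllocator:
  def __init__(self): self.free_cache = defaultdict(list)
  def alloc(self, size:int):
    if self.free_cache[size]: return self.free_cache[size].pop()   # reuse a cached buffer
    return bytearray(size)            # placeholder for a real pinned (page-locked) allocation
  def free(self, buf):
    self.free_cache[len(buf)].append(buf)   # keep it around for the next copyin of this size

a = LRUHostAllocator()
b1 = a.alloc(4096); a.free(b1)
assert a.alloc(4096) is b1            # a same-size request reuses the cached buffer
```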