Christopher Milan
c6ba016da6
fix cuda check (#13726)
2025-12-16 18:00:09 -05:00
nimlgen
77a76d1b13
device: respect compiler ContextVars (#13523)
* device: envvars for cc
* fix
* fix
* x
* um
* fix
* remote
* em
* cleanup
* typing
* fix
* debug
* lvp?
* ugh
* singl
* rm
* lol
* fix
* ?
* this?
* why?
* rev
* mod test
* l
2025-12-02 14:42:04 +03:00
Christopher Milan
09f3aae169
In-tree autogen: all C libraries (#13220)
* checkout files from autogen branch
* ioctl with payload
* fix am generations
* properly fix generations
This reverts commit b2a54f4f41.
* revert discovery.h
* support pragma pack(1)
* typo
* better getter
* typo
* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE
* align support
* anon handling fix
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
c3149c618a
feat: nvcc compiler (#12852)
2025-10-21 11:31:23 -07:00
George Hotz
1ecf403294
cleanup long lines [pr] (#12623)
* cleanup long lines
* more
* a few more
* all noqa fixed
* fix amd + cuda
* clean that up
2025-10-12 20:18:05 +08:00
chenyu
585bd95b50
fix ruff 0.14.0 [pr] (#12547)
2025-10-09 01:52:30 -04:00
nimlgen
fb96394ff5
auto-select available compilers (#12094)
* device: auto select compilers
* fix
* metal+opencl
* nv/cuda
* test without ptx
* ptx
* fix tests
* fix
* fix test
* rename
* test + cleaner
* xx
* ops
* better test
* win?
* um?
* types
* debug
* win??
* sep rung
* wtf?
* debug
* skip win
* revert this
* types
2025-09-10 19:52:01 +03:00
nimlgen
75c2c42def
suppress exceptions only during finalization (#11451)
* suppress exceptions only during finalization
* fix
* fix typing
* fix more warns
* fix
* better?
* Revert "better?"
This reverts commit a068aa5793.
* mm?
* no as e
2025-07-31 13:57:12 +03:00
nimlgen
188ed38315
replace from_mv with lightweight mv_address (#11280)
2025-07-19 13:50:51 +03:00
George Hotz
67a1c92fc0
remove del spam from CI (#10699)
* remove del spam from CI
* more
* preconstruct default buffer spec
* ignore those errors
* check exception
* more exception check
* skip stuff
2025-06-08 10:14:30 -07:00
Ignacio Sica
f69722dc2a
refactor cuda disassemble (#10449)
2025-05-22 08:58:24 -07:00
uuuvn
7bc4864bc4
Make dev a property of Allocator (#10286)
* Make `dev` a property of `Allocator`
(this is a prereq refactor for #10285)
At least `BufferXfer.copy` accesses it assuming it's always present,
currently most devices just add this property on their own repeating
the same code over and over again.
This is also a bit footguny, see `RemoteAllocator` that named this
property `device` instead of `dev`, i could obviously just change that
in one place but doing it globally seems like a better solution (and it
reduces code duplication too).
`MallocAllocator` is a bit special, but passing `None` works just fine.
* typing
* ignore type instead of cast
2025-05-13 17:01:01 -07:00
Ignacio Sica
cfad139189
bump assembly debug to 7 (#9662)
2025-04-01 11:51:33 +08:00
nimlgen
1d06d61b16
from_blob for cuda (#9223)
* from_blob for cuda
* maybe docs?
* minor docs
* example
* waiting 9224
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
nimlgen
9bc317d5d2
mockcuda (#8503)
* init mockcuda
* run gpu ocelot
* fix
* sfixes
* disable broken tests
* linter
* these fails as well
* pylint
* myypy
* this fails on real platforms as well
* mypy please
2025-01-05 01:23:57 +03:00
nimlgen
90f1f0c9d5
eh (#8309)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-26 13:16:34 -05:00
George Hotz
62e5d96446
more typing work [pr] (#8345)
2024-12-19 21:46:35 -08:00
George Hotz
9c77e9f9b7
replace Tuple with tuple [pr] (#8344)
* replace Tuple with tuple [pr]
* replace List with list [pr]
* replace Dict with dict [pr]
* replace Set with set [pr]
2024-12-19 21:27:56 -08:00
George Hotz
c5d458ce02
BufferSpec and ProgramSpec [pr] (#7814)
* BufferSpec and ProgramSpec [pr]
* delete preallocate, it's unused
* Revert "delete preallocate, it's unused"
This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
6688539bc9
rename device to dev so Buffer can be Allocator [pr] (#7799)
* rename device to dev to Buffer can be Allocator [pr]
* missed those
* update the Program classes also
* more renames
* oops
2024-11-20 15:47:26 +08:00
George Hotz
d71fe7faa5
rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]
* forgot those
* transfer + offset
2024-11-20 00:10:29 +08:00
chenyu
348d37df46
a few more unused type ignore [pr] (#7568)
2024-11-06 10:17:19 -05:00
nimlgen
99fb115791
cuda correct pointer type (#7153)
2024-10-18 22:39:59 +03:00
George Hotz
ca0dca35f7
move ptx renderer [pr] (#7118)
2024-10-17 14:50:32 +08:00
Francis Lam
b0dd407cdd
ops_cuda: add optional dynamic smem parameter (#6956)
* ops_cuda: add optional dynamic smem parameter
This is required to enable larger than 48kb shared memory usage on
a per-kernel basis.
* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
nimlgen
1903542c2d
nv/cuda compilers touchup (#5759)
* nv/cuda compilers touchup
* fix cuda check + move nv disasm
* remove includes
* fix nvrtc_check
2024-07-28 00:15:28 +03:00
chenyu
9838c1a6ff
update import style in runtime (#5735)
2024-07-26 14:00:23 -04:00
George Hotz
5c688560bc
move CUDA/HIP compilers to their own files [run_process_replay] (#5732)
2024-07-26 10:00:15 -07:00
nimlgen
baface413a
nv better nvdisasm fail message (#5682)
* nv better nvdisasm message
* cuda
2024-07-24 16:19:26 +03:00
nimlgen
b4c49ae3fa
remove cudacpu in favour of mockgpu (#5225)
* remove cudacpu in favour of mockgpu
* remove unused import
* not used as well
2024-06-29 11:05:16 +03:00
nimlgen
ee02dcb98e
nv supports PTX=1 (#5222)
* nv supports PTX=1
* not needed
* split nv compiler into nvrtc autogen
* remove to_c_array
* test
* Revert "test"
This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
chenyu
a8e9307e0b
pylint runtime/ and shape/ (#5044)
as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime
2024-06-18 19:48:18 -04:00
Roelof van Dijk
0eebb8e998
fix: _free should not return (#4880)
2024-06-08 14:45:06 +02:00
Roelof van Dijk
1785a70e77
fix: else-return on runtime (#4881)
* fix: add init file
* fix: no else-return
* fix: remove file again
2024-06-08 14:44:24 +02:00
Szymon Ożóg
f7201b6852
Remove deprecated code (#4724)
2024-05-25 03:02:12 -04:00
chenyu
286b4dbdf2
compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam
renderer error with multiprocess should not be skipped by beam
* use `==` for dtype to dtype comparison
* that needs to be is
* typo
2024-05-19 00:25:25 -04:00
George Hotz
347a3acb37
add renderer class (#4524)
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
George Hotz
d438d5698d
bring buffer back to device (#4517)
2024-05-10 11:22:31 -07:00
George Hotz
4eef1ee9bf
move renderer into options (#4514)
* move renderer into options
* fix tests
* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
89e119bc58
move Allocator to buffer.py (#4502)
* move Allocator to buffer.py
* move those to realize
* memory file
* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
9fc4465557
subbuffer support (#4397)
* subbuffer support
* diskbuffer offset
* cuda subbuffer works
* use subbuffer
* more subbuffer tests
* consecutive
* cast
* consec
* offset
* view is a better name
* offset is in nbytes
* fix view + memory planner
* delete unused DiskRunner
* reverse order
* no subbuffers on unrealized consts
* only enabled for disk
* don't reverse memory
* view supported devices
* pickle buffer view
* ring jit
* support extra view inputs in jit
* fix JIT=2 issue
* test copy jit
* p2p isn't an option anymore
* fix dep tracking issue
* fix mypy
* fix pickle
* from_nv is contents now
2024-05-03 18:05:57 -07:00
George Hotz
60e3aa5cb1
more docs (#4271)
* more work on docs
* CompilerOptions is dataclass
2024-04-24 10:52:42 +08:00
Micah Zoltu
7bc862767c
Improves error message when CUDA module fails to load. (#4243)
2024-04-21 11:10:14 -04:00
nimlgen
5a57b48134
cuda p2p enable when available (#4153)
2024-04-12 16:21:54 +03:00
George Hotz
af5984df43
cudagraph memcpy through host (#4137)
2024-04-10 13:17:17 -07:00
chenyu
1de9778949
import Buffer and BufferOption from tinygrad.buffer (#4076)
2024-04-04 22:12:23 -04:00
chenyu
b47f6cebb2
LinearizerOptions -> CompilerOptions (#3978)
2024-03-28 17:50:23 -04:00
nimlgen
e2d6f76723
_alloc and _free with options (#3934)
* _alloc has options
* linter
* fix hsa
2024-03-26 09:11:41 -07:00
nimlgen
739f47eb0f
check on cuEventSynchronize (#3933)
2024-03-26 16:14:38 +03:00
nimlgen
f2a9ea4ea9
lru allocator for copyin host buffers (#3918)
* lru allocator for copyin host buffers
* linter happy
2024-03-25 15:57:18 +03:00