tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-06 04:35:00 -05:00

Author	SHA1	Message	Date
nimlgen	1903542c2d	nv/cuda compilers touchup (#5759 ) * nv/cuda compilers touchup * fix cuda check + move nv disasm * remove includes * fix nvrtc_check	2024-07-28 00:15:28 +03:00
chenyu	9838c1a6ff	update import style in runtime (#5735 )	2024-07-26 14:00:23 -04:00
George Hotz	5c688560bc	move CUDA/HIP compilers to their own files [run_process_replay] (#5732 )	2024-07-26 10:00:15 -07:00
nimlgen	baface413a	nv better nvdisasm fail message (#5682 ) * nv better nvdisasm message * cuda	2024-07-24 16:19:26 +03:00
nimlgen	b4c49ae3fa	remove cudacpu in favour of mockgpu (#5225 ) * remove cudacpu in favour of mockgpu * remove unused import * not used as well	2024-06-29 11:05:16 +03:00
nimlgen	ee02dcb98e	nv supports PTX=1 (#5222 ) * nv supports PTX=1 * not needed * split nv compiler into nvrtc autogen * remove to_c_array * test * Revert "test" This reverts commit `f0b56f308b`.	2024-06-29 10:46:29 +03:00
chenyu	a8e9307e0b	pylint runtime/ and shape/ (#5044 ) as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime	2024-06-18 19:48:18 -04:00
Roelof van Dijk	0eebb8e998	fix: _free should not return (#4880 )	2024-06-08 14:45:06 +02:00
Roelof van Dijk	1785a70e77	fix: else-return on runtime (#4881 ) * fix: add init file * fix: no else-return * fix: remove file again	2024-06-08 14:44:24 +02:00
Szymon Ożóg	f7201b6852	Remove deprecated code (#4724 )	2024-05-25 03:02:12 -04:00
chenyu	286b4dbdf2	compile raise CompileError and skip only RuntimeError in multiprocess… (#4646 ) * compile raise CompileError and skip only RuntimeError in multiprocess beam renderer error with multiprocess should not be skipped by beam * use `==` for dtype to dtype comparison * that needs to be is * typo	2024-05-19 00:25:25 -04:00
George Hotz	347a3acb37	add renderer class (#4524 ) * add renderer class * tests pass * fix pylint * fix tensor cores	2024-05-10 21:40:02 -07:00
George Hotz	d438d5698d	bring buffer back to device (#4517 )	2024-05-10 11:22:31 -07:00
George Hotz	4eef1ee9bf	move renderer into options (#4514 ) * move renderer into options * fix tests * renders are functions	2024-05-10 10:01:51 -07:00
George Hotz	89e119bc58	move Allocator to buffer.py (#4502 ) * move Allocator to buffer.py * move those to realize * memory file * cleanup	2024-05-09 19:45:56 -07:00
George Hotz	9fc4465557	subbuffer support (#4397 ) * subbuffer support * diskbuffer offset * cuda subbuffer works * use subbuffer * more subbuffer tests * consecutive * cast * consec * offset * view is a better name * offset is in nbytes * fix view + memory planner * delete unused DiskRunner * reverse order * no subbuffers on unrealized consts * only enabled for disk * don't reverse memory * view supported devices * pickle buffer view * ring jit * support extra view inputs in jit * fix JIT=2 issue * test copy jit * p2p isn't an option anymore * fix dep tracking issue * fix mypy * fix pickle * from_nv is contents now	2024-05-03 18:05:57 -07:00
George Hotz	60e3aa5cb1	more docs (#4271 ) * more work on docs * CompilerOptions is dataclass	2024-04-24 10:52:42 +08:00
Micah Zoltu	7bc862767c	Improves error message when CUDA module fails to load. (#4243 )	2024-04-21 11:10:14 -04:00
nimlgen	5a57b48134	cuda p2p enable when available (#4153 )	2024-04-12 16:21:54 +03:00
George Hotz	af5984df43	cudagraph memcpy through host (#4137 )	2024-04-10 13:17:17 -07:00
chenyu	1de9778949	import Buffer and BufferOption from tinygrad.buffer (#4076 )	2024-04-04 22:12:23 -04:00
chenyu	b47f6cebb2	LinearizerOptions -> CompilerOptions (#3978 )	2024-03-28 17:50:23 -04:00
nimlgen	e2d6f76723	_alloc and _free with options (#3934 ) * _alloc has options * linter * fix hsa	2024-03-26 09:11:41 -07:00
nimlgen	739f47eb0f	check on cuEventSynchronize (#3933 )	2024-03-26 16:14:38 +03:00
nimlgen	f2a9ea4ea9	lru allocator for copyin host buffers (#3918 ) * lru allocator for copyin host buffers * linter happy	2024-03-25 15:57:18 +03:00
George Hotz	e0e234bf94	hotfix, str compare version for cuda	2024-03-24 20:35:24 -07:00
Arseny Kapoulkine	715850aef9	Fix sm89 PTX=1 compilation (#3915 ) * Fix sm89 PTX=1 compilation The minimum PTX version that supports sm89 is 7.8 (same version also supports sm90); without this ptxas fails when running tinygrad with PTX=1 on RTX 4090. * Use int(arch[3:]) for forward compat with SM10.0 if that happens	2024-03-24 20:32:29 -07:00
sekstini	7c3632fd1e	add --minimal flag to nvrtc (#3899 )	2024-03-23 16:38:31 -07:00
George Hotz	46a3501cec	nv ioctl sniffer (#3892 ) * nv ioctl sniffer * unused import * Update __init__.py * that work * that fix it	2024-03-23 00:29:30 -07:00
chenyu	1c51d586ea	replace raise Exception with specific errors (#3874 )	2024-03-22 12:32:21 -04:00
nimlgen	8ef5490ec8	cuda tranfer + async copyin (#3873 )	2024-03-22 09:01:37 -07:00
Szymon Ożóg	624bc89910	PTX - implement float 4, ptr arithmetics and other speed improvements (#3775 ) * ptx float4 implementation * remove from cache when trimming uops * Gate for float4 * Linting fix * disable test reasonable time for ptx * import getenv * Update uops.py * linter * Add div test for half * upcast if op does not support operation * fix offset * Run only if dtype supported * zero out registers when accessing by pred + cleanup * Remove trailing whitespace * revert * spacing fix * move cache clearing outside loop * did this suddenly start working? * unused import removed * Remove cast * Use pattern matching * linting --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-22 08:54:02 -07:00
nimlgen	b78352b423	do not create structs every call in CUDAProgram (#3855 ) * do not create structs in cuda * fix graph * linter * do not exec twice * fix graph	2024-03-21 17:51:40 +03:00
Francis Lam	b6e2495fdd	kernel: limit shared memory usage when adding opts (#3705 ) * kernel: limit shared memory usage when adding opts * search: remove unnecessary limit on search space apply_opt will do the more correct check	2024-03-12 17:06:21 -04:00
George Hotz	ac02e7347d	ptx timing vs cuda timing (#3659 )	2024-03-08 10:17:49 -08:00
George Hotz	6e50582e62	working to improve ptx (#3647 ) * working to improve ptx * fix compile fail	2024-03-07 12:39:31 -08:00
George Hotz	81baf3eed3	bring ptx back (#3623 ) * bring ptx back * ptx back * fix define var * fix a few bugs * bugfixes * fixes * fix llvm bug * fix test bug	2024-03-06 13:34:21 -08:00
Francis Lam	e17f1821a7	wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544 )	2024-03-01 17:51:02 -08:00
nimlgen	94b7ac7a29	no cuda compile helper (#3512 )	2024-02-28 01:50:10 +01:00
George Hotz	7698781389	Revert "wmma: add CUDA tensor core (#3464 )" (#3474 ) This reverts commit `e9cef13f0b`.	2024-02-22 11:58:16 +01:00
Francis Lam	e9cef13f0b	wmma: add CUDA tensor core (#3464 )	2024-02-22 11:57:08 +01:00
George Hotz	3c728d1082	compiler support (#3260 ) * compiler support * revert that * fix tests	2024-01-26 23:36:40 -08:00
George Hotz	03a6bc59c1	move autogen to runtime/autogen (#3254 )	2024-01-26 12:44:19 -08:00
George Hotz	a3869ffd46	move gpuctypes in tree (#3253 ) * move gpuctypes in tree * fix mypy * regex exclude * autogen sh * mypy exclude * does that fix it * fix mypy * add hip confirm * verify all autogens * build clang2py * opencl headers * gpu on 22.04	2024-01-26 12:25:03 -08:00
George Hotz	cb372b053f	add device speed test (#3244 )	2024-01-25 12:01:22 -08:00
nimlgen	3205fd8481	fix cuda device var rewrite (#3233 )	2024-01-24 16:57:49 -05:00
George Hotz	83d614295e	reduce lines (#3230 )	2024-01-24 10:35:59 -08:00
George Hotz	23b084e70a	add device name to device, all are constructed (#3221 )	2024-01-23 20:34:56 -08:00
nimlgen	81ae4ea179	compile cache for several devices (#3148 ) * compile cache for several devices * ops_gpu uses hash to not care about sql * hip rdna test with device * linter happy * no device passed where possible * arch is optional to compile_{hip\|cuda}	2024-01-16 11:45:26 -08:00
George Hotz	120c8b1841	update llvm api + add cache key (#3140 ) * update llvm api + add cache key * use_xcode is a different function * types	2024-01-15 17:25:32 -08:00

1 2 3

128 Commits