tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-16 01:26:29 -05:00

Author	SHA1	Message	Date
George Hotz	9c77e9f9b7	replace Tuple with tuple [pr] (#8344) * replace Tuple with tuple [pr] * replace List with list [pr] * replace Dict with dict [pr] * replace Set with set [pr]	2024-12-19 21:27:56 -08:00
George Hotz	adcdc583a2	small cleanups [pr] (#8343) * small cleanups [pr] * GPU suppress	2024-12-19 21:20:46 -08:00
George Hotz	8f4299fcc8	hotfix: suppress shutdown errors in CLProgram	2024-12-11 08:08:32 -08:00
George Hotz	c5d458ce02	BufferSpec and ProgramSpec [pr] (#7814) * BufferSpec and ProgramSpec [pr] * delete preallocate, it's unused * Revert "delete preallocate, it's unused" This reverts commit `dcfcfaccde`.	2024-11-21 12:18:05 +08:00
George Hotz	6688539bc9	rename device to dev so Buffer can be Allocator [pr] (#7799) * rename device to dev to Buffer can be Allocator [pr] * missed those * update the Program classes also * more renames * oops	2024-11-20 15:47:26 +08:00
George Hotz	d71fe7faa5	rename allocator methods to not conflict [pr] (#7788) * rename allocator methods to not conflict [pr] * forgot those * transfer + offset	2024-11-20 00:10:29 +08:00
nimlgen	8bbf6fb88c	use mv_address in ops_gpu (#6856)	2024-10-02 22:31:51 +03:00
vladov	501cfde7e6	Fix GPT2 with OpenCL backend. (#6821) * Fix GPT2 with OpenCL backend. * Add test for unaligned copies into OpenCL buffers.	2024-10-01 16:57:22 +08:00
George Hotz	d02bb270b7	add copyin copyout for image on GPU [run_process_replay] (#6580) * add copyin copyout for image on GPU [run_process_replay] * add timing * enqueue vs total run * it's failing but that's fine	2024-09-18 16:06:20 +08:00
CaltropHungerton	38fb1e14a2	Intel XMX Tensor Core Support (#5622) * fixed xmx demo * i think i'm invoking the DPAS but it's slow * compiler build arg to stop register spilling, indicated where to fix flop counter * don't mind this * do NOT mind me * do not mind me * do not view * i will add bf16 later * in process of figuring out tc fields * we figured out the fields!!! * added check for cl device vendor, added seperate IntelRenderer * remove tc thread_local_aliases * cleaning debris before draft pr * edits for linter * deduping and checking device extensions * i will find more line reductions in other places * before merge upstream * double grf size in compiler to fix register spilling (bandaid), device checking changes * tc python emulation * fixed emulation * tests for emulated intel tensor core * TC=0, 1 working on upstream, fixed perf * test * debris * check for specialized cl device when we canonicalize device * bf16 support, tc=3 test added * address tests * revert half2 loads on intel tc, cleanup * linter * fold_expanded revert * lint, whitespace fix * cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too * make line shorter, no need for noqa E501 * removed device intel * fix python emulation --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-08-16 09:19:21 -07:00
nimlgen	7ab531aede	autogen cleanup (#6064) * start autogen cleanup * nvgpu * better? * better * amd part * gpu regen * fix mockgpu amd * nv * amd fix linter * remove import * ugh * nv on master * amd on master	2024-08-14 20:20:35 +03:00
chenyu	e3af273fa1	touchup cl_errors (#6058) * touchup cl_errors * update test	2024-08-13 13:06:59 -04:00
tyoc213	0c4e9dbe71	retrieve defined opencl error codes (#5792)	2024-08-07 10:46:24 -07:00
George Hotz	e347f10d33	hotfix: print which opencl device we are using	2024-08-01 12:39:46 -07:00
chenyu	9838c1a6ff	update import style in runtime (#5735)	2024-07-26 14:00:23 -04:00
chenyu	a8e9307e0b	pylint runtime/ and shape/ (#5044) as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime	2024-06-18 19:48:18 -04:00
Roelof van Dijk	1785a70e77	fix: else-return on runtime (#4881) * fix: add init file * fix: no else-return * fix: remove file again	2024-06-08 14:44:24 +02:00
chenyu	286b4dbdf2	compile raise CompileError and skip only RuntimeError in multiprocess… (#4646) * compile raise CompileError and skip only RuntimeError in multiprocess beam renderer error with multiprocess should not be skipped by beam * use `==` for dtype to dtype comparison * that needs to be is * typo	2024-05-19 00:25:25 -04:00
George Hotz	347a3acb37	add renderer class (#4524) * add renderer class * tests pass * fix pylint * fix tensor cores	2024-05-10 21:40:02 -07:00
George Hotz	d438d5698d	bring buffer back to device (#4517)	2024-05-10 11:22:31 -07:00
George Hotz	4eef1ee9bf	move renderer into options (#4514) * move renderer into options * fix tests * renders are functions	2024-05-10 10:01:51 -07:00
George Hotz	89e119bc58	move Allocator to buffer.py (#4502) * move Allocator to buffer.py * move those to realize * memory file * cleanup	2024-05-09 19:45:56 -07:00
Sohaib	61c97d5305	refactor ops_gpu ctypes (#4331) * refactor ops_gpu ctypes - remove redundant byref as ctypes automatically handles passing `type` as `POINTER(type)` - use walrus operator instead of init_c_var when possible * clSetKernelArg argtype is POINTER(None)	2024-04-30 01:33:34 +08:00
chenyu	1de9778949	import Buffer and BufferOption from tinygrad.buffer (#4076)	2024-04-04 22:12:23 -04:00
chenyu	b47f6cebb2	LinearizerOptions -> CompilerOptions (#3978)	2024-03-28 17:50:23 -04:00
nimlgen	e2d6f76723	_alloc and _free with options (#3934) * _alloc has options * linter * fix hsa	2024-03-26 09:11:41 -07:00
qazal	27f4de2ce4	delete half_prekernel (#3388) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
George Hotz	3c728d1082	compiler support (#3260) * compiler support * revert that * fix tests	2024-01-26 23:36:40 -08:00
George Hotz	03a6bc59c1	move autogen to runtime/autogen (#3254)	2024-01-26 12:44:19 -08:00
George Hotz	a3869ffd46	move gpuctypes in tree (#3253) * move gpuctypes in tree * fix mypy * regex exclude * autogen sh * mypy exclude * does that fix it * fix mypy * add hip confirm * verify all autogens * build clang2py * opencl headers * gpu on 22.04	2024-01-26 12:25:03 -08:00
George Hotz	cb372b053f	add device speed test (#3244)	2024-01-25 12:01:22 -08:00
George Hotz	ed8a32722a	hip mutex signal (#3234) * hip mutex * hip mutex 2 * sync	2024-01-24 13:23:09 -08:00
George Hotz	23b084e70a	add device name to device, all are constructed (#3221)	2024-01-23 20:34:56 -08:00
George Hotz	4a07ea355d	buffer options should work (#3211) * buffer options should work * minor * fix dtype	2024-01-22 19:23:55 -08:00
nimlgen	992067399e	clean up exceptions in __del__ everywhere (#3165)	2024-01-18 08:34:09 -08:00
nimlgen	81ae4ea179	compile cache for several devices (#3148) * compile cache for several devices * ops_gpu uses hash to not care about sql * hip rdna test with device * linter happy * no device passed where possible * arch is optional to compile_{hip|cuda}	2024-01-16 11:45:26 -08:00
George Hotz	120c8b1841	update llvm api + add cache key (#3140) * update llvm api + add cache key * use_xcode is a different function * types	2024-01-15 17:25:32 -08:00
chenyu	0fe6904351	use device from LinearizerOptions in kernel search (#3090) * use device from LinearizerOptions in kernel search removed all Device.DEFAULT in search.py * pass device string for parallel pickle * device for interpreted backends in LinearizerOptions	2024-01-11 14:46:03 -05:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
George Hotz	56f44bd10e	move the compiler cache to be global (#2957) * move the compiler cache to be global * remove non robust test * remove dead code	2024-01-01 10:59:56 -08:00
Marcus Asteborg	1fa4f161fe	Update CLProgram to use unsigned long long for event profiling (#2808) On Windows, the unsigned long type is 32-bit, which is not compatible with the required data size for event profiling.	2023-12-16 23:48:44 -08:00
George Hotz	6d6eb9302d	ruff checks the max line length is 150 (#2734) * ruff checks the max line length is 150 * fix tensor.py * a lot more * done	2023-12-12 17:34:47 -08:00
George Hotz	c53e854687	cast image doesn't work on nvidia (#2626) * cast image doesn't work on nvidia * hmm, interpreteds use buffer size 0 * fix type * no lru	2023-12-05 12:48:19 -08:00
George Hotz	664475f247	vals is an argument (#2599) * vals is an argument * don't even know how that's legal python	2023-12-03 21:50:43 -08:00
George Hotz	fcd0b2ee6c	fix multigpu on tinybox (#2595) * fix multigpu on tinybox * fixed multigpu	2023-12-03 16:48:07 -08:00
George Hotz	171543fc8d	cleanups to save lines and files (#2577) * runtime/graph -> features/graph * put all the cstyle renderers in cstyle * same line for those * how did that pass mypy	2023-12-02 16:29:56 -08:00
nimlgen	065495e0c9	save a few lines in ops_gpu (#2564) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-12-02 15:05:22 -08:00
George Hotz	d6b404ac11	No dtype alloc (#2570) * fix all allocs * improve docs * ugh fix fake alloc	2023-12-02 13:29:40 -08:00
George Hotz	5068e99d18	refactor to remove extra kernel params (#2563) * refactor to have compiled kernel * bugfixes * docs/beautiful.py * revert that * fix tests	2023-12-02 00:32:25 -08:00
George Hotz	27481b9206	Switch ops_gpu -> gpuctypes (#2532) * ops_gpu is go * fix size 0 * fix image, and add more tests * nerf openpilot test, doesn't test thneed * run the schedule * better * oops, new inputs * delete pyopencl * Update ops_gpu.py	2023-12-01 22:30:21 -08:00
117 Commits