tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-13 08:05:10 -05:00

Author	SHA1	Message	Date
George Hotz	60e3aa5cb1	more docs (#4271 ) * more work on docs * CompilerOptions is dataclass	2024-04-24 10:52:42 +08:00
nimlgen	f3b4dff7c9	KFDProgram -> AMDProgram (#4268 )	2024-04-24 00:29:50 +03:00
George Hotz	9a95781d51	renamed (#4260 )	2024-04-23 09:00:28 +04:00
George Hotz	2ae4f45272	WIP PM4 Support (#4110 ) * pm4 kernel launch works * disable USE_THREAD_DIMENSIONS * add kernel code * work on real pm4 * pm4 signal * same * gate pm4 * hcq tests pass * ops passes * pm4 is closer * pm4 debug (#4165) * start debug tests passing * prg * smth * hdp flush * cleaner 1 * do not need this * logs not need * small things * linter * remove AQL * test hcq * fix tests * it's subtracting, it shouldn't be -1 * pm4 changes (#4251) * not need this anymore * sdma signal with non atomic --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-04-23 08:31:27 +04:00
nimlgen	e6227bdb15	nv driver (#4044 ) * start * fix err 93 * gpu * ioctl mappings * alloc like cuda * semaphores * wait for semaphores value * start ops_nv * very simple kernels work * init several gpus * qmd dumper * dirty, but most of kernels work * always all test_ops * progress, more tests, stable * test_ops passes, gpt2 works but wth big fifo, wrap of fifo doesn't work, i think it's something coherency releated * need better sync * fix sync * alloc2 * all tests pass! * cleanup 1 * cleanup * multigpu, simple transfer * fix sync * correct init * nv_gpu autogen + sync bug fix * clean extra/nv_gpu_driver * p2p * clean up * remove old gen * small fixes * cleanup * cleanup 2 * small fixes * bigger queue size * cleanups * wait * fixed signals for devs * fix hang + parallel beam * small fixes * detect when local memory is big in kernel * correct assert * small fixes * correct tls size est * one va space * less lines * shorter * save 2 lines * save some lines * remove type ignores --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-04-22 19:50:20 +04:00
Micah Zoltu	7bc862767c	Improves error message when CUDA module fails to load. (#4243 )	2024-04-21 11:10:14 -04:00
George Hotz	b9570d6100	clean up update stats (#4226 ) * WIP: clean up update stats * line savings now * fix graphs * fix tests * tighter prints * remove extra jit=false * debug=2 means wait * that won't update stats * still wait	2024-04-19 15:41:30 +04:00
nimlgen	4ed6b42a8a	fix kernargs check in kfd (#4194 )	2024-04-17 00:44:50 +03:00
George Hotz	b6e7243bfa	hotfix: skip slow pre-commit test	2024-04-16 11:48:43 +04:00
nimlgen	24a27a01a9	hotfix: CUDA_P2P works (#4155 )	2024-04-12 18:20:12 +03:00
nimlgen	5a57b48134	cuda p2p enable when available (#4153 )	2024-04-12 16:21:54 +03:00
George Hotz	bbda20c0db	CompiledASTRunner -> CompiledRunner (#4148 )	2024-04-11 08:49:52 -07:00
George Hotz	b7e281cf10	JitItem -> ExecItem (#4146 ) * JitItem -> ExecItem * execitem in realize * cleaner * JITRunner -> Runner	2024-04-11 08:24:57 -07:00
George Hotz	081dd1573f	hotfix: keep CUDA D2D copy behind the CUDA_P2P flag	2024-04-10 21:36:48 +00:00
George Hotz	af5984df43	cudagraph memcpy through host (#4137 )	2024-04-10 13:17:17 -07:00
George Hotz	ee457a4b20	no more underlying diskbuffer, that's just the device (#4129 )	2024-04-10 08:32:25 -07:00
Felix Kuehling	38ae4194a6	Fixes for ops_kfd (#4105 ) * kfd_ops: Fix GPU node discovery on NUMA systems Ignore potentially multiple CPU NUMA nodes and any GPU nodes that are not accessible because of device cgroups. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> * kfd_ops: Format the GFX arch target name correctly The target version in sysfs properties is a decimal representation with two digits per component. The format for LLVM GFX target names is a bit quirky for historical reasons. It uses one digit for the minor version and stepping. When it ran out of decimal digits for the stepping on gfx90X it started using hexadecimal there. But the major version is still decimal and went double digit in GFX10. Make sure to parse and format it accordingly for all supported GPUs. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> --------- Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>	2024-04-09 13:21:21 -07:00
George Hotz	ae849d12d7	numpy device + pickle it (#4120 )	2024-04-09 13:19:30 -07:00
andresgit	7fd12aba85	graph remove input buffer references (#4100 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-04-08 16:49:16 -04:00
George Hotz	444d2a7487	hotfix: fix SDMA read_pointer_address in KFD	2024-04-07 13:13:15 +00:00
uuuvn	bb7567b365	Fix metal (#4101 )	2024-04-07 05:21:19 -07:00
George Hotz	8739d33fe9	kfd: disable copy_from_fd while debugging (#4091 ) * kfd: disable copy_from_fd while debugging * increase timeout to a minute	2024-04-05 18:02:58 -07:00
George Hotz	164329a8ea	address kfd feedback (#4087 ) * address kfd feedback * signals cleanup * signals cleanup * handle 2 doorbell pages correctly * signal reset cleanup * signals cleanup * more GTT * cleanups * minor cleanups	2024-04-05 15:24:41 -07:00
George Hotz	a337922c44	more work on kfd (#4079 ) * more work on kfd * fix multitensor test on kfd * stuff	2024-04-05 08:36:36 -07:00
George Hotz	28ec6c67be	hotfix: hlb_cifar KFD works	2024-04-05 02:19:14 +00:00
chenyu	1de9778949	import Buffer and BufferOption from tinygrad.buffer (#4076 )	2024-04-04 22:12:23 -04:00
George Hotz	3de855ea50	don't use SVM memory in KFD (#4072 ) * don't use SVM memory in KFD * copy from fd * cleanups * transfer * hacks * ops_hsa * tighter API	2024-04-04 17:33:21 -07:00
George Hotz	3e72d745ea	hotfix: make KFD timings right	2024-04-04 05:55:29 +00:00
George Hotz	58d162315c	Continuing KFD work (#4065 ) * cleanups * fix kernargs ptr * mypy passes	2024-04-03 22:48:13 -07:00
chenyu	d219aba962	prepend CLANG_PROGRAM_HEADER in ClangCompiler.render instead of compile (#4063 ) src header should be part of the rendered output, and DEBUG=4 includes the header this way	2024-04-03 23:17:56 -04:00
George Hotz	7181ffd630	HWCopyQueue in KFD (#4042 ) * HWCopyQueue in KFD * hw compute queue * test * move test * more tests * fix wait * fix multimap * mes crash * tests pass but slow * stuff is working * one more test	2024-04-03 20:14:24 -07:00
Léo	e879e16c48	docs: add warning message for conda users when using METAL (#3917 ) * docs: add warning message for conda users when using METAL * fix: conda metal warning too long. disabled line length check * docs: changed conda METAL warning to include DISABLE_COMPILER_CACHE=1 * fix(metal): now detecting invalid library magic * format: removed noqa E501 * fix(metal): conda error line len * fix: typo --------- Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>	2024-04-02 09:22:24 -07:00
George Hotz	506b1c5892	multigpu works (#4040 )	2024-04-02 08:29:37 -07:00
George Hotz	7425a0c646	CommandQueue is the future (#3950 ) * start of command queue * cq work * runs * cleanup * outs set * read is gone * future buffer work * command queue is better * command queue works * loadops * delete unneeded * command queue works * upd * fix tests * use CommandQueue in compile * delay sync	2024-04-01 17:35:48 -07:00
chenyu	0a34d6016b	move exec_alu from uops to ops (#4033 ) will use this for const folding in lazy too	2024-04-01 17:20:53 -07:00
nimlgen	d6ba44bc1e	kfd free buffers (#4027 ) * kfd free buffers * unmap * all test passes * better pm4 * forgot these * invalidate only range * better cache * forgot * comments * fixes	2024-04-01 15:50:58 -07:00
nimlgen	7fa233e8c9	kfd fix kernels with private memory (#4018 ) * kfd fix kernels with private memory * linter	2024-04-01 00:01:30 +03:00
George Hotz	2abb474d43	kfd driver wip (#3912 ) * kfd driver wip * cleanups * kfd almost ready to ring doorbell * ding dong? * issues with signals * something * works * ops kfd * add amd_signal_t * works...sometimes * program runs * _gpu_alloc cleanup * cleanups * work * header + enable profiling (#3959) * header + enable profiling * just cleaner * measure * only local time domain * remove old comments * fix with master * elf parsing (#3965) * elf parsing * fix kernels with private * not used * clean up * clean up 2 * add flags * kfd sdma (#3970) * working sdma * remove driver, shorter * all commands we might need * svm * kfd remove hardcoded values (#4007) * remove hardcoded values * match above line * 7k lines + revert hsa * update that from origin * fix sdma reg gen * not the updated SDMA * compiler_opts * don't require kfd_ioctl * get ioctls from python * get ioctls from python * remove build_sdma_command * merge into 64-bit fields * shorter * fix property spelling and off by one --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-03-30 15:08:12 -07:00
nimlgen	478c040e1c	hsa terminate without exceptions (#4006 ) * hsa terminate without exceptions * cleaner * linter	2024-03-30 16:03:46 +03:00
chenyu	d9ff636cf5	use is to compare with enum (#3993 ) * use is to compare with enum currently it's mixed between `==` and `is`, moved all to `is` * more	2024-03-29 13:02:56 -04:00
uuuvn	8a40d7d423	Shape changing bitcast and assert bitcast in disk (#3973 ) * Shape changing bitcast * only support it on disk * basic test * more tests * RuntimeError instead of assert * create unique temp files * move tests that use disk to test_disk_tensor * linter * remove assert on error messages * that's RuntimeError now --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-28 21:49:10 -07:00
chenyu	b47f6cebb2	LinearizerOptions -> CompilerOptions (#3978 )	2024-03-28 17:50:23 -04:00
George Hotz	2cfcb5623a	hotfix: d was removed from buffer	2024-03-28 13:39:02 -07:00
George Hotz	42b9d999ea	Buffer isn't always allocated (#3974 ) * buffer alloc * allocate * missing allocates * last one	2024-03-28 13:33:47 -07:00
Francis Lam	7c5729a3bd	wmma: refactor to remove wmma_func and create TC funcs as needed (#3945 ) * wmma: refactor to remove wmma_func and create TC funcs as needed * test_linearizer: disable bf16 CUDA during emulation testing * cstyle: clean up creation of CUDA vec dtypes * extra/gemm: add option to accumulate to bfloat16 * cleanups * benchmark: add CUDA bfloat16 matmul * more cleanups	2024-03-27 16:43:09 -04:00
chenyu	6c7df1445b	enforce UOps.CONST arg has python type based on dtype (#3952 ) added an assert in uops, remove the cast in renderer	2024-03-27 01:41:38 -04:00
George Hotz	150ea2eb76	create engine folder and move code (#3948 ) * retry * older tf * that	2024-03-26 20:38:03 -07:00
nimlgen	e2d6f76723	_alloc and _free with options (#3934 ) * _alloc has options * linter * fix hsa	2024-03-26 09:11:41 -07:00
nimlgen	739f47eb0f	check on cuEventSynchronize (#3933 )	2024-03-26 16:14:38 +03:00
chenyu	4ecd5789ab	#include <tgmath.h> in ops_clang (#3927 ) * different clang sqrt/log2/exp2/sin function based on dtype fixed softmax_argmax issue in #3552 for clang. * tgmath.h * revert those	2024-03-25 17:48:57 -04:00


525 Commits
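
The GFX target naming rule described in commit 38ae4194a6 (Fixes for ops_kfd) can be sketched as below. This is a minimal illustration, not the code from that commit: the function name `gfx_target_name` is hypothetical, and the sketch assumes the sysfs value packs major, minor, and stepping as two decimal digits each, as the commit message states.

```python
def gfx_target_name(target_version: int) -> str:
    """Format a sysfs target version as an LLVM-style GFX target name.

    Assumption (taken from the commit message): the value is a decimal
    number with two digits per component, i.e.
    major * 10000 + minor * 100 + stepping.
    """
    major = target_version // 10000
    minor = (target_version // 100) % 100
    stepping = target_version % 100
    # The major version stays decimal and may be two digits (GFX10 and later);
    # the minor version takes one digit; the stepping is printed in hexadecimal.
    return f"gfx{major:d}{minor:d}{stepping:x}"


if __name__ == "__main__":
    for version in (90008, 90010, 100300, 110000):
        print(version, gfx_target_name(version))
```

Under that assumption, an input of 90010 produces `gfx90a`, because the stepping value 10 is written as the hexadecimal digit `a`, while 100300 produces `gfx1030` with a two-digit decimal major version.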