tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-20 20:38:03 -05:00

Author	SHA1	Message	Date
nimlgen	fb96394ff5	auto-select available compilers (#12094 ) * device: auto select compilers * fix * metal+opencl * nv/cuda * test without ptx * ptx * fix tests * fix * fix test * rename * test + cleaner * xx * ops * better test * win? * um? * types * debug * win?? * sep rung * wtf? * debug * skip win * revert this * types	2025-09-10 19:52:01 +03:00
nimlgen	8a7be0a747	metal: workaround for transfers sync issue (#11622 ) * metal: workaround for transfers sync issue * metal tracsfer sync is broken * hm * rm it? * keep it	2025-08-12 16:16:34 +03:00
nimlgen	a5371f514b	cpu: copies in profile (#11392 ) * cpu: copies in profile * fix * rename to tiny?	2025-07-27 20:56:27 +03:00
qazal	3466a220de	viz: disassembly viewer (#11393 ) * test * CPU=1 disasm works * METAL=1 disasm works * fix that * work * can unwrap * work p2 * don't crash	2025-07-27 18:44:28 +03:00
chenyu	54924f9969	type remove Union and Optional [pr] (#11283 ) use `\|` for consistency	2025-07-19 14:05:52 -04:00
qazal	bde80c0cdf	record GraphEvents in metal graph (#11145 ) * record GraphEvents in metal graph * add TestProfiler.test_graph, revert old stuff * move profile capture to MetalGraph * comment * don't double record graph command buffers * wait_check * explicit delete	2025-07-10 21:32:06 +03:00
Pyry Kovanen	32117402dd	metal: fix incorrect _free on interpreter exit (#11158 )	2025-07-10 14:01:30 +03:00
qazal	3dfc0ff887	move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126 ) * move cpu_profile and shared ProfileEvents to helpers [pr] * TestProfiler.test_cpu_profile * update test_viz.py * TestProfiler.test_profile_multiops ordering, it's different streams now	2025-07-08 12:14:03 +03:00
simone-pietro	58252e3c49	Change type hint for init_c_struct_t and to_struct [pr] (#10878 ) * Change type hint for init_c_struct_t * Change type hint for to_struct	2025-06-19 13:22:44 +03:00
George Hotz	4b1f1a47bb	hotfix: allow ModuleNotFoundError in metal llvm import	2025-05-18 20:46:31 -07:00
uuuvn	7bc4864bc4	Make `dev` a property of `Allocator` (#10286 ) * Make `dev` a property of `Allocator` (this is a prereq refactor for #10285) At least `BufferXfer.copy` accesses it assuming it's always present, currently most devices just add this property on their own repeating the same code over and over again. This is also a bit footguny, see `RemoteAllocator` that named this property `device` instead of `dev`, i could obviously just change that in one place but doing it globally seems like a better solution (and it reduces code duplication too). `MallocAllocator` is a bit special, but passing `None` works just fine. * typing * ignore type instead of cast	2025-05-13 17:01:01 -07:00
uuuvn	82a6160ff7	Detect metal paravirtualization bug via device name instead of CI (#10225 )	2025-05-08 19:31:47 -07:00
uuuvn	dba073e5c0	Less messy broken graph on paravirtualized metal workaround (#10182 ) * Less messy broken graph on paravirtualized metal workaround GitHub CI macOS runners use paravirtualized metal which is broken with graph (some comments say that ICB in particular is broken but in my testing it was fine sometimes, but other times hitting an assert inside metal's code related to resouces, so not sure). > Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458. This can be reproduced locally with any virtualization software (like utm) that can create macOS VMs with apple's own virtualization framework. * unused import	2025-05-06 20:41:02 +03:00
George Hotz	5c7b549eab	use functools.cache instead of lru_cache(None) [pr] (#9714 ) * use functools.cache instead of lru_cache(None) [pr] * more cache	2025-04-03 11:47:13 +08:00
nimlgen	56288243e6	metal PyTorch interop (#9229 ) * add from_blob support to mps cuda * objc_id * metal pytorch interop * fix comments --------- Co-authored-by: George Hotz <geohot@gmail.com>	2025-02-24 22:36:08 +03:00
nimlgen	f986e12f91	metal: choose compile spec based on macos (#9188 ) * metal: choose compile spec based on macos * correction	2025-02-21 00:43:39 +03:00
uuuvn	9b9c1e14da	Late MTLCompiler load (#8963 ) Moved loading MTLCompiler (and trying to load normal llvm before it) to MetalCompiler, like in CPUProgram with helper	2025-02-08 17:29:23 +08:00
uuuvn	6090cbe3be	Try to open llvm first when opening metal (#8949 ) * Try to open llvm first when opening metal * Use more specific FileNotFoundError	2025-02-07 18:58:37 +08:00
uuuvn	67b70e4f6c	Fix incorrect __del__ (#8950 ) CPython doesn't make any guarantees about order in which globals like `msg` or `libobjc` are destroyed when the interpreter shuts down https://github.com/tinygrad/tinygrad/pull/8949 triggered the unlucky ordering which lead to a bunch of errors at exit There is also a bunch of other places where similar problems exist	2025-02-07 18:21:44 +08:00
George Hotz	1249e8dd3b	objc fast msg, try 2 [pr] (#8927 )	2025-02-06 19:06:21 +08:00
George Hotz	1c53e8bf27	Revert "objc fast msg (#8922 )" (#8926 ) This reverts commit `c3f99a727e`.	2025-02-06 17:50:49 +08:00
George Hotz	c3f99a727e	objc fast msg (#8922 ) * benchmark kernel launch * don't realize unneeded * faster * faster metal * fix mypy * new objc message style [pr] * without sync * no div 0 * lru cache that * no sync in the profile * fix * update all to new style * remove comment * graph one kernel * fix graph one kernel * remove that sync	2025-02-06 17:49:06 +08:00
George Hotz	a8e54df363	benchmark single kernel launch (#8921 ) * benchmark kernel launch * don't realize unneeded * faster * faster metal * fix mypy * without sync * no div 0 * lru cache that * no sync in the profile	2025-02-06 13:35:34 +08:00
nimlgen	5afb0a4a81	metal: fix transfer profiling (#8659 )	2025-01-17 23:47:01 +03:00
uuuvn	615d5276b1	Suppress 'X warnings generated.' in MTLCompiler (#8489 ) '-fno-caret-diagnostics' is what clang-tidy uses when user passes --quiet	2025-01-04 10:22:37 -05:00
chenyu	7ea633f94f	remove from __future__ import annotations from runtimes [pr] (#8373 ) it's not needed if we move the Device before Program and Allocator, which need Device. not updating hcq because it has a lot more stuff, and CLDevice requires CLDevice	2024-12-21 23:46:07 -05:00
George Hotz	9c77e9f9b7	replace Tuple with tuple [pr] (#8344 ) * replace Tuple with tuple [pr] * replace List with list [pr] * replace Dict with dict [pr] * replace Set with set [pr]	2024-12-19 21:27:56 -08:00
nimlgen	777d2aec05	metal profiler + cpu_profile (#8291 ) * metal + cpu_profile * gpt example * linter + revert gpt2 for now * a bit of readme * linter * unrelated * tests * linter * b	2024-12-18 00:06:56 +03:00
chenyu	2e4c7d4cfb	add "tinygrad" to be part of cache_dir [pr] (#8188 ) instead of having sqlite / http download / metal compile to add "tinygrad" separately. also make it non-private since it's used in metal	2024-12-12 12:09:44 -05:00
nimlgen	e180a31c5e	tiny metal cleanup (#8089 ) * tiny metal cleanup * cast * sry	2024-12-06 21:44:32 +03:00
uuuvn	e9c5b23ba1	Use MTLCompiler directly (v2) (#7920 ) * Use MTLCompiler directly (v2) * to_block_literal and REQUEST_TYPE_COMPILE * Rewrite command encoding * Revert to_block_literal * Maybe that's more readable to some people? * Typo and comment about stdlib caching * Update ops_metal.py * Update ops_metal.py * Update ops_metal.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-04 16:36:48 +08:00
chenyu	04bee97d2a	hotfix ctypes.c_ulong(size) for metal _alloc (#7902 ) fix `Tensor.ones(1000, 1000, 1000).contiguous().realize()` on METAL	2024-11-25 18:25:33 -05:00
George Hotz	eb0bb7dc0b	final dname to device [pr] (#7806 ) * final dname to device [pr] * oops, fix nv	2024-11-20 20:20:28 +08:00
George Hotz	6688539bc9	rename device to dev so Buffer can be Allocator [pr] (#7799 ) * rename device to dev to Buffer can be Allocator [pr] * missed those * update the Program classes also * more renames * oops	2024-11-20 15:47:26 +08:00
George Hotz	913a27ee27	from_buffer on metal was never called [pr] (#7791 )	2024-11-20 00:35:17 +08:00
George Hotz	d71fe7faa5	rename allocator methods to not conflict [pr] (#7788 ) * rename allocator methods to not conflict [pr] * forgot those * transfer + offset	2024-11-20 00:10:29 +08:00
chenyu	573f145dcf	METAL raise RuntimeError with no compiler and bad src (#7603 ) fixed BEAM if src is invalid on METAL. it currently only accept RuntimeError in `_time_program`	2024-11-08 17:09:12 -05:00
George Hotz	6bb230287b	pass the src into Metal [pr] (#7518 ) * pass the src into Metal [pr] * put that comment back * keep old functionality * move all to disassembler * metal supports parallel beam * touchups * comment in correct place	2024-11-04 12:35:30 +08:00
chenyu	6021bf87f4	unify `T = TypeVar("T")` (#7342 )	2024-10-28 18:43:44 -04:00
George Hotz	9f32a6f496	Revert "move metal tc check to renderer [pr] (#7248 )" (#7251 ) This reverts commit `72ddcdb4d1`.	2024-10-24 10:57:09 +08:00
George Hotz	72ddcdb4d1	move metal tc check to renderer [pr] (#7248 )	2024-10-24 10:38:57 +08:00
mesozoic-egg	0e8bcda07e	get readable error from wait_check (#6965 ) Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>	2024-10-09 17:28:58 +03:00
mesozoic-egg	d2e02b47e1	Construct c_ulong in blitCommandEncoder copy method (#6793 ) * Construct c_ulong in blitCommandEncoder copy method * line too long --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>	2024-10-02 11:09:37 +08:00
mesozoic-egg	992cde05d7	Metal with CDLL instead of py-objc (#6545 ) * Add CDLL interface for metal * remove two unused functions * Cover most of the API methods * switch to cdll * directly call objc message in ops_metal * keep only obj interface * Use direct message sending for graph * may have found a solution to the memoryview on ctypes pointer * buf indexing bug fixed * fix c_int * fix c int to bytes * fix gpu time bug * line savings for cdll metal core * wip * c int bug * fix buf casting * dedup for c_void_p * dedup for c_void_p * linter fix * remove unused stuff * my py fix * more mypy error fix * line savings * line savings * rename send_message to msg; add __hash__ and __eq__ for dedup * wip * refactor * refactor * remove named import from ctypes * forgot to change variable name * file reorg, put support.py to ops_metal * refactor * hash error * remove to_ns_array * test oom exception, fix exception change * typevar for msg * add back dedup * test for compile error * move constant to graph * move header constant around * get label for icb buffer * check icb label using "in" * wip fixing mypy reported error * fixed mypy error * code formatting * all_resources dedup match previous * code formatting * code formatting; buffer set to objc_id * revert changes on buf for the manual release, seems like _free is not always called * skip unless on metal, for test_metal * fix premature mem release causing seg fault * test_metal check for device before importing * Buffer should only be released under _free explicitly * mypy fixes * change object ownership * test compile success * lint fixes * remove load_library * wrap sel_register in cache * simplify to_struct * swap lines * fix type error in to_struct * bump line to 9800 * remove pyobjc from setup.py * command buffer should be objc_instance and get released * stringWithUTF8String: returns objc_instance * Use constant for MTLPipelineOptionNone * better explanation for [MTLBuffer contents:] return * Use dyld_find in case the path differs * trailing whitespace * handle exception for methods that take error: * load /System/Library instead of /Library * Init c_void_p with None instead of zero for error objects --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-25 17:43:01 +08:00
George Hotz	638b4843da	fix for metal ICB issue on M1/M2 [run_process_replay] (#6313 ) * this is a working fix * better comment * repro	2024-08-28 21:31:14 -07:00
nimlgen	e9024c691f	metal raise when command queue is not created (#6044 ) * metal raise when command queue is not created * dont do that	2024-08-12 18:30:37 +03:00
nimlgen	98df648a79	metal sync queues in transfer (#5308 ) * metal sync queues * cleaner * need this * oops	2024-08-05 18:43:22 +03:00
nimlgen	8a548b0b6e	metal support offset (#5293 )	2024-07-05 16:13:05 +03:00
gip	04ef0fd328	fix: message when applegpu tools missiong (#5236 )	2024-07-03 09:07:09 -07:00
chenyu	a8e9307e0b	pylint runtime/ and shape/ (#5044 ) as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime	2024-06-18 19:48:18 -04:00

152 Commits