tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-19 02:44:40 -05:00

Author	SHA1	Message	Date
nimlgen	4ed2c40d48	qcom a bit cleaner (#7380)	2024-10-29 23:50:28 +03:00
George Hotz	2cfc7b6695	Index everywhere 2 (#7363) * indexing everywhere [pr] * fix tests	2024-10-29 19:29:40 +08:00
George Hotz	572499c71a	add indexing to ops_python (#7358) * add indexing to ops_python * fix image	2024-10-29 18:11:03 +08:00
chenyu	6021bf87f4	unify `T = TypeVar("T")` (#7342)	2024-10-28 18:43:44 -04:00
nimlgen	68cd2c0669	nv correct local memory based on device (#7307) * nv correct local memory based on device * linter * oops * oops2	2024-10-25 22:23:42 +03:00
nimlgen	98f8d0ccf9	nv limit max local memory with envvar (#7265)	2024-10-24 16:01:50 +03:00
George Hotz	de7b9d7c42	improve pre-commit [pr] (#7256) * improve pre-commit [pr] * mypy passes on windows	2024-10-24 15:38:47 +08:00
George Hotz	9f32a6f496	Revert "move metal tc check to renderer [pr] (#7248)" (#7251) This reverts commit `72ddcdb4d1`.	2024-10-24 10:57:09 +08:00
George Hotz	72ddcdb4d1	move metal tc check to renderer [pr] (#7248)	2024-10-24 10:38:57 +08:00
nimlgen	ea11382087	nv fix shared_memory_size (#7239)	2024-10-23 21:59:47 +03:00
qazal	aeeb917b6e	mask out writable bufs in runtime access_resources (#7234)	2024-10-23 16:13:50 +03:00
nimlgen	cef7078c14	nv limit mappings debug (#7215)	2024-10-22 16:41:43 +03:00
nimlgen	21acfc39d4	qcom cleanup allocs (#7200) * qcom cleanup allocs * oops	2024-10-21 23:20:15 +03:00
nimlgen	81349213c0	nv min regs count is 16 (#7166)	2024-10-20 20:03:55 +03:00
chenyu	11beb67400	fix import of truncate (#7157) truncate was moved to dtype	2024-10-18 18:41:41 -04:00
nimlgen	99fb115791	cuda correct pointer type (#7153)	2024-10-18 22:39:59 +03:00
Jacky Lee	c8b59416d0	fix: find_library can be None (#7145)	2024-10-18 20:50:52 +03:00
nimlgen	211d9753f8	nv more lc checks (#7139) * nv more lc checks * revert * linter	2024-10-18 00:21:53 +03:00
George Hotz	ca0dca35f7	move ptx renderer [pr] (#7118)	2024-10-17 14:50:32 +08:00
nimlgen	d1094fce5e	amd reports on hang (#7101)	2024-10-16 21:32:44 +03:00
nimlgen	83e7dbd89e	nv fix reallocation local memory when oom (#7098)	2024-10-16 18:17:50 +03:00
George Hotz	cd61e81f55	beautiful mnist works on windows (#7100) * beautiful mnist works on windows [pr] * add comment for that (no pr)	2024-10-16 23:00:05 +08:00
nimlgen	9f00eacde5	nv tagged memory + resnet failed kernel (#7061) * nv tagged memory * linter * metal fix?	2024-10-15 18:19:58 +03:00
nimlgen	586ff4c910	nv record uvm mappings (#7059) * nv record uvm mappings * linteeer * smth * ooops	2024-10-15 00:12:49 +03:00
nimlgen	8094340221	nv print info about faults (#7057) * nv print info about faults * unrelated changes * nv_gpu.GT200_DEBUGGER in mockgpu * regen with ocrrect version * spacing	2024-10-14 21:49:38 +03:00
nimlgen	942a17109a	qcom use QCOMBuffer for all allocated buffers (#7023) * qcom use QCOMBuffer for all allocated buffers * checks	2024-10-12 23:44:36 +03:00
George Hotz	a71bb09ec3	remove symbolic file [pr] (#7012)	2024-10-12 18:44:44 +08:00
Francis Lam	b0dd407cdd	ops_cuda: add optional dynamic smem parameter (#6956) * ops_cuda: add optional dynamic smem parameter This is required to enable larger than 48kb shared memory usage on a per-kernel basis. * move setting max dynamic smem size to init	2024-10-11 21:51:06 +03:00
George Hotz	f50d0e0ee0	cloud device [pr] (#6964) * first try at cloud device [pr] * real separation * we're free * clang works * unhappy with timeout * better timeouts and free * unrelated * use http verbs + add test * lines + better test * fix DELETE * shorter cloud * split key * fix sending renderer * PTXRenderer serialization * add sessions * http.client * minor timeout bump * fix keep-alive * inc server timeout * real fix timeout * that one too	2024-10-11 12:24:06 +08:00
nimlgen	f9d454aed5	correct kernargs alignment (#6984)	2024-10-11 00:06:28 +03:00
nimlgen	fad575ec76	qcom tiny cleanups (#6973)	2024-10-10 12:26:41 +03:00
nimlgen	f90d8493cc	add HCQDEV_WAIT_TIMEOUT_MS (#6968)	2024-10-09 19:50:00 +03:00
mesozoic-egg	0e8bcda07e	get readable error from wait_check (#6965) Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>	2024-10-09 17:28:58 +03:00
nimlgen	137ad5519f	amd fix cwsr for gfx11 (#6950) * amd cwsr * ()	2024-10-08 17:44:29 +03:00
nimlgen	0d526e251e	nv sync on gpu before local update (#6954)	2024-10-08 17:43:58 +03:00
vladov	20a9683403	Make self.fd Optional. (#6855) * Make self.fd Optional. * Fix io_uring when missing fd. * Compress io_uring fast path code.	2024-10-08 13:25:34 +08:00
nimlgen	42609300ff	hcq no timeline signals in init (#6944)	2024-10-07 23:36:19 +03:00
nimlgen	707c805a68	nv set localmem sm count to max (#6890)	2024-10-04 23:29:46 +03:00
George Hotz	6b063450df	move hcq device to runtime [pr] (#6879) * things that are only used in one place don't belong in helpers [pr] * start moving hcq device [pr] * fix paths	2024-10-04 22:26:50 +08:00
ignaciosica	8931f20765	CLANG fixed ops python [run_process_replay] (#6866) * hotfix: fixed values in ops_python for AMX * hotfix: remove unused import	2024-10-03 20:40:04 +08:00
nimlgen	8bbf6fb88c	use mv_address in ops_gpu (#6856)	2024-10-02 22:31:51 +03:00
mesozoic-egg	d2e02b47e1	Construct c_ulong in blitCommandEncoder copy method (#6793) * Construct c_ulong in blitCommandEncoder copy method * line too long --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>	2024-10-02 11:09:37 +08:00
vladov	501cfde7e6	Fix GPT2 with OpenCL backend. (#6821) * Fix GPT2 with OpenCL backend. * Add test for unaligned copies into OpenCL buffers.	2024-10-01 16:57:22 +08:00
nimlgen	e213bea426	nv shorter (#6819)	2024-09-30 19:39:32 +03:00
nimlgen	b95f47784a	qcom sleep when sync (#6785) * qcom sleep when sync * linter * short	2024-09-27 19:14:10 +08:00
nimlgen	3c56aeee70	add Tensor.from_blob (#6765) * draft tensor from pointer init * some docs and types * comment * cleaner * test * malloc * qcom cl interop * jit example * cleaner * dealoc * wording * docs	2024-09-26 18:33:19 +08:00
mesozoic-egg	992cde05d7	Metal with CDLL instead of py-objc (#6545) * Add CDLL interface for metal * remove two unused functions * Cover most of the API methods * switch to cdll * directly call objc message in ops_metal * keep only obj interface * Use direct message sending for graph * may have found a solution to the memoryview on ctypes pointer * buf indexing bug fixed * fix c_int * fix c int to bytes * fix gpu time bug * line savings for cdll metal core * wip * c int bug * fix buf casting * dedup for c_void_p * dedup for c_void_p * linter fix * remove unused stuff * my py fix * more mypy error fix * line savings * line savings * rename send_message to msg; add __hash__ and __eq__ for dedup * wip * refactor * refactor * remove named import from ctypes * forgot to change variable name * file reorg, put support.py to ops_metal * refactor * hash error * remove to_ns_array * test oom exception, fix exception change * typevar for msg * add back dedup * test for compile error * move constant to graph * move header constant around * get label for icb buffer * check icb label using "in" * wip fixing mypy reported error * fixed mypy error * code formatting * all_resources dedup match previous * code formatting * code formatting; buffer set to objc_id * revert changes on buf for the manual release, seems like _free is not always called * skip unless on metal, for test_metal * fix premature mem release causing seg fault * test_metal check for device before importing * Buffer should only be released under _free explicitly * mypy fixes * change object ownership * test compile success * lint fixes * remove load_library * wrap sel_register in cache * simplify to_struct * swap lines * fix type error in to_struct * bump line to 9800 * remove pyobjc from setup.py * command buffer should be objc_instance and get released * stringWithUTF8String: returns objc_instance * Use constant for MTLPipelineOptionNone * better explanation for [MTLBuffer contents:] return * Use dyld_find in case the path differs * trailing whitespace * handle exception for methods that take error: * load /System/Library instead of /Library * Init c_void_p with None instead of zero for error objects --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-25 17:43:01 +08:00
nimlgen	e31552e2e0	qcom reinit queue on exec (#6728) * qcom setup on exec as gpu=1 * linter * gpulike * offsets	2024-09-25 16:08:50 +08:00
nimlgen	e1caa24a92	qcom fix binded queue might be overwritten (#6712)	2024-09-25 12:45:23 +08:00
nimlgen	75b7627db7	qcom do not recreate memoryviews on updates (#6701)	2024-09-24 15:36:22 +08:00


823 Commits