tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 14:28:09 -05:00

Author	SHA1	Message	Date
George Hotz	2b5bc5d4a1	factor out image_idx	2023-01-28 00:22:54 -08:00
George Hotz	bd8a5c2ced	Simple CUDA Runtime (#480) * factor out opencl runtime * don't use CL outside the runtime * cuda runtime adds * final_dimension * tests pass with CUDA backend * more cuda * cuda simpler * retain old functionality * linter and typing * move globalcounters out of runtimes * oops, GlobalCounters in cuda * MAX_OUTPUT_SHAPE=3 is fine for CUDA	2023-01-27 16:26:24 -08:00
George Hotz	6d5e1a8029	GEMM kernel search	2023-01-27 10:08:57 -08:00
George Hotz	123993156d	refactor group_for_reduce a little	2023-01-27 08:51:23 -08:00
George Hotz	82e58108e3	add flake8 to precommit	2023-01-26 22:31:45 -08:00
George Hotz	f4b571039b	fix shape types	2023-01-26 22:29:20 -08:00
Jacky Lee	026ba78526	Add commit hooks (#478) * Add pre-commit hook * We need ret * Fix some type definitions	2023-01-26 22:24:31 -08:00
George Hotz	c07bc39941	fix mypy, plz add commit hooks	2023-01-26 14:25:42 -08:00
Comma Device	f08e740957	factor out hand coded opt	2023-01-26 14:54:06 -06:00
George Hotz	5e8a36a18b	real op kernel	2023-01-26 09:51:32 -08:00
George Hotz	e0600f537a	op kernel in kernel search	2023-01-26 09:47:01 -08:00
George Hotz	60acb2641f	ugh, don't use os	2023-01-25 19:41:21 -08:00
George Hotz	b1dec64815	new types and fixup ShapeTracker type mismatches	2023-01-25 19:39:36 -08:00
George Hotz	1b624a5051	DeviceBuffer has abstract methods	2023-01-25 19:16:23 -08:00
George Hotz	faab6461dd	that lambda is required	2023-01-25 18:46:56 -08:00
George Hotz	44e96c58b4	touch up pytorch speed tests	2023-01-25 18:11:26 -08:00
George Hotz	8db345d846	functools.partialmethod -> lambda fixes Python 3.11	2023-01-25 18:08:38 -08:00
calledit	a0af1045bf	Some new tests (#440) * Make test run * Added new tests: sub pow constant_sub * Fix indentation * Added one to many lines * Fix indentation * Update test_cl_tiler.py * Delete test_cl_tiler.py	2023-01-25 15:40:19 -08:00
George Hotz	aafc29484a	cleanups	2023-01-25 12:37:10 -08:00
George Hotz	919e943867	decent search	2023-01-25 12:20:53 -08:00
George Hotz	7f3da91f8b	kernel_search	2023-01-25 12:05:09 -08:00
George Hotz	e37424424f	first little attempt at search	2023-01-25 11:49:29 -08:00
George Hotz	c15e9c3c7a	comment where future perf should go	2023-01-25 11:13:57 -08:00
George Hotz	ee1f6ab3ca	flip output shape extra dimension indexing for speed	2023-01-25 11:00:37 -08:00
George Hotz	335a261a2e	test for slow kernel	2023-01-25 10:25:22 -08:00
George Hotz	0d594ccc51	mps option in torch (note: it's broken)	2023-01-25 10:10:39 -08:00
George Hotz	66da3bc3c0	reset the benchmark timer	2023-01-25 09:20:34 -08:00
George Hotz	f5be4043ac	fix OSX CL kernel timing	2023-01-25 08:37:18 -08:00
George Hotz	f6fc2a0d98	huh, this prevents an extra kernel	2023-01-25 07:53:35 -08:00
George Hotz	487685919b	Revert "Rename Normalize and move to nn (#415)" (#474) This reverts commit `d768acb6a9`.	2023-01-25 07:50:04 -08:00
Jacky Lee	d768acb6a9	Rename Normalize and move to nn (#415) * Rename Normalize and move to nn * Fix comparison to None error * Add test for GroupNorm * Rename test case * Flip parameters to match PyTorch * Increase error tolerance * Fix elementwise_affine on channels * Match arguments with PyTorch * Initialize weight and bias only when affine is true * Is this it? * A bit cleaner * Handle case where weight or bias is None	2023-01-25 07:47:59 -08:00
George Hotz	baf64c14ac	cleanups, simple padding in the processing op	2023-01-25 07:37:52 -08:00
George Hotz	3acf62d489	cleanups for IMAGE=2 conv	2023-01-25 07:18:34 -08:00
George Hotz	6d7658db12	delete opencl <celebration>	2023-01-24 14:18:35 -08:00
George Hotz	e313c8af20	update openpilot tests from OPENCL to GPU	2023-01-24 14:05:20 -08:00
George Hotz	2e1d47b166	there's a bug in scc for empty string	2023-01-24 12:06:06 -08:00
George Hotz	e9c293361b	fix typo	2023-01-24 12:03:58 -08:00
Comma Device	9e2af0a972	too far with the OPTWG	2023-01-24 13:14:59 -06:00
Comma Device	3590848b93	a little more local workgroup options	2023-01-24 12:50:27 -06:00
Comma Device	4b74752c42	fix hotspots by improving the workgroup optimizer	2023-01-24 12:46:28 -06:00
George Hotz	fd760a390a	fix incremental time	2023-01-24 10:19:04 -08:00
George Hotz	7a369b856b	nope, no default NATIVE_EXPLOG	2023-01-24 10:01:52 -08:00
George Hotz	78fedc13d1	native_explog is default	2023-01-24 08:09:43 -08:00
George Hotz	5d350d4883	the ast test is actually a test now	2023-01-24 07:53:24 -08:00
George Hotz	7a159b9b04	tinygrad got big...make it tiny again	2023-01-23 21:33:56 -08:00
George Hotz	6286ace4f1	does this work yet (#471)	2023-01-23 20:36:17 -08:00
George Hotz	c22554f44a	floats for nvidia	2023-01-23 16:36:10 -08:00
George Hotz	6fe9edf30f	torch cuda is very fast	2023-01-23 16:24:46 -08:00
George Hotz	a949de873b	reduce 2.0 (#469) * reduce 2.0 * works * hacks * DEBUG=3 for shapes * fix types * 0s weren't being folded * cleaner * last_reduce is no longer needed * comments and cleanup	2023-01-23 15:11:13 -08:00
George Hotz	a6de94b444	test partial sum	2023-01-22 21:28:40 -08:00

10417 Commits