tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
George Hotz	cd97b036cc	A Triton backend for tinygrad (#470) * triton can add * print stuff from triton * write out file * ops triton working * reduce ops * sort of works * Triton bugfixes & implementation of remaining ops (#490) * padding * support pow, max, relu, gt0 * allocate return buffer * Fix reduce * Add tests for power op * Fix triton illegal memory accesses and memory leak (#512) * Fix mypy issue * Add triton to setup.py * Replace torch with pycuda * Use one cuda stream for data transfer and kernels * Remove triton submodule * Fix memory leak by using weakrefs for caching * Fix memory access by adding valid as mask for load * Fix invalid kernel launches by flattening the grid (#515) --------- Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>	2023-02-01 11:53:57 -08:00
George Hotz	4e24002bbe	no generic exceptions	2023-02-01 11:14:37 -08:00
Jacky Lee	54c68defc7	Replace SIGN with GT0 (#511) * Replace sign with gt0 * Replace sign with gt0 * GT0 works on GPU * Fix brackets --------- Co-authored-by: Tom Finet <tom.codeninja@gmail.com>	2023-02-01 11:01:39 -08:00
Jacky Lee	799b3f185a	Refactor getenv into helpers (#508) * Refactor getenv into helpers * Remove unused os * Fix default value * Fix more defaults for CI * Fix bracket * Revert changes to openpilot/compile.py * Use getenv from helpers when possible	2023-01-31 15:09:09 -08:00
George Hotz	d91b6711ea	oops, broke BN	2023-01-31 08:18:48 -08:00
George Hotz	21f2af08d5	getenv + graphing	2023-01-30 19:15:03 -08:00
Jacky Lee	491e78d203	Add symbolic tests for correctness (#494) * [WIP] Add symbolic tests for correctness * Fix typo * Fix expected value for test_and_fold * Add more tests for symbolic * It is indeed right * Clean up * Check all strings * Put TODO back	2023-01-30 18:40:16 -08:00
George Hotz	60ccddb58b	reenable SWAP	2023-01-30 17:32:02 -08:00
George Hotz	c1a769b68b	fix bug in gpu copy out	2023-01-30 16:51:28 -08:00
George Hotz	e87410c531	fix multiple accumulators	2023-01-30 16:22:26 -08:00
George Hotz	aea55eb196	found failing upcast	2023-01-30 16:12:56 -08:00
George Hotz	b67f997864	tests pass w/o float4	2023-01-30 15:40:49 -08:00
George Hotz	c6f570a2e6	improve progress bar	2023-01-30 14:50:28 -08:00
Kevin Gilpin	4685c9c095	Big changes (#498) Use make_pair	2023-01-30 14:42:22 -08:00
George Hotz	7118602c97	goat progress bar	2023-01-30 14:37:26 -08:00
George Hotz	7ee0d99c70	CLCACHE	2023-01-30 14:02:06 -08:00
George Hotz	7457f0d755	KOPT=2	2023-01-30 13:28:06 -08:00
George Hotz	cccfea4b25	factor out KOPT code	2023-01-30 13:13:55 -08:00
George Hotz	de2c419fd4	make_pair and first attempt at hlb_cifar10	2023-01-30 11:07:23 -08:00
AllentDan	7b6b1f32b1	[Fix] fix typo: test_mnist -> datasets (#492) * test_mnist -> datasets * fix mnist_gan	2023-01-29 21:30:47 -08:00
George Hotz	2db272c7f7	Kernel Optimizer (#489) * kernel optimizer * 10x faster, but wrong. not good deal * move test -> extra * print x speedup * clcache * fix clcache + DEBUG * GFLOPS estimate * i==3	2023-01-29 17:15:00 -08:00
Martin Loretz	43abbd3d00	Use force_create to allocate return buffer (#491)	2023-01-29 17:13:10 -08:00
George Hotz	bb0cdc2442	111.51x speedup for reduce	2023-01-29 03:06:00 -08:00
George Hotz	45c0aa6e2d	search with SHIFT, REDUCE	2023-01-29 02:42:20 -08:00
George Hotz	87879cf4b6	improve search more	2023-01-29 02:08:57 -08:00
George Hotz	f6bbd43cb8	improve search	2023-01-29 01:33:47 -08:00
George Hotz	ebdec2b72f	fix optimizer	2023-01-29 00:23:06 -08:00
George Hotz	a9cabce791	oops, broke mem estimates	2023-01-28 20:21:31 -08:00
George Hotz	a500e79bd1	don't OPTWG on OS X, it's way slower	2023-01-28 20:02:33 -08:00
George Hotz	b0df4d99a0	os x profiling: this ratio is exact i believe	2023-01-28 19:02:51 -08:00
George Hotz	c0963b723e	should fix tests	2023-01-28 15:13:03 -08:00
George Hotz	b134a4f3d1	don't upcast already upcasted	2023-01-28 14:58:28 -08:00
George Hotz	2f194aadad	loop unrolling upcast	2023-01-28 14:51:24 -08:00
George Hotz	381f3e92da	fix prints, add third conv	2023-01-28 14:10:27 -08:00
George Hotz	92001a06e1	openpilot/go.sh	2023-01-28 13:57:43 -08:00
George Hotz	aea29f8a6e	fix CUDA reduce	2023-01-28 13:38:58 -08:00
George Hotz	0f34c24aeb	move expr_idxs to shapetracker	2023-01-28 12:25:05 -08:00
George Hotz	f2e81f7208	line reduction and cleanups	2023-01-28 12:17:40 -08:00
George Hotz	03dd1201dc	local buffer implied	2023-01-28 12:06:28 -08:00
George Hotz	b3e4e678e8	Use ShapeTracker for tracking shapes in kernels (#485) * local is a normal buffer * remove extra shapes and strides * fix opt * fix llvm	2023-01-28 11:56:32 -08:00
George Hotz	259c48f235	discord image is invite link	2023-01-28 11:42:11 -08:00
George Hotz	d748000ada	tinygrad discord	2023-01-28 11:36:15 -08:00
George Hotz	ae810eb558	minor cleanups	2023-01-28 08:59:15 -08:00
George Hotz	713318745d	padding size in get_conv_args	2023-01-28 08:47:18 -08:00
George Hotz	299d1cdc9c	lil cleanup of load ldr	2023-01-28 00:31:57 -08:00
George Hotz	2b5bc5d4a1	factor out image_idx	2023-01-28 00:22:54 -08:00
George Hotz	bd8a5c2ced	Simple CUDA Runtime (#480) * factor out opencl runtime * don't use CL outside the runtime * cuda runtime adds * final_dimension * tests pass with CUDA backend * more cuda * cuda simpler * retain old functionality * linter and typing * move globalcounters out of runtimes * oops, GlobalCounters in cuda * MAX_OUTPUT_SHAPE=3 is fine for CUDA	2023-01-27 16:26:24 -08:00
George Hotz	6d5e1a8029	GEMM kernel search	2023-01-27 10:08:57 -08:00
George Hotz	123993156d	refactor group_for_reduce a little	2023-01-27 08:51:23 -08:00
George Hotz	82e58108e3	add flake8 to precommit	2023-01-26 22:31:45 -08:00

1362 Commits