tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 05:48:08 -05:00

Author	SHA1	Message	Date
George Hotz	bfd4f4e35c	testdocker	2023-01-09 12:41:52 -08:00
George Hotz	4885fce56e	shapetracker from newgpu (#456) * shapetracker from newgpu * touchup ops * test * testst * thneed deletes unused inputs * test * bugfix	2023-01-09 12:40:01 -08:00
Faisal Memon	538b1d7f5b	Print out the tensor using numpy(). (#454) This commit resolves issue https://github.com/geohot/tinygrad/issues/453 In the example code in the README.md, when it is run, it prints for Tiny Grad the tensors as: <Tensor <LB (3, 3) op:MovementOps.RESHAPE> with grad None> <Tensor <LB (1, 3) op:MovementOps.RESHAPE> with grad None> But to be equivalent to the output of the Torch example, we need to use numpy() to get it to show: [[ 2. 2. 2.] [ 0. 0. 0.] [-2. -2. -2.]] [[1. 1. 1.]]	2023-01-09 10:08:05 -08:00
nogira	2e744ef2f2	confirmed (#449) w/ a bunch of print statements in the official model here: `ce05de2819/ldm/modules/diffusionmodules/openaimodel.py (L413)`	2023-01-07 08:41:06 -08:00
Nicolai Stoianov	8dbf76268d	Add step for setting up Stable Diffusion (#452)	2023-01-07 08:40:12 -08:00
cloud11665	4fb97b8de0	don't fail when termcolor is not installed (#436)	2022-11-14 16:45:06 -08:00
George Hotz	5e07d4669d	the speedy chonker is going to replace the old chonker (#432) * bringing back reshape and permute * done with E701 * 4x4 works in generic way * max and sum not vectorizing... * special case single float * support comparing to MPS * improve matmul speed, consider generic principles * GlobalCounter * fix op tracking * faster * comment that out for now * err, it needs that * fix minor issues * fix global_mem	2022-11-11 18:34:24 -08:00
George Hotz	d2273d2cc4	s/contiguous_op/contiguous	2022-11-11 00:07:05 -08:00
George Hotz	b8c94a67c9	Simple chonker (#431) * chonker will make llvm fast * work * better speed tests, we will make them fast * with the cache add is the same speed * relu and neg are fast * fix sum speed * maximum maxnum? * hack for gemm opt * gemm very slow * zeros like * test_permute * shapetracker returns self * fix shapetracker factorization * err, int strides * permutes are faster now in tinygrad than pytorch * support -1 in expand * gemm unrolled * improve final test case * WIP GEMM * why isn't GEMM fast? * revert cache dim * ffp contract works on clang, not llvm? * ignore llvm ir * this makes fma work at least, but no faster * USE_4x4 * 63 GFLOPS * 87 GFLOPS * that wasn't matmul, 44 GFLOPS now * 82 GFLOPS permuted * this permute too * a little speed for the convs * 45 GFLOPS * speed tests pass again * clean up prints * fix FMA WHAT A WASTE OF TIME * colors * moar fair * GPU * useless on chonker * cleanups * improve factorized shapetracker * better threshold * label conv * work * ops test pass again * hot load the index * run the last view, no need to create * ZeroView needs a repr for the key to work * fix segfault on out of bounds * one more test * start amx, and llvm.initialize_native_asmparser * amx works * nice AMX class * nicer AMX class * refactor get_idxs * amx working * is slower... * useless flip * cache * SZ_X * AMX_SZ_X/Y work alone * Contiguous mlop * test gemm packed * PREPARE in packed * use_amx factor * prefetch isn't faster * loop * same 3ms * 2.24 ms * allow double on store in TG * amx reduce is the same speed as non amx reduce * include memory bandwidth * clean up shapetracker * flip returns stride * prepare for upstream * Update ops_llvm.py (#426) * permutes are yellow and green now * faster conv * llvm cleanups * Show optimised IR under debug 4 (#428) * ASTKernel class * Make tinygrad work with older python version (#427) * Make tinygrad work with older python version * Use partialmethod instead of partial * smiple chonker is chonking * remove junk from test speed vs torch * fix linker and types * AMX is only here now * add LLVM tests, it's a valid backend now * oops, run llvm test * contiguous_op * fix loadops compare * dedup reduceops Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>	2022-11-10 23:17:09 -08:00
George Hotz	bff47e9dc1	contiguous, and no strided for matmul	2022-11-09 16:56:26 -08:00
George Hotz	1271f19a2b	factorizing shapetracker from chonker	2022-11-09 16:36:38 -08:00
Daniel Davis	64ff1ddc10	Reduce line count (#424) * save a line, save a life * save a line, save a life * change order of tern	2022-11-09 10:07:22 -08:00
George Hotz	0994705166	contrib more	2022-11-08 19:14:37 -08:00
George Hotz	c0bba9649a	more that	2022-11-08 19:13:11 -08:00
George Hotz	5143da6a9f	contributing	2022-11-08 19:12:12 -08:00
Daniel Davis	4998bf49b3	Basic editorconfig support (#422) Almost every IDE or texteditor supports [editorconfig](https://editorconfig.org/). I've set it up to just enforce the 2 space python indents for now.	2022-11-08 10:34:25 -08:00
marcojob	c3d9c9b24c	Fix issue where batch_invstd not being set (#421) batch_invstd can be falsely assumed to be set, even though it is None since hasattr will not return false in this case BatchNorm2D a reshape will be attempted then, which causes an exception	2022-11-08 09:24:53 -08:00
Liam	8dc28dd733	Create python-publish.yml (#163) v0.4.0	2022-11-08 08:45:01 -08:00
George Hotz	92ed87b0a5	bump version to 0.4.0	2022-11-08 08:44:42 -08:00
George Hotz	9781b4c3af	rename test functions to helper_	2022-11-07 21:27:56 -08:00
George Hotz	9884be2ad5	ugh, that too	2022-11-07 21:21:35 -08:00
George Hotz	537a9eb414	fix termcolor import	2022-11-07 21:19:08 -08:00
George Hotz	2cc1d970c6	updates from the chonker branch	2022-11-07 21:12:08 -08:00
George Hotz	d878065ece	Gemm (#416) * gemm * off by factor of 5 * 50 GFLOPS * works * 91 gflops * working at 50G * works * iy * 150 GFLOPS * 150 GFLOPS * N=2048 is still fast * threading soon * multithread * pinning * throttling is sad * Align matrices to cacheline width (#361) Co-authored-by: cloud <Cloud11665@gmail.com>	2022-11-06 10:07:28 -08:00
George Hotz	caea34c529	1s are always mergable	2022-11-03 10:50:48 -07:00
George Hotz	c48fc47d01	fix type error	2022-10-31 09:56:56 -07:00
George Hotz	9585b6c0cf	comments and readability in lazy.py	2022-10-30 19:50:48 -07:00
George Hotz	db2da22a04	stop blowing up floats	2022-10-30 16:47:16 -07:00
George Hotz	8afc643bb1	fix bug in ops test, it was cheating somehow	2022-10-30 16:43:24 -07:00
George Hotz	b7a115e5e5	rewrite some strideds into reshapes	2022-10-30 16:31:27 -07:00
George Hotz	8c849e637c	that was in there twice, DEBUG>=4 to see loop opt	2022-10-30 15:31:39 -07:00
George Hotz	cfdf803b52	fix llvm vectorization by add analysis passes from the target machine	2022-10-30 15:28:36 -07:00
George Hotz	2f602a92ff	seperate STRIDED and EXPAND	2022-10-30 13:23:58 -07:00
George Hotz	544cb0a069	oops, remove while(1)	2022-10-29 14:05:13 -07:00
George Hotz	4b6097f81d	more amx notes	2022-10-29 14:04:10 -07:00
George Hotz	fdb43fe553	gemm is 1.7 TFLOPS on a single M1 core	2022-10-29 13:42:33 -07:00
George Hotz	52bfbc31be	vectorization	2022-10-29 12:47:52 -07:00
George Hotz	e473d35f90	llvm doesn't vectorize	2022-10-29 11:59:48 -07:00
George Hotz	86eb06eb76	accurate flop estimation	2022-10-28 19:13:20 -07:00
George Hotz	7909786dbf	one more opt test	2022-10-28 18:37:53 -07:00
George Hotz	dd543fbc7a	MovementOps is unused	2022-10-28 18:26:08 -07:00
George Hotz	71b336503f	no RESHAPEs in the AST	2022-10-28 18:25:30 -07:00
George Hotz	294ab9e2f8	more test opt	2022-10-28 18:04:12 -07:00
George Hotz	f885ceb695	test speed w/o bias	2022-10-28 11:22:15 -07:00
George Hotz	3735e26492	very minor	2022-10-28 09:39:30 -07:00
George Hotz	c0050fab8f	clean up movement_op in cpu and torch	2022-10-28 09:29:12 -07:00
George Hotz	df31dde174	hasattr and DeviceBuffer type fixups	2022-10-28 09:05:45 -07:00
George Hotz	e6b65f8e01	fix graph in openpilot/compile.py	2022-10-28 08:55:34 -07:00
George Hotz	1013540370	fix flake8	2022-10-28 08:52:53 -07:00
George Hotz	804b2dd001	move into graph.py	2022-10-28 08:50:11 -07:00


1232 Commits