tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-18 02:21:40 -05:00

Author	SHA1	Message	Date
Lucas Keller	56a06280c5	Testing/utils (#548) * New unittest for utils.py Unit test fetch in basic ways. Would have tested more fetches, but downloading stuff for tests is annoying and mocking is more dependencies. * Remove unused imports	2023-02-10 12:08:20 -06:00
George Hotz	5de850f6d5	assign buffer reuse (#547) * assign buffer reuse works * fix assign for torch and cpu * allow assign from numpy * fix llvm output_buffer * add some assign tests * fix assignment test * test should fail without lazy * env var to disable assign	2023-02-09 11:53:02 -06:00
George Hotz	473bbd3e35	fix graphs	2023-02-09 09:40:46 -06:00
George Hotz	3d63934995	refactor to keep cl in the runtime (#545) * refactor to keep cl in the runtime * fix thneed, rename cl to _cl * bugfix + _cuda * fix tests * thneed more correct	2023-02-08 16:46:09 -06:00
Mitchell Goff	ae4f0aeb5f	NumPy-like semantics for Tensor.__getitem__ (#506) * Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None * Fixed pad2d * mypy doesn't know about mlops methods * normal python behavior for out-of-bounds slicing * type: ignore * inlined idxfix * added comment for __getitem__ * Better comments, better tests, and fixed bug in np.newaxis	2023-02-08 08:59:46 -06:00
George Hotz	aebe75d9a2	remove val expansion (#539) * remove val expansion * types for all shapetracker functions: * more typing * add all the parens to the test * more types * fix tests * very minor speedup	2023-02-07 15:14:05 -06:00
Jared Z	7604b17fbf	TestZeroViewShapeTracker fix test (#481) * TestZeroViewST test * updated to align with st naming conventions in file * Update test_shapetracker.py	2023-02-07 06:17:55 -06:00
George Hotz	c073271f20	more symbolic correctness	2023-02-07 00:03:14 -06:00
George Hotz	e961fd3a04	more symbolic test, ModNode is wrong	2023-02-06 23:43:21 -06:00
George Hotz	8cfeb118d6	symbolic new test	2023-02-06 23:27:26 -06:00
George Hotz	c3d81bba2a	test_train: Adam -> SGD	2023-02-06 08:55:41 -06:00
Jacky Lee	ad4f6aa2cf	Add test for quick_gelu (#526) * Add test for quick_gelu * Bump PyTorch version for approximate	2023-02-03 20:01:39 -08:00
Jacky Lee	486f023e81	Rename Normalize and move to nn (#513) * Rename Normalize and move to nn * Match PyTorch for dim>1	2023-02-01 11:55:03 -08:00
George Hotz	cd97b036cc	A Triton backend for tinygrad (#470) * triton can add * print stuff from triton * write out file * ops triton working * reduce ops * sort of works * Triton bugfixes & implementation of remaining ops (#490) * padding * support pow, max, relu, gt0 * allocate return buffer * Fix reduce * Add tests for power op * Fix triton illegal memory accesses and memory leak (#512) * Fix mypy issue * Add triton to setup.py * Replace torch with pycuda * Use one cuda stream for data transfer and kernels * Remove triton submodule * Fix memory leak by using weakrefs for caching * Fix memory access by adding valid as mask for load * Fix invalid kernel launches by flattening the grid (#515) --------- Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>	2023-02-01 11:53:57 -08:00
Jacky Lee	799b3f185a	Refactor getenv into helpers (#508) * Refactor getenv into helpers * Remove unused os * Fix default value * Fix more defaults for CI * Fix bracket * Revert changes to openpilot/compile.py * Use getenv from helpers when possible	2023-01-31 15:09:09 -08:00
Jacky Lee	491e78d203	Add symbolic tests for correctness (#494) * [WIP] Add symbolic tests for correctness * Fix typo * Fix expected value for test_and_fold * Add more tests for symbolic * It is indeed right * Clean up * Check all strings * Put TODO back	2023-01-30 18:40:16 -08:00
George Hotz	7457f0d755	KOPT=2	2023-01-30 13:28:06 -08:00
George Hotz	cccfea4b25	factor out KOPT code	2023-01-30 13:13:55 -08:00
George Hotz	de2c419fd4	make_pair and first attempt at hlb_cifar10	2023-01-30 11:07:23 -08:00
George Hotz	2db272c7f7	Kernel Optimizer (#489) * kernel optimizer * 10x faster, but wrong. not good deal * move test -> extra * print x speedup * clcache * fix clcache + DEBUG * GFLOPS estimate * i==3	2023-01-29 17:15:00 -08:00
George Hotz	ebdec2b72f	fix optimizer	2023-01-29 00:23:06 -08:00
George Hotz	b0df4d99a0	os x profiling: this ratio is exact i believe	2023-01-28 19:02:51 -08:00
George Hotz	2f194aadad	loop unrolling upcast	2023-01-28 14:51:24 -08:00
George Hotz	381f3e92da	fix prints, add third conv	2023-01-28 14:10:27 -08:00
George Hotz	bd8a5c2ced	Simple CUDA Runtime (#480) * factor out opencl runtime * don't use CL outside the runtime * cuda runtime adds * final_dimension * tests pass with CUDA backend * more cuda * cuda simpler * retain old functionality * linter and typing * move globalcounters out of runtimes * oops, GlobalCounters in cuda * MAX_OUTPUT_SHAPE=3 is fine for CUDA	2023-01-27 16:26:24 -08:00
George Hotz	1b624a5051	DeviceBuffer has abstract methods	2023-01-25 19:16:23 -08:00
George Hotz	44e96c58b4	touch up pytorch speed tests	2023-01-25 18:11:26 -08:00
calledit	a0af1045bf	Some new tests (#440) * Make test run * Added new tests: sub pow constant_sub * Fix indentation * Added one to many lines * Fix indentation * Update test_cl_tiler.py * Delete test_cl_tiler.py	2023-01-25 15:40:19 -08:00
George Hotz	e37424424f	first little attempt at search	2023-01-25 11:49:29 -08:00
George Hotz	335a261a2e	test for slow kernel	2023-01-25 10:25:22 -08:00
George Hotz	487685919b	Revert "Rename Normalize and move to nn (#415)" (#474) This reverts commit `d768acb6a9`.	2023-01-25 07:50:04 -08:00
Jacky Lee	d768acb6a9	Rename Normalize and move to nn (#415) * Rename Normalize and move to nn * Fix comparison to None error * Add test for GroupNorm * Rename test case * Flip parameters to match PyTorch * Increase error tolerance * Fix elementwise_affine on channels * Match arguments with PyTorch * Initialize weight and bias only when affine is true * Is this it? * A bit cleaner * Handle case where weight or bias is None	2023-01-25 07:47:59 -08:00
George Hotz	6d7658db12	delete opencl <celebration>	2023-01-24 14:18:35 -08:00
George Hotz	5d350d4883	the ast test is actually a test now	2023-01-24 07:53:24 -08:00
George Hotz	6fe9edf30f	torch cuda is very fast	2023-01-23 16:24:46 -08:00
George Hotz	a949de873b	reduce 2.0 (#469) * reduce 2.0 * works * hacks * DEBUG=3 for shapes * fix types * 0s weren't being folded * cleaner * last_reduce is no longer needed * comments and cleanup	2023-01-23 15:11:13 -08:00
George Hotz	a6de94b444	test partial sum	2023-01-22 21:28:40 -08:00
George Hotz	708215d06b	Typing (#468) * we typing * types look good in theory * most tests pass * gpu tests pass * TEST_AST * delete comments * i must have written that bug so many times * bugfix * don't merge the small ones * add f to constants * commits from reduce * don't GCD the mod nodes * broken and a hack IMAGE=3 * group for reduce * fix linter + mypy * move out test ast * insource TENSOR_TYPE_TO_NP_TYPE * does this fix it? * move imports out	2023-01-21 09:09:22 -08:00
George Hotz	b29614592a	first conv/second conv	2023-01-19 13:26:11 -08:00
George Hotz	3d697577b2	print_ast	2023-01-19 13:22:03 -08:00
George Hotz	0881d504c1	move shapetracker (#466) * move shapetracker * shapetracker test * move ast * move a few things * fix print kernel * fix test * symbolic fixups	2023-01-19 09:56:31 -08:00
George Hotz	2b47ee401f	Symbolic for indexes (#464) * indexer * works * all use indexer * boolean in the indexer too * symbolic is a better name than indexer * better symbolic API * min and max * symbolic tests * work * more tests * fix demodder * __str__ in the superclass * NumNode * awesome that works * still works * fix up parens * fix zeroviews * dead lines * expr_node * works * still works * refactor to not use __new__ methods * ugh something went wrong a while ago * this fixes it * mod and div at the end * test * symbolic * working * one linter issue fixed * other division * more simplifys * works * validhacks * VALIDHACKS passes thneed * no str replace stuff * inline indexes * NATIVE_EXPLOG and factoring * factor both ways * cl indexing * split on mod, not just full * onnxlimit * fix output shape * op_estimate is a function of the program * no ones in the index * four_float4 * ALLOW_4FLOAT4 * test passes * compute then store * loads first * bugfix * better, but doesn't match * select xb in smart way * new test and bugfix * no change to lazy * Node fixes linter * fix opencl with op_estimate * fix mypy * revert valid * remove unused	2023-01-19 07:21:30 -08:00
George Hotz	9245f4650a	indexer changes for master	2023-01-18 18:02:02 -08:00
George Hotz	287699c32c	simplify ones after axis splitting	2023-01-14 10:51:43 -08:00
George Hotz	49c6e6d472	Latest attempt to add image (#462) * add image * load + store + boring stuff: * image tests pass * thneed print GFLOPS * op conv test * more debugging * hack for multiview image * shapetracker creates less views * disable image tests * working better * ugh, lkey not key * print in DEBUG, and allow views * works * simple padding conv2d * use index for image * that was bad code * debug print * fix types * less lines * save lines	2023-01-12 17:36:30 -08:00
George Hotz	fff1f046b0	Simple version of the new GPU backend (#458) * newgpu * more to delete * hmm, tests pass with constant folding * fix lint/type * fix constant folding * comment and rerun tests * lazy touchups * fix graph_batchnorm test * smaller transformer to fix OOM * Revert "smaller transformer to fix OOM" This reverts commit `a44ef8edc2`. * no func cache * introspect * touchups * CLASTKernel * ugh, it was lru_cache * codegen * spacing * old gpu still in opencl * typing fix	2023-01-10 19:16:02 -08:00
George Hotz	bfd4f4e35c	testdocker	2023-01-09 12:41:52 -08:00
George Hotz	4885fce56e	shapetracker from newgpu (#456) * shapetracker from newgpu * touchup ops * test * testst * thneed deletes unused inputs * test * bugfix	2023-01-09 12:40:01 -08:00
cloud11665	4fb97b8de0	don't fail when termcolor is not installed (#436)	2022-11-14 16:45:06 -08:00
George Hotz	5e07d4669d	the speedy chonker is going to replace the old chonker (#432) * bringing back reshape and permute * done with E701 * 4x4 works in generic way * max and sum not vectorizing... * special case single float * support comparing to MPS * improve matmul speed, consider generic principles * GlobalCounter * fix op tracking * faster * comment that out for now * err, it needs that * fix minor issues * fix global_mem	2022-11-11 18:34:24 -08:00


791 Commits