* clean up opt
* don't let global kernels get too small
* 8192 -> 1024
* disable local shape for clang
* fix can_merge
* unroll the 5x5 depthwise convs in op
* load float4 check
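
The float4 work above only pays off when a load is actually vectorizable. A minimal sketch of that eligibility check, with hypothetical names (`can_load_float4`, `idxs`), not the actual codegen logic:

```python
# A minimal sketch of a float4-load eligibility check; `can_load_float4`
# and `idxs` are hypothetical names, not tinygrad's real codegen helpers.
def can_load_float4(idxs: list) -> bool:
  # a float4 load needs four consecutive offsets starting at a 4-aligned base
  return len(idxs) == 4 and idxs[0] % 4 == 0 and \
         all(idxs[i] == idxs[0] + i for i in range(4))

assert can_load_float4([8, 9, 10, 11])
assert not can_load_float4([9, 10, 11, 12])  # misaligned base, must fall back
```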
* Less, LessOrEqual, Greater, GreaterOrEqual, Equal
* lint fix
* using built in functions
* overriding __eq__ breaks things
* backwards pass for less - forward only tests
* one other spot
* removing backwards for comparison ops to match pytorch
* raise runtime error
* more tests for comparison ops
* fixed the lineup
* added number upcast tests
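
Since the comparison ops above are forward-only (matching pytorch), backward has to fail loudly. A rough sketch in the shape of a tinygrad Function subclass; the class here is a simplified stand-in, and the float upcast is there so bool types don't creep into later ops:

```python
# A rough sketch of a forward-only comparison op; simplified stand-in,
# not tinygrad's actual Less Function.
import numpy as np

class Less:
  def forward(self, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    return (x < y).astype(np.float32)  # upcast the bool result to float
  def backward(self, grad_output):
    # comparison ops are not differentiable; match pytorch and refuse
    raise RuntimeError("backward not implemented for comparison ops")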
* conv2d is an hlop
* shorter conv
* KOPT=-1
* alt impl
* MULACC
* smarter mulacc
* pop conv
* 7x7 -> 5x5
* didn't fix, that's not going to work
* this is faster and matches old behavior
* oh, non lazy just won't work with mulacc
* mulacc in torch
* bool types were creeping in
* optimizer is actually better with hlop conv
* fix pushing permutes issue
* refactor einsum_mulacc
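
The MULACC refactor above fuses an elementwise multiply with the reduce that follows it. A minimal sketch of the idea using numpy's einsum; the formula string is illustrative, not tinygrad's einsum_mulacc itself:

```python
# A minimal sketch of the MULACC idea: fuse multiply + sum-reduce into one
# einsum call instead of materializing the elementwise product.
import numpy as np

def mulacc(a: np.ndarray, b: np.ndarray) -> np.ndarray:
  # equivalent to (a*b).sum(axis=-1) without the intermediate buffer
  return np.einsum("...k,...k->...", a, b)

a, b = np.random.rand(4, 8), np.random.rand(4, 8)
np.testing.assert_allclose(mulacc(a, b), (a * b).sum(axis=-1), rtol=1e-6)
```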
* fix up readme
* update readme
* _image_conv2d
* fix bias addition location
* pushing permutes gets back to 200 kernels
* conv cleanup
* disable hlop conv
* don't hide that in helpers
* triton can add
* print stuff from triton
* write out file
* ops triton working
* reduce ops
* sort of works
* Triton bugfixes & implementation of remaining ops (#490)
* padding
* support pow, max, relu, gt0
* allocate return buffer
* Fix reduce
* Add tests for power op
* Fix triton illegal memory accesses and memory leak (#512)
* Fix mypy issue
* Add triton to setup.py
* Replace torch with pycuda
* Use one cuda stream for data transfer and kernels
* Remove triton submodule
* Fix memory leak by using weakrefs for caching
* Fix memory access by adding valid as mask for load
* Fix invalid kernel launches by flattening the grid (#515)
---------
Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
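
A hedged sketch of two of the Triton fixes above rolled into one kernel: guard every load and store with a validity mask, and launch over a flat 1-D grid so a large tensor can never exceed a per-dimension grid limit. Torch tensors are used here purely for brevity (the PR itself moved data handling to pycuda), and the kernel is illustrative, not tinygrad's generated code:

```python
import torch, triton, triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
  offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
  mask = offs < n                       # "valid" mask: no out-of-bounds access
  x = tl.load(x_ptr + offs, mask=mask)
  y = tl.load(y_ptr + offs, mask=mask)
  tl.store(out_ptr + offs, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
  out = torch.empty_like(x)             # allocate the return buffer up front
  n = out.numel()
  grid = (triton.cdiv(n, 1024),)        # flattened 1-D grid
  add_kernel[grid](x, y, out, n, BLOCK=1024)
  return out
```

The leak fix is simpler than it sounds: keying the kernel/buffer cache with weakrefs (e.g. a `weakref.WeakValueDictionary`) lets objects die when nothing else holds them.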
* Refactor getenv into helpers
* Remove unused os
* Fix default value
* Fix more defaults for CI
* Fix bracket
* Revert changes to openpilot/compile.py
* Use getenv from helpers when possible
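
The refactor above puts env-var parsing in one place. A minimal sketch of such a helper, typing the result from its default so `DEBUG=2` comes back as an int and a missing var falls back cleanly; the exact signature in helpers may differ:

```python
import os

def getenv(key: str, default=0):
  # cast to the type of the default, so callers get ints/floats, not strings
  return type(default)(os.getenv(key, default))

DEBUG = getenv("DEBUG")      # 0 when unset, int when set
KOPT  = getenv("KOPT", -1)   # the default comes from the caller
```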
* factor out opencl runtime
* don't use CL outside the runtime
* cuda runtime adds
* final_dimension
* tests pass with CUDA backend
* more cuda
* cuda simpler
* retain old functionality
* linter and typing
* move globalcounters out of runtimes
* oops, GlobalCounters in cuda
* MAX_OUTPUT_SHAPE=3 is fine for CUDA
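
A hedged sketch of what the pycuda-based runtime boils down to: compile source, grab the function, and push copies and launches through one shared stream. The kernel and names are illustrative, not tinygrad's generated code:

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

stream = cuda.Stream()   # one stream for both data transfer and kernels
mod = SourceModule("""
__global__ void add(float *out, const float *a, const float *b, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = a[i] + b[i];
}""")
add = mod.get_function("add")

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
a_gpu, b_gpu, out_gpu = (cuda.mem_alloc(a.nbytes) for _ in range(3))
cuda.memcpy_htod_async(a_gpu, a, stream)
cuda.memcpy_htod_async(b_gpu, b, stream)
add(out_gpu, a_gpu, b_gpu, np.int32(1024), block=(256, 1, 1), grid=(4, 1), stream=stream)
out = np.empty_like(a)
cuda.memcpy_dtoh_async(out, out_gpu, stream)
stream.synchronize()
```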
* add image
* load + store + boring stuff
* image tests pass
* thneed print GFLOPS
* op conv test
* more debugging
* hack for multiview image
* shapetracker creates less views
* disable image tests
* working better
* ugh, lkey not key
* print in DEBUG, and allow views
* works
* simple padding conv2d
* use index for image
* that was bad code
* debug print
* fix types
* less lines
* save lines
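
A hedged sketch of the image idea running through the commits above: backing data with an OpenCL `image2d_t` so reads go through the texture cache and a single `read_imagef` fetches four floats. All names are illustrative, and the kernel just reads one texel:

```python
import numpy as np, pyopencl as cl

ctx = cl.create_some_context(); queue = cl.CommandQueue(ctx)
fmt = cl.ImageFormat(cl.channel_order.RGBA, cl.channel_type.FLOAT)
host = np.random.rand(16, 16, 4).astype(np.float32)   # h x w x RGBA
img = cl.Image(ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
               fmt, shape=(16, 16), hostbuf=host)
prg = cl.Program(ctx, """
__kernel void read_one(__read_only image2d_t img_in, __global float4 *out) {
  const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
  // one read returns a float4: four floats per fetch, through the texture cache
  out[0] = read_imagef(img_in, smp, (int2)(get_global_id(0), get_global_id(1)));
}""").build()
```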
* working exec ast
* exec_ast is staticmethod
* GenericExecAST
* fold that sometimes
* ExplicitExecAST
* exec_ast for GPU
* gpu working
* get_lazyop_shape
* now gpubuffer is ExplicitExecAST
* dedup
* add a type
* RESHAPE in opencl code
* fix linter
* that too for linter
* cleanups
* remove dead code
* GenericShape is less lines
* add ALLOWED_KERNEL_COUNT to tests
* fix mypy
* that's gotta be recursive
* fix opencl shape processing
* remove unneeded lambda
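
A rough sketch of the exec_ast shape described above: a LazyOp tree is walked recursively (hence "that's gotta be recursive"), sources realized first, then the op applied. Op names and classes here are simplified stand-ins for tinygrad's:

```python
from dataclasses import dataclass
from typing import Any, Tuple
import numpy as np

@dataclass(frozen=True)
class LazyOp:
  op: str
  src: Tuple[Any, ...]   # each source is another LazyOp or a realized buffer
  arg: Any = None

OPS = {"ADD": np.add, "MUL": np.multiply, "RELU": lambda x: np.maximum(x, 0)}

def exec_ast(ast) -> np.ndarray:
  if not isinstance(ast, LazyOp): return ast   # already a realized buffer
  srcs = [exec_ast(s) for s in ast.src]        # realize sources recursively
  return OPS[ast.op](*srcs)

x = np.array([-1.0, 2.0])
print(exec_ast(LazyOp("RELU", (LazyOp("ADD", (x, x)),))))  # [0. 4.]
```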
* in progress
* big conv test works
* that's unneeded
* fix opencl with reduce
* rewrite contiguous_view_constant_fold
* clean up mids in loop code
* subidx
* print cl kernel before run
* no reduce, no loop
* Revert "no reduce, no loop"
This reverts commit 92777e40e9.
* option for matmul
* fixups
* fast like a nascar
* running
* thneed runner
* no buffer id makes no backing buffer
* move constant folding to the top
* runs on mac
* folded biases
* was v slow
* maybe just that
* elu touchup
* speed and float32
Co-authored-by: Comma Device <device@comma.ai>