1501 Commits

Author | SHA1 | Message | Date
George Hotz | 42256c0d9d | rocm sniffer dumps code | 2023-05-05 18:36:53 +00:00
George Hotz | f2a964f447 | nocopy (#764) | 2023-05-05 09:32:06 -07:00
George Hotz | 3a2011ab2d | rocm sniffer | 2023-05-04 22:22:39 +00:00
George Hotz | a55c4f5000 | better rocm build scripts | 2023-05-04 09:14:05 +00:00
George Hotz | 987b1aaf96 | rocm build scripts | 2023-05-04 08:45:23 +00:00
George Hotz | ed33a89d52 | no werror in archprobe | 2023-05-03 19:34:17 +00:00
George Hotz | 7ecf4dff68 | multi cl_queue (#762) | 2023-05-03 12:15:28 -07:00
  * multi cl_queue
  * only platforms 1
  * gpus first, then cpus
  * put device on underlying buffer
  * cl_queue array
George Hotz | 3b933b0a2f | rocm setup script | 2023-05-03 16:01:17 +00:00
George Hotz | 59d0d168cd | FLOAT16 off works | 2023-04-19 15:34:56 -07:00
George Hotz | 3d15769a8f | 50 TFLOPS cuda matmul | 2023-04-19 14:38:24 -07:00
George Hotz | 0b5a0b9ba4 | winograd comment | 2023-04-16 03:36:51 -07:00
George Hotz | 8b777af571 | metal_conv gets over 10.4 TFLOPS... | 2023-04-15 03:31:22 -07:00
George Hotz | d66e682205 | metal matmul from tcores branch | 2023-04-14 23:29:29 -07:00
Sohaib | 70b9072663 | add Pad onnx operator and rework _padding (#740) | 2023-04-06 17:07:36 +05:30
George Hotz | 94e2c49c35 | test_cacheline_size that works in both places | 2023-03-30 06:47:20 +04:00
George Hotz | b05c2828f7 | better cacheline test | 2023-03-30 06:08:54 +04:00
George Hotz | 76db1af6fc | better archprobe | 2023-03-30 05:52:00 +04:00
George Hotz | 20894991ed | good changes from the M1 Tensor Core project (#730) | 2023-03-29 05:11:02 +04:00
  * good changes
  * working except llvm
  * llvm types
  * nice acc
  * archprobe
  * lang.float4
  * use self.acc for late acc
  * fix store bug
George Hotz | 68e45fca18 | metal_matmul: bw and torch sync | 2023-03-23 08:02:04 -07:00
George Hotz | bd6c3c31a9 | compare to torch | 2023-03-22 23:58:37 -07:00
George Hotz | c3a3db75c7 | fix metal matmul example | 2023-03-22 23:42:51 -07:00
George Hotz | b12b60af20 | fix binop, other tests failure (#723) | 2023-03-22 18:15:07 -07:00
  * fix binop, other tests failure
  * that was a bad idea
  * better layernorm
  * inference kernel count tests
  * new style reshape pushing
  * fixup replacement
  * 199 kernels is okay. fix flops
  * push reshape through unaryops only
  * GRAPH=2 draws the phantom ops
  * found resnet issue
  * non working test
  * mul is cheaper than div
  * OPT inflation
  * SHUFFLE_PAD_OPS in OPT=2
Fernando Vidal | 73bd0b217b | add int64 as supported dtype from numpy (#699) | 2023-03-18 17:15:04 -07:00
  * add int64 as supported dtype from numpy
    Without this, examples/transformer.py didn't run. With this change it runs successfully.
  * Update helpers.py
  * Update transformer.py
  * Update training.py
George Hotz | f5467cfedc | Devicebufferless (#708) | 2023-03-18 14:40:23 -07:00
  * runs one metal kernel
  * conv2d works
  * ops tests are passing
  * const folding
  * all ops work
  * pre commit always passes
  * torch works
  * working still
  * fix graph test
  * tests passing
  * image almost works
  * image conv works
  * most images
  * fix custom
  * fix assignment
  * fix compile enet
  * clean up comments
  * fix realize return value
  * include shapetracker in LB repr
  * copy should make a copy
  * reenable method cache
  * fix lna
  * dtypes in graph
  * forward only for IMAGE=2
  * simple realize
  * getting close
  * fixup new api, it's good except the kernel count
  * back to 197 kernels
  * tests should pass
  * go to a real float
  * no type_on_cpu
  * fix the docs
  * put shapetracker back in its proper place
Kirill | 0532025b04 | Fix llama 13B weights loading (#700) | 2023-03-15 08:59:52 -07:00
  * Fix llama 13B weights loading
  * refactor more
  * add test
  * test storage offset
  * fix spacing
  * fix strides
  * llama 13B working?
  * yolo?
  * better test for seeks
George Hotz | 15e0b56e39 | compile works (#688) | 2023-03-12 11:01:25 -07:00
  * compile works
  * runtimes
  * line count
  * fix custom, to tg dtype
  * meh, that's fine with lazy import
Kirill | af7745073f | Add comments to SD (#686) | 2023-03-12 10:56:49 -07:00
  * Add explanation for empty lambdas
  * Fix my_unpickle if pytorch_lightning is installed
  * oops
George Hotz | 6c3675c01c | _mmap loads to gpu fast | 2023-03-11 23:00:13 -08:00
George Hotz | 803b0aef28 | track memory for numpy/torch | 2023-03-11 20:39:10 -08:00
Diogo | 784afc6c6f | Eq magic function support (#683) | 2023-03-11 10:31:46 -08:00
  * add eq magic func
  * changed from eq to __eq__
  * ignore type for linter
  * mypy doesn't like descriptions :(
George Hotz | 01f39b19dc | move to shapetracker.py | 2023-03-11 07:50:07 -08:00
George Hotz | f3ac52aee8 | Mypyc (#680) | 2023-03-11 07:33:30 -08:00
  * building shapetracker
  * default ENABLE_METHOD_CACHE
  * symbolic compiles
  * improve types
  * tensor compiles
  * oops, that's a bug
  * best of both worlds
  * find legit typing bugs
  * pad2d can take list or tuple
  * sub 200ms when compiled
George Hotz | d7cb8e3e56 | multithreaded fake_torch_load_zipped | 2023-03-10 19:16:27 -08:00
George Hotz | b1206bcb18 | third try at torch loading (#677) | 2023-03-10 19:11:29 -08:00
  * third try at torch loading
  * numpy fixed
  * fix enet compile
  * load_single_weight supports empty weights
  * oops, CPU wasn't the default
  * so many bugs
George Hotz | 4780f9a6df | llama runs (slowly) in master | 2023-03-10 17:36:51 -08:00
George Hotz | 1826ff6b89 | dtypes nice and clean (#673) | 2023-03-10 16:56:07 -08:00
  * add dtype class
  * dtypes
  * buffers are lazy
  * dtype is tracked by lazybuffer and GenericShape
  * fix types in llvm
  * llvm store
  * dtype tests
  * fix tests maybe
  * fix flop counter
  * fix CI
  * CI fix and check format
  * fix dtype and dtype check
  * fix custom test
  * fix test graph
George Hotz | d26345595d | more llama stuff | 2023-03-10 10:48:10 -08:00
George Hotz | 1a039306d2 | good changes from llama branch (#671) | 2023-03-09 20:51:22 -08:00
  * good changes from llama
  * transpose behavior changed
George Hotz | d8dda2af3a | openpilot fixups | 2023-03-06 14:14:44 -08:00
George Hotz | a77d792aff | Codegen gpu cleanups (#640) | 2023-03-04 15:31:51 -08:00
  * cleanups
  * fixups
  * handle pre upcasted global buffers
  * early is just required
  * delete junk from hand coded opt
  * implicit upcast_in_mid_reduce
  * speedup
  * fix exec w validhacks
  * reorder opt
  * only need to check the output for that
  * return total runtime from kernels if debugging
Patrick Geneva | 117111825c | Fix windows file permission error (#634) | 2023-03-04 09:23:55 -08:00
George Hotz | 528cb3b3b9 | fix ast test | 2023-03-04 07:49:25 -08:00
George Hotz | 893f136fe0 | lines from helpers | 2023-03-03 23:07:46 -08:00
George Hotz | c53efb3635 | optimize for CL (#633) | 2023-03-03 22:00:09 -08:00
  * required opt
  * simplify
  * works
  * shift_to_last
  * required is fine
  * print shape in colored
  * better shape
  * args was wrong
  * debugs
  * fix empty shape
  * colored shape printer
Diogo | 52204a7b88 | adding comparison operators (#616) | 2023-03-02 08:10:44 -08:00
  * Less, LessOrEqual, Greater, GreaterOrEqual, Equal
  * lint fix
  * using built in functions
  * overriding __eq__ breaks things
  * backwards pass for less - forward only tests
  * one other spot
  * removing backwards for comparison ops to match pytorch
  * raise runtime error
  * more tests for comparison ops
  * fixed the lineup
  * added number upcast tests
George Hotz | d062cc82b8 | put restrict back | 2023-03-01 21:34:45 -08:00
George Hotz | bfcec234a2 | Refactor ASTs (#622) | 2023-03-01 18:57:29 -08:00
  * ugh worst branch name
  * compiler refactor continues
  * scc -> cloc
  * buf -> _buf
  * finish _buf, and program -> runtime
  * gpu is still working, clang isn't
  * clang in new style
  * ops_metal
  * something broke it
  * improve metal
  * clean up tons of cl crap
  * hack fix sync
  * cleaner gpu
  * gpu metal clang
  * cleanups
  * minor refactor
  * GPUCodegen
  * fix up LLVM
  * blind CUDA refactor
  * codegen / runtime
  * keep ops naming
  * linter passes
  * woah, llvm was allocing 4x what it needed to
  * bugfixes
  * fix openpilot compiler
  * fix compile_efficientnet
  * method cache should fix tests
  * deal with duped functions
George Hotz | 7e6edfbc64 | unbreak onnx conv padding | 2023-02-28 13:55:03 -08:00
George Hotz | 7d556ca7e0 | avg/max pool work in N-D | 2023-02-28 13:38:27 -08:00
George Hotz | d584bae5c0 | fine, openpilot can have 197 kernels | 2023-02-27 11:48:36 -08:00