Commit Graph

1363 Commits

Author SHA1 Message Date
wozeparrot
7351eb4b61 feat: put temporary file in the same directory as the destination file (#805) 2023-05-25 20:46:02 -07:00
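The entry above names a small but broadly useful pattern: write to a temporary file placed beside the destination so the final rename never crosses filesystems. A minimal sketch in plain Python (the function name `write_atomically` is illustrative, not the tinygrad helper):

```python
import os
import tempfile

def write_atomically(dest_path: str, data: bytes) -> None:
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)   # temp file in the same directory as the destination
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, dest_path)             # same-filesystem rename, so it is atomic
    except BaseException:
        os.unlink(tmp_path)                         # clean up the partial file on failure
        raise
```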
Diogo
c19ef0fcce Add sin/cos/tan (#794)
* added sin/cos/tan

* fix lint

* added onnx ops support
2023-05-25 09:04:56 -07:00
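A hedged usage sketch for the elementwise trig ops this commit adds, assuming the Tensor API exposes `.sin()`, `.cos()`, and `.tan()` as the PR title states:

```python
# Assumes Tensor.sin/.cos/.tan exist per the PR title; import path as of 2023.
import numpy as np
from tinygrad.tensor import Tensor

x = Tensor(np.linspace(0, np.pi, 5, dtype=np.float32))
print(x.sin().numpy())  # elementwise sine
print(x.cos().numpy())  # elementwise cosine
print(x.tan().numpy())  # elementwise tangent
```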
George Hotz
0400315078 Revert "ops rdna"
This reverts commit 81a11d891d.
2023-05-21 13:02:18 -07:00
George Hotz
325a3bf2cf Revert "writing 2"
This reverts commit dddd6c42f0.
2023-05-21 13:02:17 -07:00
George Hotz
dddd6c42f0 writing 2 2023-05-21 12:52:36 -07:00
George Hotz
81a11d891d ops rdna 2023-05-21 11:45:38 -07:00
George Hotz
90fff82c8a Rdna (#776)
* assembler maybe

* custom asm

* rdna3 on quiet

* trigger crashes

* fixed notes

* non-fatal rdna2 crash

* Crash4

* improve rdna sniffer

* comments

* improve sniffer

* asm

* 131 TFLOPS RDNA3

* opt simple matmul

* todos
2023-05-16 05:33:57 -07:00
George Hotz
89b8b39d9c fix mypy 2023-05-13 21:25:36 -07:00
George Hotz
e0b2035023 fast imagenet eval, gets 76.14% across the set 2023-05-13 21:18:31 -07:00
George Hotz
46d419060b start on mlperf models 2023-05-10 16:30:49 -07:00
George Hotz
cb7c22beeb fix mypy 2023-05-06 19:18:54 +00:00
George Hotz
5190037cbc rocm: disassembler for shader 2023-05-06 19:07:52 +00:00
George Hotz
42256c0d9d rocm sniffer dumps code 2023-05-05 18:36:53 +00:00
George Hotz
f2a964f447 nocopy (#764) 2023-05-05 09:32:06 -07:00
George Hotz
3a2011ab2d rocm sniffer 2023-05-04 22:22:39 +00:00
George Hotz
a55c4f5000 better rocm build scripts 2023-05-04 09:14:05 +00:00
George Hotz
987b1aaf96 rocm build scripts 2023-05-04 08:45:23 +00:00
George Hotz
ed33a89d52 no werror in archprobe 2023-05-03 19:34:17 +00:00
George Hotz
7ecf4dff68 multi cl_queue (#762)
* multi cl_queue

* only platforms 1

* gpus first, then cpus

* put device on underlying buffer

* cl_queue array
2023-05-03 12:15:28 -07:00
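A rough illustration of the multi-queue idea in pyopencl terms (an assumption for demonstration; not how tinygrad's OpenCL runtime is actually structured):

```python
# One context, an array of command queues to spread work across ("cl_queue array").
import pyopencl as cl

platform = cl.get_platforms()[0]                        # "only platforms 1"
devices = sorted(platform.get_devices(),                # "gpus first, then cpus"
                 key=lambda d: 0 if d.type & cl.device_type.GPU else 1)
ctx = cl.Context(devices)
queues = [cl.CommandQueue(ctx, device=devices[0]) for _ in range(2)]
```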
George Hotz
3b933b0a2f rocm setup script 2023-05-03 16:01:17 +00:00
George Hotz
59d0d168cd FLOAT16 off works 2023-04-19 15:34:56 -07:00
George Hotz
3d15769a8f 50 TFLOPS cuda matmul 2023-04-19 14:38:24 -07:00
George Hotz
0b5a0b9ba4 winograd comment 2023-04-16 03:36:51 -07:00
George Hotz
8b777af571 metal_conv gets over 10.4 TFLOPS... 2023-04-15 03:31:22 -07:00
George Hotz
d66e682205 metal matmul from tcores branch 2023-04-14 23:29:29 -07:00
Sohaib
70b9072663 add Pad onnx operator and rework _padding (#740) 2023-04-06 17:07:36 +05:30
George Hotz
94e2c49c35 test_cacheline_size that works in both places 2023-03-30 06:47:20 +04:00
George Hotz
b05c2828f7 better cacheline test 2023-03-30 06:08:54 +04:00
George Hotz
76db1af6fc better archprobe 2023-03-30 05:52:00 +04:00
George Hotz
20894991ed good changes from the M1 Tensor Core project (#730)
* good changes

* working except llvm

* llvm types

* nice acc

* archprobe

* lang.float4

* use self.acc for late acc

* fix store bug
2023-03-29 05:11:02 +04:00
George Hotz
68e45fca18 metal_matmul: bw and torch sync 2023-03-23 08:02:04 -07:00
George Hotz
bd6c3c31a9 compare to torch 2023-03-22 23:58:37 -07:00
George Hotz
c3a3db75c7 fix metal matmul example 2023-03-22 23:42:51 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
Fernando Vidal
73bd0b217b add int64 as supported dtype from numpy (#699)
* add int64 as supported dtype from numpy

Without this, examples/transformer.py didn't run. With this change it runs successfully.

* Update helpers.py

* Update transformer.py

* Update training.py
2023-03-18 17:15:04 -07:00
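A hedged sketch of the kind of numpy-dtype lookup this change extends; the names below are illustrative, not tinygrad's helpers.py:

```python
import numpy as np

SUPPORTED_NP_DTYPES = {
    np.dtype(np.float32): "float32",
    np.dtype(np.int32): "int32",
    np.dtype(np.int64): "int64",  # the addition that lets examples/transformer.py run
}

def from_np(npdtype) -> str:
    key = np.dtype(npdtype)
    if key not in SUPPORTED_NP_DTYPES:
        raise NotImplementedError(f"numpy dtype {key} is not supported")
    return SUPPORTED_NP_DTYPES[key]
```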
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
Kirill
0532025b04 Fix llama 13B weights loading (#700)
* Fix llama 13B weights loading

* refactor more

* add test

* test storage offset

* fix spacing

* fix strides

* llama 13B working?

* yolo?

* better test for seeks
2023-03-15 08:59:52 -07:00
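The bullets above mention storage offsets and strides; a small numpy stand-in for viewing a flat checkpoint buffer at an explicit offset with explicit strides (illustrative only, not the actual loader code):

```python
import numpy as np

def tensor_view(flat: np.ndarray, storage_offset: int, shape, strides_in_elems):
    # strides are given in elements and converted to bytes for numpy
    byte_strides = tuple(s * flat.itemsize for s in strides_in_elems)
    return np.lib.stride_tricks.as_strided(
        flat[storage_offset:], shape=shape, strides=byte_strides)

buf = np.arange(12, dtype=np.float32)
print(tensor_view(buf, storage_offset=2, shape=(2, 3), strides_in_elems=(3, 1)))
```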
George Hotz
15e0b56e39 compile works (#688)
* compile works

* runtimes

* line count

* fix custom, to tg dtype

* meh, that's fine with lazy import
2023-03-12 11:01:25 -07:00
Kirill
af7745073f Add comments to SD (#686)
* Add explanation for empty lambdas

* Fix my_unpickle if pytorch_lightning is installed

* oops
2023-03-12 10:56:49 -07:00
George Hotz
6c3675c01c _mmap loads to gpu fast 2023-03-11 23:00:13 -08:00
George Hotz
803b0aef28 track memory for numpy/torch 2023-03-11 20:39:10 -08:00
Diogo
784afc6c6f Eq magic function support (#683)
* add eq magic func

* changed from eq to __eq__

* ignore type for linter

* mypy doesn't like descriptions :(
2023-03-11 10:31:46 -08:00
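A toy illustration of an elementwise `__eq__` hook and why the "ignore type for linter" bullet is needed (`__eq__` is normally typed to return bool); the real tinygrad implementation differs:

```python
class MiniTensor:
    def __init__(self, data):
        self.data = list(data)

    def __eq__(self, other):  # type: ignore[override]
        other_data = other.data if isinstance(other, MiniTensor) else [other] * len(self.data)
        return MiniTensor(float(a == b) for a, b in zip(self.data, other_data))

print((MiniTensor([1, 2, 3]) == MiniTensor([1, 0, 3])).data)  # [1.0, 0.0, 1.0]
```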
George Hotz
01f39b19dc move to shapetracker.py 2023-03-11 07:50:07 -08:00
George Hotz
f3ac52aee8 Mypyc (#680)
* building shapetracker

* default ENABLE_METHOD_CACHE

* symbolic compiles

* improve types

* tensor compiles

* oops, that's a bug

* best of both worlds

* find legit typing bugs

* pad2d can take list or tuple

* sub 200ms when compiled
2023-03-11 07:33:30 -08:00
George Hotz
d7cb8e3e56 multithreaded fake_torch_load_zipped 2023-03-10 19:16:27 -08:00
George Hotz
b1206bcb18 third try at torch loading (#677)
* third try at torch loading

* numpy fixed

* fix enet compile

* load_single_weight supports empty weights

* oops, CPU wasn't the default

* so many bugs
2023-03-10 19:11:29 -08:00
George Hotz
4780f9a6df llama runs (slowly) in master 2023-03-10 17:36:51 -08:00
George Hotz
1826ff6b89 dtypes nice and clean (#673)
* add dtype class

* dtypes

* buffers are lazy

* dtype is tracked by lazybuffer and GenericShape

* fix types in llvm

* llvm store

* dtype tests

* fix tests maybe

* fix flop counter

* fix CI

* CI fix and check format

* fix dtype and dtype check

* fix custom test

* fix test graph
2023-03-10 16:56:07 -08:00
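A minimal sketch of a dtype descriptor in the spirit of this PR; the field names are assumptions for illustration, not the exact tinygrad DType:

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class DType:
    itemsize: int  # bytes per element
    name: str      # e.g. "float32"
    np: type       # matching numpy scalar type

float32 = DType(4, "float32", np.float32)
int64 = DType(8, "int64", np.int64)
print(float32, np.dtype(float32.np).itemsize == float32.itemsize)
```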
George Hotz
d26345595d more llama stuff 2023-03-10 10:48:10 -08:00
George Hotz
1a039306d2 good changes from llama branch (#671)
* good changes from llama

* transpose behavior changed
2023-03-09 20:51:22 -08:00