George Hotz
81a11d891d
ops rdna
2023-05-21 11:45:38 -07:00
George Hotz
ed038ba129
Contract float4 ALU operations (#780)
* wrong expand
* tests passing
* pass lint
2023-05-16 19:03:49 -07:00
George Hotz
90fff82c8a
Rdna (#776)
* assembler maybe
* custom asm
* rdna3 on quiet
* trigger crashes
* fixed notes
* non-fatal rdna2 crash
* Crash4
* improve rdna sniffer
* comments
* improve sniffer
* asm
* 131 TFLOPS RDNA3
* opt simple matmul
* todos
2023-05-16 05:33:57 -07:00
George Hotz
89b8b39d9c
fix mypy
2023-05-13 21:25:36 -07:00
George Hotz
e0b2035023
fast imagenet eval, gets 76.14% across the set
2023-05-13 21:18:31 -07:00
Jacky Lee
c552f6f92b
Inference test: add tests for ResNet50 (#773)
* Add ResNet inference test and cannon
* Test with ResNet50
* test_car works with resnet fix
2023-05-13 21:18:15 -07:00
Rabia Eda Yılmaz
e5b4b36cba
add std to tensor.py (#767)
* add std
* delete comment
* edit: one liner std, add: test
* adjust
* fix: shape mismatch
* set unbiased to False
* added unbiased option
* fix unbiased option in test and clean code
* better
* generalize axis
* holly coffee molly
* generalize axes without unbiased opt.
* hopefully done
* complete unbiased true for axes
* Update test_ops.py
* fixed
* std completed without bessels correction
* fix comment
* ups
2023-05-13 12:20:44 -07:00
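The commit thread above (#767) walks through adding a `std` method to `tensor.py`, including an `unbiased` option and the question of Bessel's correction. As a plain-Python sketch of the semantics being discussed (not tinygrad's actual implementation; the `unbiased` flag name is taken from the commit messages and torch convention):

```python
import math

def std(xs, unbiased=True):
    # Sample standard deviation. Bessel's correction divides by (n - 1)
    # when unbiased=True, matching the option debated in the commits;
    # unbiased=False divides by n (population std).
    n = len(xs)
    mean = sum(xs) / n
    denom = (n - 1) if unbiased else n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / denom)

print(std([1.0, 2.0, 3.0, 4.0]))                  # unbiased (Bessel)
print(std([1.0, 2.0, 3.0, 4.0], unbiased=False))  # biased
```

The back-and-forth about "generalize axis" in the commits refers to computing this per-axis on an n-d tensor rather than over a flat list, which the sketch above does not attempt.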
George Hotz
b705510d5c
getting 77% on imagenet eval
2023-05-13 07:46:27 -07:00
George Hotz
810f03dafa
conv3d + unet3d (#772)
* conv3d, needs test
* test passes, padding wrong on unet
* unet3d
* no conv3d on images
2023-05-12 13:54:07 -07:00
George Hotz
46d419060b
start on mlperf models
2023-05-10 16:30:49 -07:00
Jacky Lee
d13629cb26
ResNet: match implementation with Nvidia and PyTorch (#770)
* Match ResNet implementation with pytorch and nvidia
* Reduce number of Epochs
2023-05-10 09:01:22 -07:00
Jacky Lee
b80cf9220c
Statistics test: check if distributions match torch (#769)
* Check if tensor values match torch
* Clean up randomness tests and remove dependency
* Remove kaiming uniform test
2023-05-07 21:43:23 -07:00
George Hotz
cb7c22beeb
fix mypy
2023-05-06 19:18:54 +00:00
George Hotz
5190037cbc
rocm: disassembler for shader
2023-05-06 19:07:52 +00:00
George Hotz
7fbf96b992
jit: TODO, use abstractions
2023-05-05 22:51:30 -07:00
George Hotz
0cd3feb452
jit oops. should add that to commit tests
2023-05-05 22:01:13 -07:00
George Hotz
5b2ae262db
assertions for jit
2023-05-05 21:56:32 -07:00
George Hotz
42256c0d9d
rocm sniffer dumps code
2023-05-05 18:36:53 +00:00
George Hotz
81aa3e546b
exclude GPU on tiny (#766)
2023-05-05 10:07:23 -07:00
George Hotz
f2a964f447
nocopy (#764)
2023-05-05 09:32:06 -07:00
George Hotz
466ffeb04f
fast cifar on AMD
2023-05-05 02:10:50 +00:00
George Hotz
3a2011ab2d
rocm sniffer
2023-05-04 22:22:39 +00:00
George Hotz
a55c4f5000
better rocm build scripts
2023-05-04 09:14:05 +00:00
George Hotz
987b1aaf96
rocm build scripts
2023-05-04 08:45:23 +00:00
George Hotz
f28df9900f
multidevice works (#763)
* basic multigpu working
* better multigpu test
* upper
* touchups
* cl sync
2023-05-04 01:04:58 -07:00
George Hotz
4f6d674ec0
use CPU tests in pre-commit
2023-05-03 19:46:16 +00:00
George Hotz
ed33a89d52
no werror in archprobe
2023-05-03 19:34:17 +00:00
George Hotz
7ecf4dff68
multi cl_queue (#762)
* multi cl_queue
* only platforms 1
* gpus first, then cpus
* put device on underlying buffer
* cl_queue array
2023-05-03 12:15:28 -07:00
Rylan Justice
7757f5fed2
Fixed package description (#761)
* Updated LICENSE year
* Fixed package description
2023-05-03 10:21:05 -07:00
George Hotz
3b933b0a2f
rocm setup script
2023-05-03 16:01:17 +00:00
Rylan Justice
9628a3f190
Updated LICENSE year (#760)
2023-05-01 15:35:23 -07:00
Joqsan
0b9d4126d0
Add Tensor.stack() and Tensor.repeat() (...trying to make einops work with tinygrad) (#758)
* add stack() and repeat() methods
* make stack a static method
2023-05-01 09:37:46 -07:00
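Commit #758 above adds `Tensor.stack()` and `Tensor.repeat()`. A minimal plain-Python sketch of the intended semantics (assuming torch-style behavior, as the einops motivation suggests; these helpers are illustrative, not tinygrad's actual code):

```python
def stack(tensors):
    # Join equally-shaped 1-D "tensors" (plain lists here) along a new
    # leading axis: k lists of length n become a k x n nested list.
    assert all(len(t) == len(tensors[0]) for t in tensors)
    return [list(t) for t in tensors]

def repeat(t, times):
    # Tile a 1-D list end-to-end, torch-style repeat
    # (not numpy's element-wise repeat).
    return list(t) * times

print(stack([[1, 2], [3, 4]]))  # [[1, 2], [3, 4]]  -- shape (2, 2)
print(repeat([1, 2], 3))        # [1, 2, 1, 2, 1, 2]
```

The "make stack a static method" bullet matches this shape: stack takes a collection of tensors rather than operating on a single receiver.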
George Hotz
59d0d168cd
FLOAT16 off works
2023-04-19 15:34:56 -07:00
George Hotz
3d15769a8f
50 TFLOPS cuda matmul
2023-04-19 14:38:24 -07:00
George Hotz
03b38864db
fix batchnorm at training (#753)
* e2e testing
* min failure
* no affine on bn, still fails
* why did i think i could detach that?
* allow more kernels for bn
* some test issue i don't understand
2023-04-19 08:01:04 -07:00
George Hotz
1aa0648d6a
fix path linter issue
2023-04-18 19:17:41 -07:00
George Hotz
cbe2564b7b
oops, no hip yet
2023-04-18 19:10:36 -07:00
George Hotz
e4db0c820f
hlb_cifar10 init from torch weights
2023-04-18 19:09:13 -07:00
George Hotz
a6b9733256
GB/s can be higher
2023-04-18 17:51:03 -07:00
George Hotz
9fb3f9ace3
Revert "move t.grad realize on SGD"
This reverts commit ccdc0290d6.
2023-04-18 17:50:08 -07:00
George Hotz
e93e04ed6e
Revert "huh...this is faster"
This reverts commit aedd4685fa.
2023-04-18 17:50:07 -07:00
George Hotz
aedd4685fa
huh...this is faster
2023-04-18 17:36:31 -07:00
George Hotz
dbc99c243b
why did that test break?
2023-04-18 17:08:38 -07:00
George Hotz
ccdc0290d6
move t.grad realize on SGD
2023-04-18 16:47:51 -07:00
George Hotz
8b7ecd63bb
Remove Zeroview (#748)
* no zeroview start
* closer
* stride mask
* st tests pass, delete ZeroView
* byebye zv
* close to working
* not contiguous with mask
* subtract, don't add
* mask on view
* ugh, that shouldn't have been in there
* shape merge
* bugfixes
* fuzzer + 4 fuzzer failures
* fuzzer for symbolic
* more fuzzing and nothing
* that fuzzer doesn't hit either
* fixes padding...ugh
* no more offsets
* working
* rewrite load and store
* all checks
* fix idxs
* progress
* bugfix
* float4_axis
* works
* cleanups
* complex valids_okay
2023-04-17 08:21:46 -07:00
Jan Henrik Høiland
4e17d27d09
Fix cuda errors when running llama example (#749)
2023-04-16 13:52:10 -07:00
George Hotz
0b5a0b9ba4
winograd comment
2023-04-16 03:36:51 -07:00
George Hotz
8b777af571
metal_conv gets over 10.4 TFLOPS...
2023-04-15 03:31:22 -07:00
George Hotz
d66e682205
metal matmul from tcores branch
2023-04-14 23:29:29 -07:00
George Hotz
732884653c
osx in hlb_cifar10_torch
2023-04-14 13:12:08 -07:00