tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-08 13:45:50 -05:00

Author	SHA1	Message	Date
George Hotz	7ecf4dff68	multi cl_queue (#762) * multi cl_queue * only platforms 1 * gpus first, then cpus * put device on underlying buffer * cl_queue array	2023-05-03 12:15:28 -07:00
Rylan Justice	7757f5fed2	Fixed package description (#761) * Updated LICENSE year * Fixed package description	2023-05-03 10:21:05 -07:00
George Hotz	3b933b0a2f	rocm setup script	2023-05-03 16:01:17 +00:00
Rylan Justice	9628a3f190	Updated LICENSE year (#760)	2023-05-01 15:35:23 -07:00
Joqsan	0b9d4126d0	Add Tensor.stack() and Tensor.repeat() (...trying to make einops work with tinygrad) (#758) * add stack() and repeat() methods * make stack a static method	2023-05-01 09:37:46 -07:00
George Hotz	59d0d168cd	FLOAT16 off works	2023-04-19 15:34:56 -07:00
George Hotz	3d15769a8f	50 TFLOPS cuda matmul	2023-04-19 14:38:24 -07:00
George Hotz	03b38864db	fix batchnorm at training (#753) * e2e testing * min failure * no affine on bn, still fails * why did i think i could detach that? * allow more kernels for bn * some test issue i don't understand	2023-04-19 08:01:04 -07:00
George Hotz	1aa0648d6a	fix path linter issue	2023-04-18 19:17:41 -07:00
George Hotz	cbe2564b7b	oops, no hip yet	2023-04-18 19:10:36 -07:00
George Hotz	e4db0c820f	hlb_cifar10 init from torch weights	2023-04-18 19:09:13 -07:00
George Hotz	a6b9733256	GB/s can be higher	2023-04-18 17:51:03 -07:00
George Hotz	9fb3f9ace3	Revert "move t.grad realize on SGD" This reverts commit `ccdc0290d6`.	2023-04-18 17:50:08 -07:00
George Hotz	e93e04ed6e	Revert "huh...this is faster" This reverts commit `aedd4685fa`.	2023-04-18 17:50:07 -07:00
George Hotz	aedd4685fa	huh...this is faster	2023-04-18 17:36:31 -07:00
George Hotz	dbc99c243b	why did that test break?	2023-04-18 17:08:38 -07:00
George Hotz	ccdc0290d6	move t.grad realize on SGD	2023-04-18 16:47:51 -07:00
George Hotz	8b7ecd63bb	Remove Zeroview (#748) * no zeroview start * closer * stride mask * st tests pass, delete ZeroView * byebye zv * close to working * not contiguous with mask * subtract, don't add * mask on view * ugh, that shouldn't have been in there * shape merge * bugfixes * fuzzer + 4 fuzzer failures * fuzzer for symbolic * more fuzzing and nothing * that fuzzer doesn't hit either * fixes padding...ugh * no more offsets * working * rewrite load and store * all checks * fix idxs * progress * bugfix * float4_axis * works * cleanups * complex valids_okay	2023-04-17 08:21:46 -07:00
Jan Henrik Høiland	4e17d27d09	Fix cuda errors when running llama example (#749)	2023-04-16 13:52:10 -07:00
George Hotz	0b5a0b9ba4	winograd comment	2023-04-16 03:36:51 -07:00
George Hotz	8b777af571	metal_conv gets over 10.4 TFLOPS...	2023-04-15 03:31:22 -07:00
George Hotz	d66e682205	metal matmul from tcores branch	2023-04-14 23:29:29 -07:00
George Hotz	732884653c	osx in hlb_cifar10_torch	2023-04-14 13:12:08 -07:00
George Hotz	17e37157b6	fix backward convs (#746) * fix backward convs * no pushing in reduce * late cout * test_fold_4convs_sgd	2023-04-14 10:42:11 -07:00
George Hotz	f7f416d6f4	back to 6 for test_fold_conv_sgd	2023-04-14 07:34:00 -07:00
George Hotz	133521e730	relu UnaryOp is back	2023-04-14 07:12:53 -07:00
George Hotz	584ee6f616	don't graph consts	2023-04-14 03:32:20 -07:00
George Hotz	9a39ebefde	hlb_cifar10_torch gets 80%	2023-04-14 02:47:03 -07:00
worldwalker2000	552a048a33	make maximum split the grad like torch when equal (#738) * make maximum split grad * added test for maximum split grad when equal * minor expr simplification * (2-eq)/2 only once * update test bc one more sum output child stays	2023-04-14 00:17:46 -07:00
Jacky Lee	06ed958abd	Fix train_resnet example (#744) * Fix ResNet example * Scientific notation	2023-04-12 13:48:39 +05:30
Sohaib	70b9072663	add Pad onnx operator and rework _padding (#740)	2023-04-06 17:07:36 +05:30
jintuzhang	8e40ff8c8d	Do not specify errors when trying to load devices. (#741)	2023-04-06 17:05:36 +05:30
Jacky Lee	7a45b989a1	Device: make GPU default and METAL/CUDA if possible (#732 ) * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * run github workflow * Fix logic to select default * pass if an error occurs * use separate function for try except	2023-04-04 09:41:52 +05:30
George Hotz	94e2c49c35	test_cacheline_size that works in both places	2023-03-30 06:47:20 +04:00
George Hotz	b05c2828f7	better cacheline test	2023-03-30 06:08:54 +04:00
George Hotz	76db1af6fc	better archprobe	2023-03-30 05:52:00 +04:00
George Hotz	1240c12ac5	download cifar to datasets dir	2023-03-29 12:25:42 +04:00
Jacky Lee	e5f430d8c6	Device: move below LazyBuffer (#733)	2023-03-29 10:35:11 +04:00
George Hotz	b99798f08e	acc function not needed	2023-03-29 08:03:46 +04:00
George Hotz	20894991ed	good changes from the M1 Tensor Core project (#730) * good changes * working except llvm * llvm types * nice acc * archprobe * lang.float4 * use self.acc for late acc * fix store bug	2023-03-29 05:11:02 +04:00
Jacky Lee	156640e90d	Permute examples (#731) * examples: use permute instead of transpose * Use transpose but change args	2023-03-29 05:07:06 +04:00
Andre Slavescu	39d6e1525f	Added activation ops + tests (#729) * activation ops * type hints + more testing * formatting correction + parameter testing * fixes to shape testing * hardtanh to use clip + removed type hints * assign val fix	2023-03-28 13:17:53 +04:00
George Hotz	fa5516dda0	fix lint, installed pre-commit on now computer	2023-03-24 11:15:59 -07:00
George Hotz	ebc4ad6223	color the jit nicer	2023-03-24 10:54:20 -07:00
George Hotz	23f88fb026	synchronize for honest speed compare	2023-03-24 10:24:27 -07:00
George Hotz	1cb5b2d015	test_enet_se	2023-03-24 10:04:30 -07:00
Jacky Lee	fafe8e9ce2	casting: support all backends and implement half (#726) * casting: support all backends and implement half * map torch types in ops_torch * reuse type map for torch buffer * inverse dict lookup	2023-03-24 09:58:03 -07:00
George Hotz	e88b9bfe1e	print gflops avg with DEBUG=2	2023-03-23 16:07:08 -07:00
George Hotz	de04208247	hotcast bug fix	2023-03-23 11:49:47 -07:00
Jacky Lee	e009b6f341	Add tests for casting (#724) * Add tests for casting * Skip half_matmul_upcast when TORCH=1 * Fix promotion on torch * Fix spacing	2023-03-23 08:02:52 -07:00

1828 Commits