tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-08 21:55:14 -05:00

Author	SHA1	Message	Date
George Hotz	f7f416d6f4	back to 6 for test_fold_conv_sgd	2023-04-14 07:34:00 -07:00
George Hotz	133521e730	relu UnaryOp is back	2023-04-14 07:12:53 -07:00
George Hotz	584ee6f616	don't graph consts	2023-04-14 03:32:20 -07:00
George Hotz	9a39ebefde	hlb_cifar10_torch gets 80%	2023-04-14 02:47:03 -07:00
worldwalker2000	552a048a33	make maximum split the grad like torch when equal (#738) * make maximum split grad * added test for maximum split grad when equal * minor expr simplification * (2-eq)/2 only once * update test bc one more sum output child stays	2023-04-14 00:17:46 -07:00
Jacky Lee	06ed958abd	Fix train_resnet example (#744) * Fix ResNet example * Scientific notation	2023-04-12 13:48:39 +05:30
Sohaib	70b9072663	add Pad onnx operator and rework _padding (#740)	2023-04-06 17:07:36 +05:30
jintuzhang	8e40ff8c8d	Do not specify errors when trying to load devices. (#741)	2023-04-06 17:05:36 +05:30
Jacky Lee	7a45b989a1	Device: make GPU default and METAL/CUDA if possible (#732) * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * run github workflow * Fix logic to select default * pass if an error occurs * use separate function for try except	2023-04-04 09:41:52 +05:30
George Hotz	94e2c49c35	test_cacheline_size that works in both places	2023-03-30 06:47:20 +04:00
George Hotz	b05c2828f7	better cacheline test	2023-03-30 06:08:54 +04:00
George Hotz	76db1af6fc	better archprobe	2023-03-30 05:52:00 +04:00
George Hotz	1240c12ac5	download cifar to datasets dir	2023-03-29 12:25:42 +04:00
Jacky Lee	e5f430d8c6	Device: move below LazyBuffer (#733)	2023-03-29 10:35:11 +04:00
George Hotz	b99798f08e	acc function not needed	2023-03-29 08:03:46 +04:00
George Hotz	20894991ed	good changes from the M1 Tensor Core project (#730) * good changes * working except llvm * llvm types * nice acc * archprobe * lang.float4 * use self.acc for late acc * fix store bug	2023-03-29 05:11:02 +04:00
Jacky Lee	156640e90d	Permute examples (#731) * examples: use permute instead of transpose * Use transpose but change args	2023-03-29 05:07:06 +04:00
Andre Slavescu	39d6e1525f	Added activation ops + tests (#729) * activation ops * type hints + more testing * formatting correction + parameter testing * fixes to shape testing * hardtanh to use clip + removed type hints * assign val fix	2023-03-28 13:17:53 +04:00
George Hotz	fa5516dda0	fix lint, installed pre-commit on now computer	2023-03-24 11:15:59 -07:00
George Hotz	ebc4ad6223	color the jit nicer	2023-03-24 10:54:20 -07:00
George Hotz	23f88fb026	synchronize for honest speed compare	2023-03-24 10:24:27 -07:00
George Hotz	1cb5b2d015	test_enet_se	2023-03-24 10:04:30 -07:00
Jacky Lee	fafe8e9ce2	casting: support all backends and implement half (#726) * casting: support all backends and implement half * map torch types in ops_torch * reuse type map for torch buffer * inverse dict lookup	2023-03-24 09:58:03 -07:00
George Hotz	e88b9bfe1e	print gflops avg with DEBUG=2	2023-03-23 16:07:08 -07:00
George Hotz	de04208247	hotcast bug fix	2023-03-23 11:49:47 -07:00
Jacky Lee	e009b6f341	Add tests for casting (#724) * Add tests for casting * Skip half_matmul_upcast when TORCH=1 * Fix promotion on torch * Fix spacing	2023-03-23 08:02:52 -07:00
George Hotz	68e45fca18	metal_matmul: bw and torch sync	2023-03-23 08:02:04 -07:00
George Hotz	bd6c3c31a9	compare to torch	2023-03-22 23:58:37 -07:00
George Hotz	c3a3db75c7	fix metal matmul example	2023-03-22 23:42:51 -07:00
George Hotz	f5aea472a3	latest torch and onnx should be fine	2023-03-22 23:33:50 -07:00
George Hotz	51e19ac25c	OPTLOCAL=2 makes stable diffusion a usable speed after the cache builds	2023-03-22 19:19:11 -07:00
George Hotz	2e18469fd4	clean up display name	2023-03-22 18:32:05 -07:00
George Hotz	b12b60af20	fix binop, other tests failure (#723) * fix binop, other tests failure * that was a bad idea * better layernorm * inference kernel count tests * new style reshape pushing * fixup replacement * 199 kernels is okay. fix flops * push reshape through unaryops only * GRAPH=2 draws the phantom ops * found resnet issue * non working test * mul is cheaper than div * OPT inflation * SHUFFLE_PAD_OPS in OPT=2	2023-03-22 18:15:07 -07:00
George Hotz	d6f4219952	LayerNorm2d for 2 lines	2023-03-20 16:58:43 -07:00
George Hotz	128ca160ac	lazy: remove required device	2023-03-20 16:31:45 -07:00
George Hotz	120d7072bd	indexing merge almost works	2023-03-20 16:17:07 -07:00
George Hotz	06abbbfe7c	remove the stupid register class (#721) * remove the stupid register class * touchups * colorful display name	2023-03-20 15:45:12 -07:00
George Hotz	30b795874a	remove RMSprop, nobody uses it anymore	2023-03-20 12:31:34 -07:00
George Hotz	25287a974e	types (#720) * types * cleanups * don't use None, use LocalBuffer * eh	2023-03-20 12:31:02 -07:00
George Hotz	9b314c6342	factor uops transformers into functions	2023-03-20 08:19:48 -07:00
George Hotz	623fb1ef28	do test_conv_with_bn test	2023-03-19 23:53:56 -07:00
George Hotz	5495c7d64e	linearizer! (#714) * linearizer outputs something * working ish * cstyle codegen * clang mostly works * fix load valid * fix numberless loop * fancy gen * working * fix enet compiler * cleanups * float4 upcasting * less lines * supports_float4 * constant folding * mulacc * internet tests flaky in CI * 90% image support * fix image generic * bugs exposed with shapetracker and single view * new llvm * use vload, remove OLD * that's really poorly done * ending up being more lines	2023-03-19 23:43:49 -07:00
Cyril Roumégous	b629fd4cd8	add AdamW optimizer (#716) * add AdamW optimizer * one liner Adam optimizer	2023-03-19 12:51:06 -07:00
George Hotz	1012b68f7e	finally, some speedups	2023-03-18 18:17:33 -07:00
George Hotz	902906f909	Fix constant folding (#713) * fix * codegen * contiguous is real * no bufs_to_delete * don't assign rawconst * remove neg and not * need exec to fix custom function jit	2023-03-18 17:52:46 -07:00
Fernando Vidal	73bd0b217b	add int64 as supported dtype from numpy (#699) * add int64 as supported dtype from numpy Without this, examples/transformer.py didn't run. With this change it runs successfully. * Update helpers.py * Update transformer.py * Update training.py	2023-03-18 17:15:04 -07:00
George Hotz	f355b02987	remove comments and reorder	2023-03-18 14:48:39 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
Kirill	26a3888ab8	Fix llama 13B RAM usage (#710)	2023-03-18 13:50:09 -07:00
Kirill	0fe5014b1f	Use pathlib (#711) * Use pathlib in llama * Use pathlib in stablediffusion	2023-03-18 13:49:21 -07:00

1804 Commits