tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-21 04:47:56 -05:00

Author	SHA1	Message	Date
George Hotz	b80cacb416	fix GPU efficientnet example	2021-05-26 17:29:35 -07:00
George Hotz	1ae0e88627	nvidia notes	2021-05-26 14:27:00 -07:00
20kdc	2653d33292	vgg7 (image upscaling) implementation - not the best, but it works (#255 ) * vgg7 implementation - not the best, but it works * VGG7 implementation: Spread nansbane to deter NaNs, maybe improved training experience * VGG7 implementation: Fix training, for real this time Results actually attempt to approximate the input * VGG7 implementation: Sample probability management	2021-05-12 23:48:51 -07:00
Skosh	81bf933a91	Improved __getitem__ (#254 ) * Some progress on yolov3 * Removed some debugging comments… Also, the forward pass eats all RAM for some reason * forward pass almost runs * forward pass runs almost * forward pass runs, now we gotta load the weights * loading weights works * fetches config and weights * everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done * some changes * fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly * Something is wrong with the forward pass, Conv2d tests added * forward pass almost outputs correct values, gotta fix one more thign * yolo works * some final changes * reverting changes * removed dataloader * fixed some indentation * comment out failing test, somehow it fails CI even though it passes on my computer… * fixed wrong probabilities * added webcam option to YOLO, now just need to add bounding boxes and speed it up * some progress towards adding bounding boxes * trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage * Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image * removed some debugging print statements * updated result image * something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds… * Improved __getitem__ * Updated * Updated __getitem__ * Linebreaks * Maybe this works? * Added MNIST locally, tests run now	2021-05-05 22:15:22 -07:00
Skosh	78aa147b39	[WIP] YOLO working on tinygrad! (#245 ) * Some progress on yolov3 * Removed some debugging comments… Also, the forward pass eats all RAM for some reason * forward pass almost runs * forward pass runs almost * forward pass runs, now we gotta load the weights * loading weights works * fetches config and weights * everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done * some changes * fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly * Something is wrong with the forward pass, Conv2d tests added * forward pass almost outputs correct values, gotta fix one more thign * yolo works * some final changes * reverting changes * removed dataloader * fixed some indentation * comment out failing test, somehow it fails CI even though it passes on my computer… * fixed wrong probabilities * added webcam option to YOLO, now just need to add bounding boxes and speed it up * some progress towards adding bounding boxes * trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage * Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image * removed some debugging print statements * updated result image * something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…	2021-04-25 18:06:52 -07:00
ziofil	155ec1f18e	saving 50 LOC with automatic @staticmethod for forward and backward (#252 ) * automatic @staticmethod for forward and backward * triggering unit tests	2021-04-25 18:04:16 -07:00
"freedom" Koan-Sin Tan	f0cc2b66f8	add an aneccompile example in Objective-C (#240 ) * add an aneccompile example in Objective-C add a compile.m corresponding to compile.mm build with ```clang compile.m -F /System/Library/PrivateFrameworks/ -framework ANECompiler -framework Foundation``` CoreFoundation framework is a C library. Foundation is an Objective-C framework. CF data structures in CoreFoundation usually have corresponding NS data structures in Foundation, e.g., NSDictionary is "toll-free bridged" with its Core Foundation counterpart, CFDictionary. See [1]. [1] https://developer.apple.com/library/archive/documentation/General/Conceptual/CocoaEncyclopedia/Toll-FreeBridgin/Toll-FreeBridgin.html * figure out how to use param_3 of ANECCompile add a simple param_3 blocks callback, which dumps the status dictionary when status != 0	2021-01-31 08:31:16 -08:00
Göktuğ Karakaşlı	eabe0b9017	remove deepwalk args (#243 )	2021-01-31 08:30:17 -08:00
George Hotz	ce77dda805	yolov5 v4	2021-01-05 07:56:17 -08:00
George Hotz	62e3a8558c	fix tolerance maybe	2021-01-05 07:45:47 -08:00
Asim	1c148f2fe4	fixed example broken after gpu refactor (#238 )	2021-01-05 07:41:54 -08:00
George Hotz	8a38e0d207	only mish failed	2021-01-03 09:47:11 -08:00
George Hotz	a337f7780e	smarter way to write sign	2021-01-03 09:46:00 -08:00
George Hotz	1a4487965a	remove negative from things w/o negative	2021-01-03 09:43:34 -08:00
George Hotz	0531b848eb	second class sign	2021-01-03 09:33:12 -08:00
George Hotz	0702e0c763	nah, no sign, it's not what you want. use relu	2021-01-03 09:30:33 -08:00
George Hotz	29655609d5	fix GPU sign...these tests aren't very good	2021-01-03 09:00:49 -08:00
George Hotz	ea9c9af5d7	faster sign	2021-01-03 08:54:21 -08:00
George Hotz	c2eeb6950b	add support for sign. technically relu can be second class now	2021-01-03 08:29:57 -08:00
George Hotz	6842ad9ec8	minor cleanups, yolo work	2021-01-03 08:14:16 -08:00
NeuralLink	0825cf7f79	⚡ Added softplus and mish non stable (#220 ) * ⚡ Added softplus and mish CPU * 🔨 refactor * 🔨 second class softplus and mish * 🔨 test fix * no need of device in testing	2021-01-03 08:08:41 -08:00
George Hotz	ac229ea750	remove print	2021-01-02 12:53:30 -08:00
George Hotz	895d142503	start trying to load yolo v5	2021-01-02 12:51:55 -08:00
NeuralLink	ece07a3d12	🔨 refactor register ops (#233 ) * 🔨 refactor register ops * 🔨 reorder and register for ANE * 🔨 refactor * 🔨 conflicts * 🔨 minor fix * ane fix * extra reshape weird	2021-01-02 07:47:16 -08:00
Marcel Bischoff	42b4761025	transformer >99.98% test accuracy in ~30s (#230 ) * transformer * BS might divide len(Y_test) * outoput when accuracy is high * more readeable * fixed loss in serious_mnist for new API	2021-01-02 07:45:09 -08:00
Liam	ebd72ff437	Test split (#231 ) * Split tests Split tests into "Test CPU" and "Test GPU". Add test flag "TEST_DEVICES" which is a comma separated list of devices: CPU,GPU,ANE * Run tests based on provided TEST_DEVICES flag By default will run all "CPU,GPU,ANE" * fix bad quote * Revert changes and use GPU=1 This is done through setting the default Tensor Device to Device.CPU of GPU=1 is set. Run GPU tests: GPU=1 pytest -s -v	2021-01-01 09:19:03 -05:00
George Hotz	4a7cf2e420	more reordering	2020-12-31 09:58:02 -05:00
George Hotz	92abe43683	reduce before binary because of unbroadcasting	2020-12-31 09:49:52 -05:00
George Hotz	4291002881	reorder GPU ops	2020-12-31 09:46:39 -05:00
George Hotz	de7fe085de	no read out of bounds	2020-12-31 09:41:36 -05:00
George Hotz	1fb5fcafce	GPU slice should fix tests	2020-12-31 09:37:03 -05:00
Liam	e972a45456	Dynamically register ops to Tensor (#232 ) * Dynamically register ops to Tensor This saves lines. And reduces redundant repetition. * ffs spacing you don't pay me enough!	2020-12-31 09:10:19 -05:00
Marcel Bischoff	e2f833f58f	max to behave on ties like torch (#229 ) * checkpoint * fixing pow * undo pow * backward max on GPU and CPU rewrite * indentation * changing seed for curiosity * max replaced equality * undo seed * rebase * fixed tests * merge error	2020-12-30 18:52:50 -05:00
George Hotz	30f8132646	reorder ops in ops cpu	2020-12-30 11:00:01 -05:00
George Hotz	e5b2803b5d	ops in readme	2020-12-30 10:48:55 -05:00
George Hotz	2d44bf7f1a	Dot -> Matmul	2020-12-30 10:41:51 -05:00
George Hotz	10fc3ff5b9	cleaner syntax	2020-12-30 10:35:37 -05:00
George Hotz	fcfe3dae01	write slice for CPU	2020-12-30 10:32:53 -05:00
George Hotz	47504004fd	ane ops	2020-12-29 18:00:53 -05:00
George Hotz	1f5c9618ef	refactor in readme and issue #225	2020-12-29 17:30:04 -05:00
George Hotz	f9170505b3	if you like your transformers twice as slow, use the GPU	2020-12-29 17:14:23 -05:00
George Hotz	6a6a82e999	support multidot on GPU	2020-12-29 16:56:30 -05:00
George Hotz	27208d729b	add GPU max thanks to marcelbischoff	2020-12-29 16:44:14 -05:00
George Hotz	4bbad11afe	link to papers	2020-12-29 14:15:46 -05:00
George Hotz	3f8e137b6f	extra/transformer	2020-12-29 14:14:00 -05:00
George Hotz	c4e7a1ae59	accessors are dumb	2020-12-29 14:10:26 -05:00
George Hotz	fb6aaefb9b	save 2 lines	2020-12-29 14:02:50 -05:00
George Hotz	ea341c84fe	logsoftmax good, div bad	2020-12-29 13:59:39 -05:00
George Hotz	f18801c7db	simple pool. swimming is very easy now	2020-12-29 13:48:50 -05:00
George Hotz	8f9232d59b	readmee	2020-12-29 13:40:34 -05:00

10417 Commits