tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-22 05:18:01 -05:00

Author	SHA1	Message	Date
George Hotz	58ed46963e	fix broadcastdot	2021-11-29 18:54:57 -05:00
George Hotz	dca076dbf1	remove dumb nn ops	2021-11-29 18:05:31 -05:00
George Hotz	f909ab194f	gelu with broken test	2021-11-29 15:00:50 -05:00
George Hotz	c752033283	fix GPU OOM in test	2021-11-29 13:05:59 -05:00
George Hotz	99b6051467	add ff_dim to transformer	2021-11-29 12:40:52 -05:00
George Hotz	29dee59368	cat: forward only not required	2021-11-29 00:14:56 -05:00
George Hotz	3cdc77f526	add cat support	2021-11-28 23:21:49 -05:00
George Hotz	ce3d198bb7	less lines and fix default device	2021-11-27 11:18:49 -05:00
George Hotz	7ae14179d3	refactor ops	2021-11-27 11:12:23 -05:00
George Hotz	c162e748f5	fix float64 warning on training	2021-10-30 20:07:31 -07:00
George Hotz	b0f14b4af8	move datasets into datasets	2021-10-30 19:55:50 -07:00
George Hotz	7472a7ebe2	not forcing 3.9 for a stupid type	2021-10-30 16:52:40 -07:00
George Hotz	fc6597a6d9	only resnet18, it's too slow otherwise	2021-10-30 16:48:39 -07:00
Evan Mays	285621aeda	Cherry backprop for conv2d (#281 ) * quick math: 0 + x = x. * gradient w.r.t. x using cherry for conv * gradient w.r.t. w for conv on cherry but doing vector dot products * small optimization * [cherry] optimize conv backpass for large channel count * get rid of numpy einsum	2021-10-30 16:12:19 -07:00
Sebastian Kreft	8113eec4cf	feat: add efficientnet test (#285 ) Simple test using the Chicken example from https://upload.wikimedia.org/wikipedia/commons/4/41/Chicken.jpg and the image preprocessing from example/efficientnet.py Note that EfficientNet loads the weights from the internet so running the tests may be slow the first time. We could speed up the tests by caching the /tmp folder. Fixes #234	2021-10-30 15:53:51 -07:00
Guglielmo Camporese	2b7589db64	Added ResNet-{18, 34, 50, 101, 152} (#271 ) * added resnets * fix minor * fix minor * resnet in models * added resnet test * added resnet train test * added linear, conv2d nn tests * fix minor in extra/training * resnet in models * fix minor * fix tolerance for linear in nn test * fix eval, this causes cpu and gpu UT failing * revert transformer test * fix minor for CPU test * improved model get_params for sequential layer * fix minor for params counting * commented broken ops tests * improved train for resnet	2021-06-21 09:37:24 -07:00
George Hotz	89798d2f43	some flags	2021-06-19 11:46:31 -07:00
George Hotz	d3f169b267	move good models to models, add a training step test	2021-06-19 11:24:15 -07:00
Jacky Lee	3a91d5434f	Add dropout test (#265 ) * Add dropout test * Remove condition where training is false * Skip dropout test when on GPU * Revert changes to tensor.py and fix test case * Revert change on whitespace * Convert Tensor to cpu for testing * Fix whitespace in tensor.py	2021-06-19 08:49:13 -07:00
George Hotz	2affd226b3	speed up sum	2021-06-17 16:38:34 -07:00
George Hotz	c1d469d440	sum op	2021-06-17 16:19:35 -07:00
George Hotz	2075fdeb4f	FPGA Based Accelerator for Tinygrad (#258 ) * ops_risk * risk sim * guessing is for winners * minor * better * matmal with risk * conv doesn't work * closer * conv2d works * ops_risk * opt2 works * opt1 may not be possible * opt1 is a mulacc * arty * attosoc example building on mac * minor * riscv assembler * gucci gang * we got C code * not a scam * hello * make risk mergeable into master * unop support	2021-06-07 17:45:09 -07:00
Skosh	81bf933a91	Improved __getitem__ (#254 ) * Some progress on yolov3 * Removed some debugging comments… Also, the forward pass eats all RAM for some reason * forward pass almost runs * forward pass runs almost * forward pass runs, now we gotta load the weights * loading weights works * fetches config and weights * everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done * some changes * fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly * Something is wrong with the forward pass, Conv2d tests added * forward pass almost outputs correct values, gotta fix one more thign * yolo works * some final changes * reverting changes * removed dataloader * fixed some indentation * comment out failing test, somehow it fails CI even though it passes on my computer… * fixed wrong probabilities * added webcam option to YOLO, now just need to add bounding boxes and speed it up * some progress towards adding bounding boxes * trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage * Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image * removed some debugging print statements * updated result image * something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds… * Improved __getitem__ * Updated * Updated __getitem__ * Linebreaks * Maybe this works? * Added MNIST locally, tests run now	2021-05-05 22:15:22 -07:00
Skosh	78aa147b39	[WIP] YOLO working on tinygrad! (#245 ) * Some progress on yolov3 * Removed some debugging comments… Also, the forward pass eats all RAM for some reason * forward pass almost runs * forward pass runs almost * forward pass runs, now we gotta load the weights * loading weights works * fetches config and weights * everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done * some changes * fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly * Something is wrong with the forward pass, Conv2d tests added * forward pass almost outputs correct values, gotta fix one more thign * yolo works * some final changes * reverting changes * removed dataloader * fixed some indentation * comment out failing test, somehow it fails CI even though it passes on my computer… * fixed wrong probabilities * added webcam option to YOLO, now just need to add bounding boxes and speed it up * some progress towards adding bounding boxes * trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage * Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image * removed some debugging print statements * updated result image * something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…	2021-04-25 18:06:52 -07:00
George Hotz	62e3a8558c	fix tolerance maybe	2021-01-05 07:45:47 -08:00
George Hotz	8a38e0d207	only mish failed	2021-01-03 09:47:11 -08:00
George Hotz	1a4487965a	remove negative from things w/o negative	2021-01-03 09:43:34 -08:00
George Hotz	0702e0c763	nah, no sign, it's not what you want. use relu	2021-01-03 09:30:33 -08:00
George Hotz	c2eeb6950b	add support for sign. technically relu can be second class now	2021-01-03 08:29:57 -08:00
NeuralLink	0825cf7f79	⚡ Added softplus and mish non stable (#220 ) * ⚡ Added softplus and mish CPU * 🔨 refactor * 🔨 second class softplus and mish * 🔨 test fix * no need of device in testing	2021-01-03 08:08:41 -08:00
Liam	ebd72ff437	Test split (#231 ) * Split tests Split tests into "Test CPU" and "Test GPU". Add test flag "TEST_DEVICES" which is a comma separated list of devices: CPU,GPU,ANE * Run tests based on provided TEST_DEVICES flag By default will run all "CPU,GPU,ANE" * fix bad quote * Revert changes and use GPU=1 This is done through setting the default Tensor Device to Device.CPU of GPU=1 is set. Run GPU tests: GPU=1 pytest -s -v	2021-01-01 09:19:03 -05:00
George Hotz	4291002881	reorder GPU ops	2020-12-31 09:46:39 -05:00
Marcel Bischoff	e2f833f58f	max to behave on ties like torch (#229 ) * checkpoint * fixing pow * undo pow * backward max on GPU and CPU rewrite * indentation * changing seed for curiosity * max replaced equality * undo seed * rebase * fixed tests * merge error	2020-12-30 18:52:50 -05:00
George Hotz	fcfe3dae01	write slice for CPU	2020-12-30 10:32:53 -05:00
George Hotz	f9170505b3	if you like your transformers twice as slow, use the GPU	2020-12-29 17:14:23 -05:00
George Hotz	6a6a82e999	support multidot on GPU	2020-12-29 16:56:30 -05:00
George Hotz	27208d729b	add GPU max thanks to marcelbischoff	2020-12-29 16:44:14 -05:00
George Hotz	02655c07d5	break maxpool2d on GPU	2020-12-29 13:05:57 -05:00
George Hotz	061e37de39	touchups	2020-12-29 12:41:21 -05:00
George Hotz	a2e6562330	fix max op, less lines	2020-12-29 10:47:04 -05:00
Marcel Bischoff	dc8fa7999c	Transpose on GPU (#221 ) * 2serious * load/save * fixing GPU * added DEBUG * needs BatchNorm or doesn't learn anything * old file not needed * added conv biases * added extra/training.py and checkpoint * assert in test only * save * padding * num_classes * checkpoint * checkpoints for padding * training was broken * merge * rotation augmentation * more aug * needs testing * streamline augment, augment is fast thus bicubic * tidying up * transformer eval * axis=-1 * transpose * test for permutation using torch.movedims * another test * line	2020-12-29 10:40:11 -05:00
George Hotz	36579f66bf	max op	2020-12-28 23:54:52 -05:00
George Hotz	fafece9db7	avgpool2d is a second class op	2020-12-28 10:41:59 -05:00
George Hotz	593233b668	log and exp are first class ops	2020-12-28 10:00:30 -05:00
George Hotz	a361ef6861	fixup training loop	2020-12-27 18:35:56 -05:00
George Hotz	f15bec6dbc	make multidot work on CPU	2020-12-27 17:25:37 -05:00
George Hotz	131e04c90c	cpu only decorator	2020-12-27 17:18:55 -05:00
George Hotz	2f1b2c0a3b	add transpose, start on transformer	2020-12-27 16:59:12 -05:00
iainwo	56d44637f3	fixed pylint, formatted python files iwth cblack on localhost (#204 ) * fixed pylint, formatted python files iwth cblack on localhost * Revert "fixed pylint, formatted python files iwth cblack on localhost" This reverts commit `07e2b88466`. * dedented 4-spaces added linter Co-authored-by: Iain Wong <iainwong@outlook.com>	2020-12-17 14:37:31 -08:00
Liam	bcf1518309	All devices are equal! (#196 ) * Update all devices to be tested ANE, CPU and OCL all now support all tests. However tests are not currently passing on GPU and I cannot test on CPU. Failing GPU test are not an issue caused by this update. Tests have not been passing due to a missing "six" required installation. OpenCL Tests have not been run since commit: `1a1c63a08b` devices have 3 types and are handle by a new DeviceTypes enum. (The goal is to revert to Tensor.<type>, but this current setup allows for keyword argument defaults: `device=DeviceType.CPU`) All references to Tensor.GPU/CPU/ANE as been converted to the corresponding `DeviceTypes` enum. Refactor of the conversion code to allow for any device to any device conversion. * Add six dependency in requirements.txt * Resolve failure to run tests Move six into gpu required installs. Remove six from standard installation. * Remove repeated data conversion * Refactor method names Also reduce code with .to and .to_ * Dynamic device handlers * Refactor DeviceTypes -> Device * Add mem copy profiling back * test_backward_pass_diamond_model passing * Resolve Sum issue on GPU * Revert batchnorm2d tests * Update README with upadated API * ANE testing with * Last minute line gains	2020-12-15 23:44:08 -08:00

... 84 85 86 87 88 ...

4433 Commits