Commit Graph

469 Commits

Author SHA1 Message Date
George Hotz
f9170505b3 if you like your transformers twice as slow, use the GPU 2020-12-29 17:14:23 -05:00
George Hotz
6a6a82e999 support multidot on GPU 2020-12-29 16:56:30 -05:00
George Hotz
27208d729b add GPU max thanks to marcelbischoff 2020-12-29 16:44:14 -05:00
George Hotz
02655c07d5 break maxpool2d on GPU 2020-12-29 13:05:57 -05:00
George Hotz
061e37de39 touchups 2020-12-29 12:41:21 -05:00
George Hotz
a2e6562330 fix max op, less lines 2020-12-29 10:47:04 -05:00
Marcel Bischoff
dc8fa7999c Transpose on GPU (#221)
* 2serious

* load/save

* fixing GPU

* added DEBUG

* needs BatchNorm or doesn't learn anything

* old file not needed

* added conv biases

* added extra/training.py and checkpoint

* assert in test only

* save

* padding

* num_classes

* checkpoint

* checkpoints for padding

* training was broken

* merge

* rotation augmentation

* more aug

* needs testing

* streamline augment, augment is fast thus bicubic

* tidying up

* transformer eval

* axis=-1

* transpose

* test for permutation using torch.movedims

* another test

* line
2020-12-29 10:40:11 -05:00
George Hotz
36579f66bf max op 2020-12-28 23:54:52 -05:00
George Hotz
fafece9db7 avgpool2d is a second class op 2020-12-28 10:41:59 -05:00
George Hotz
593233b668 log and exp are first class ops 2020-12-28 10:00:30 -05:00
George Hotz
f15bec6dbc make multidot work on CPU 2020-12-27 17:25:37 -05:00
George Hotz
131e04c90c cpu only decorator 2020-12-27 17:18:55 -05:00
George Hotz
2f1b2c0a3b add transpose, start on transformer 2020-12-27 16:59:12 -05:00
iainwo
56d44637f3 fixed pylint, formatted python files with cblack on localhost (#204)
* fixed pylint, formatted python files with cblack on localhost

* Revert "fixed pylint, formatted python files with cblack on localhost"

This reverts commit 07e2b88466.

* dedented 4-spaces added linter

Co-authored-by: Iain Wong <iainwong@outlook.com>
2020-12-17 14:37:31 -08:00
Liam
bcf1518309 All devices are equal! (#196)
* Update all devices to be tested

ANE, CPU and OCL all now support all tests.

However, tests are not currently passing on GPU and I cannot test on CPU.

Failing GPU tests are not an issue caused by this update. Tests have not
been passing due to a missing required dependency, "six".

OpenCL Tests have not been run since commit: 1a1c63a08b

Devices have 3 types and are handled by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`.)

All references to Tensor.GPU/CPU/ANE have been converted to the
corresponding `DeviceTypes` enum.

Refactored the conversion code to allow any-device-to-any-device
conversion.

* Add six dependency in requirements.txt

* Resolve failure to run tests

Move six into gpu required installs. Remove six from standard
installation.

* Remove repeated data conversion

* Refactor method names

Also reduce code with .to and .to_

* Dynamic device handlers

* Refactor DeviceTypes -> Device

* Add mem copy profiling back

* test_backward_pass_diamond_model passing

* Resolve Sum issue on GPU

* Revert batchnorm2d tests

* Update README with updated API

* ANE testing with

* Last minute line gains
2020-12-15 23:44:08 -08:00
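The device-enum design described in the PR body above can be sketched as follows. This is a minimal illustration, not the actual tinygrad code: the `Device` name and the `.to`/`.to_` methods follow the commit message, everything else is assumed.

```python
from enum import Enum

class Device(Enum):
    # three device types handled by one enum, as the commit describes
    CPU = 0
    GPU = 1
    ANE = 2

class Tensor:
    # minimal sketch; a real tensor would also carry buffers and grads
    def __init__(self, data, device=Device.CPU):
        self.data, self.device = data, device

    def to(self, device):
        # any-device-to-any-device conversion, returning a new Tensor
        return Tensor(self.data, device)

    def to_(self, device):
        # in-place variant, used to shorten the refactored conversion code
        self.device = device
        return self

t = Tensor([1, 2, 3]).to(Device.GPU)
```

The enum also gives the keyword-argument default the commit mentions: `device=Device.CPU` works where a bare `Tensor.CPU` class attribute could not be referenced before the class body finishes.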
Marcel Bischoff
5d46df638a abs as non-first class operation using relu (#171)
* abs (non-first class)

* whitespace
2020-12-09 12:20:34 -08:00
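The composition behind #171 is the identity |x| = relu(x) + relu(−x); for each element exactly one of the two terms is nonzero. A minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def abs_via_relu(x):
    # |x| = relu(x) + relu(-x): no new first-class op needed
    return relu(x) + relu(-x)

print(abs_via_relu(np.array([-2.0, 0.0, 3.0])))  # [2. 0. 3.]
```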
NeuralLink
00e376f36c leaky relu as geohot suggested (#167) 2020-12-09 02:58:35 -08:00
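Leaky ReLU can likewise be composed from relu alone, as relu(x) − α·relu(−x); the exact composition used in #167 may differ, this is one standard way to write it:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def leaky_relu(x, neg_slope=0.01):
    # positives pass through via relu(x); negatives come out of the
    # second term scaled by neg_slope
    return relu(x) - neg_slope * relu(-x)

print(leaky_relu(np.array([-1.0, 0.5])))  # [-0.01  0.5 ]
```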
Liam
89d0ff6989 Consistent testing (#137)
* Consistent GPU classes

Convert the existing GPU classes into one standard format.

Remove duplicated functions in `test_mnist` and create a TestMNISTGPU
class. This reduces line count and ensures consistency.

Use `@unittest.skipUnless(GPU, "Requires GPU")` instead of `if GPU:` to
skip GPU testing. This will ensure that skipped tests are displayed
accordingly in the pytest output.

* Optim Testing now supports GPU

* Tensor testing now supports GPU

jacobian and gradcheck auto skipped until GPU float64 support added.

* GPU support for custom constructor methods

* Remove GPU flag from Model constructors

It was requested that the `gpu` kwarg be removed from the model
constructor. GPU conversion is now handled in the train function.

This also required the conversion of Optimizer parameters, as they are
constructed prior to execution of the `train` function and are dependent
on the model's GPU state.

* Fix typo: float32->float64

* Clean `get_parameters` utility

Just a quick refactor w/ the new support for optimizers.

* Remove GPU kwarg from TinyNet

Remove `gpu` kwarg from tiny net to match test_mnist `train` function.
2020-12-09 02:25:27 -08:00
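The skip pattern described in #137 looks like this; the `GPU` flag and the test body are placeholders (in the real suite the flag would come from probing for an OpenCL device):

```python
import unittest

GPU = False  # assumption: set by detecting a usable GPU at import time

class TestMNISTGPU(unittest.TestCase):
    @unittest.skipUnless(GPU, "Requires GPU")
    def test_mnist_train(self):
        # never runs when GPU is False; unlike a bare `if GPU:` guard,
        # pytest reports the test as skipped instead of silently omitting it
        self.fail("should have been skipped")

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.TestLoader().loadTestsFromTestCase(TestMNISTGPU))
```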
George Hotz
4e1a0de392 fix rsub 2020-12-08 10:05:21 -08:00
George Hotz
c4540f1b8c Support scalars by kartik4949 2020-12-08 09:52:07 -08:00
George Hotz
b355cd2571 Mean axis (doesn't work) (#154)
* mean axis

* fixed
2020-12-07 22:58:34 -08:00
Marcel Bischoff
58ccebd7cd Sum with axis (#153)
* sum with axis and tests

* broken

* works again

* clean up

* Update test_ops.py
2020-12-07 21:49:18 -08:00
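A sum over one axis needs its gradient broadcast back over the reduced axis in the backward pass. A NumPy sketch of both directions (the helper names here are illustrative, not tinygrad's):

```python
import numpy as np

def sum_forward(x, axis):
    return x.sum(axis=axis)

def sum_backward(grad_out, input_shape, axis):
    # re-insert the reduced axis, then broadcast the gradient across it:
    # every input element contributed once, so each gets the same grad
    return np.broadcast_to(np.expand_dims(grad_out, axis), input_shape)

x = np.ones((2, 3))
y = sum_forward(x, axis=1)                      # shape (2,)
g = sum_backward(np.ones(2), x.shape, axis=1)   # shape (2, 3), all ones
```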
adamritter
f190ca446d Detach (#123)
* Detach

* Torch.detach reuses the buffer in the

* Fix test

* wakey wakey GitHub Actions

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-19 19:03:42 -08:00
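The detach semantics described above (reuse the buffer, as torch does, but cut the tensor out of the autograd graph) can be sketched minimally; this is not the actual tinygrad class:

```python
import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=True):
        self.data = data
        self.requires_grad = requires_grad

    def detach(self):
        # reuse the same underlying buffer (no copy), but the returned
        # tensor no longer participates in gradient computation
        return Tensor(self.data, requires_grad=False)

a = Tensor(np.ones(3))
b = a.detach()
assert b.data is a.data and not b.requires_grad
```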
George Hotz
17bf90dbe4 unbroadcasting works on the GPU 2020-11-16 09:16:55 -08:00
George Hotz
17eab716b6 unbroadcast GPU template 2020-11-16 08:16:36 -08:00
adamritter
5ea3d76dfb Topological sort, zero_grads (#119)
* Topological sort, zero_grads

* Bug fix, add test

* Add zero_grads

* Put deepwalk function in backward

* Move zero_grad to optim

* Fix gradcheck hack

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-15 20:25:29 -08:00
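Putting a topological sort (the commit's `deepwalk`) inside `backward` guarantees every node's gradient is fully accumulated before its parents consume it. A sketch with a stand-in `Node` class (names are illustrative):

```python
def deepwalk(node, visited=None, order=None):
    # post-order DFS: a node is appended only after everything that feeds
    # it, so iterating the reversed list visits nodes in a valid backward
    # order
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    for parent in node._parents:
        if parent not in visited:
            deepwalk(parent, visited, order)
    order.append(node)
    return order

class Node:
    def __init__(self, parents=()):
        self._parents = parents

a, b = Node(), Node()
c = Node((a, b))     # c depends on a and b
order = deepwalk(c)  # a and b appear before c
```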
Marcel Bischoff
c7b7f8ccc8 Backwards ops supporting broadcasting (#118)
* streamlined numerical_jacobian

* Got rid of the g loop in Conv2D.forward

* erased stupid line

* nothing

* no loops in Conv2D forward

* Conv2D backprop improved

* stupid things in examples

* alternative to einsum

* Conv2D backward einsum alternative

* tidying up

* tidied up

* no ravel

* got rid of print

* Update efficientnet.py

* Update efficientnet.py

* Update efficientnet.py

* only tensordot

* 255.0

* whitespace

* aspect ratio error in efficientnet

* noprint

* efficient net wrong strides

* broadcasting for backward ops

* Update ops.py

* Update ops.py

- was wrong

* broadcast test for backward enabled

* function adBC + not summing over axes that are already 1

* spacing

Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>
2020-11-15 15:21:10 -08:00
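The core of broadcasting-aware backward ops is summing the upstream gradient back down to each input's original shape, undoing whatever the forward pass broadcast. A NumPy sketch of such a helper (the name `unbroadcast` is an assumption):

```python
import numpy as np

def unbroadcast(grad, shape):
    # first sum away leading dims the input never had, then sum over dims
    # that were stretched from size 1 (keeping them as 1 in the result)
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    for i, s in enumerate(shape):
        if s == 1 and grad.shape[i] != 1:
            grad = grad.sum(axis=i, keepdims=True)
    return grad

g = unbroadcast(np.ones((4, 3)), (1, 3))  # [[4., 4., 4.]]
```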
dustcollector12
28474949b8 refactoring of forward in reshape (#115)
* refactoring of forward in reshape

* test case for reshape added
2020-11-13 13:20:43 -08:00
pb1729
420af82888 General broadcasting of binary operations (#114)
* Allow for general broadcasting of binary operations. Can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. If a tensor has fewer dimensions than the other, its shape is padded with 1s until both have the same number of dimensions. Also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array.

* remove extra tabs

Co-authored-by: phillip <phillip_bement@reedbement.com>
2020-11-12 22:27:48 -08:00
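The rule described in #114 (corresponding dims must match or be 1, shorter shapes left-padded with 1s) can be written down directly as a shape computation:

```python
def broadcast_shape(a, b):
    # left-pad the shorter shape with 1s so both have the same rank
    a, b = list(a), list(b)
    while len(a) < len(b): a.insert(0, 1)
    while len(b) < len(a): b.insert(0, 1)
    out = []
    for x, y in zip(a, b):
        # dims must match, or at least one of them must be 1
        assert x == y or x == 1 or y == 1, f"incompatible dims {x} vs {y}"
        out.append(max(x, y))
    return tuple(out)

broadcast_shape((3, 1, 5), (4, 5))  # (3, 4, 5)
```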
adamritter
08aa60d9d0 broadcasting 1s at the start, 1 kernel/4 divs version (#110)
* Pad2d backward pass on GPU

* Faster Pad2D GPU backward pass (no zeroing needed)

* Fix out of bounds error

* Don't save prg

* Let compiler optimize division by 1

* More generic broadcasting (1s at the start)

* Bug fix

* Add comment

* Try to fix flaky test with other method

* Add mixed broadcast support

* 1kernel

* Separate broadcast tests

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-12 13:33:35 -08:00
NeuralLink
f773ef3996 tanh non first class op (#111)
*  tanh non first class op

* tanh test with 1e-6 tol

Co-authored-by: Kartik Sharma <kartik.sharma@claimgenius.com>
2020-11-12 13:32:50 -08:00
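A tanh built only from first-class ops (exp, add, sub, div) matches the library tanh to well under the 1e-6 tolerance the PR tests against. The exact composition used in #111 may differ; this is one standard form:

```python
import numpy as np

def tanh_composed(x):
    # tanh(x) = (e^{2x} - 1) / (e^{2x} + 1)
    e = np.exp(2 * x)
    return (e - 1) / (e + 1)

x = np.linspace(-3, 3, 13)
assert np.allclose(tanh_composed(x), np.tanh(x), atol=1e-6)
```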
Ryan Neph
608bdd4872 adds broadcasting test cases (#106)
refs: #80, #90, #104, #105
2020-11-12 07:08:28 -08:00
adamritter
f1d21afe88 Somewhat more generic broadcasting (#105)
* Somewhat more generic broadcasting

* Add TODO

* Set Torch to deterministic in test

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-11 20:33:00 -08:00
Ryan Neph
8827a536e0 GPU MaxPool2D.backward(); TinyConvNet train passes (#103)
* no trailing whitespace

* GPU MaxPool2D.backward(); TinyConvNet train passes!

* Fix GPU avgpool.forward() init_val

Doesn’t change result but is simpler.

* Fix MaxPool GPU init_val

Tests only cover random non-negative inputs. This fixes issues if negative inputs are fed to GPU MaxPool2D. Test update to follow.
2020-11-11 07:58:43 -08:00
George Hotz
d1284fa817 stride tests and i32 2020-11-10 16:10:14 -08:00
Marcel Bischoff
7bb803c5e0 Conv2D backward on GPU (#93)
* to make it work locally

* definitely not working

* Conv2D GPU passes some of the tests

* Conv2D GPU passes more of the tests

* passes some tests and mnist

* removed unnecessary code

* Conv2D Backpass works

* wrong test_ops.py

* white space + test backward

* erased useless code

* removed default argument

* long lines
2020-11-10 16:07:33 -08:00
George Hotz
866b759d3b match torch api for pad2d 2020-11-09 17:48:56 -08:00
Ryan Neph
16d564a53c finish unsupporting strided pool, add global avg pool test (#92) 2020-11-09 17:31:22 -08:00
George Hotz
870b84a893 test pad2d backward on GPU 2020-11-09 15:50:43 -08:00
George Hotz
e46d122f65 not supporting stride 2020-11-09 15:06:58 -08:00
Ryan Neph
c21c2a0b62 revert b0c0c5d: Strided Pool funcs (#74) (#87)
Strided CPU Pooling was introduced but assumes a small kernel size
(<=(10,10)), while efficientnet.py feeds kernel_size=(112,112).

This causes a huge array buffer allocation in stack_for_pool() that
hangs inference for a long time or until the system OOMs.

Revert CPU Pooling for now, and re-introduce #74 later with a new
global-average-pooling op that can be used instead of avgpool2d with
large kernel size for efficientnet inference.

Co-authored-by: Ryan Neph <ryanneph@google.com>
2020-11-09 14:58:18 -08:00
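The global-average-pooling replacement the revert proposes is just a mean over the spatial axes, with no stacked window buffer at all. A NumPy sketch for NCHW input (the function name is illustrative):

```python
import numpy as np

def global_avg_pool2d(x):
    # equivalent to avgpool2d with kernel_size == (H, W), but computed as
    # one mean over the spatial dims instead of stacking kernel windows,
    # so memory stays O(N*C) regardless of H and W
    return x.mean(axis=(2, 3), keepdims=True)

x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
global_avg_pool2d(x).shape  # (2, 3, 1, 1)
```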
Ryan Neph
7e515308a5 label op subtests by params (#83) 2020-11-09 06:25:06 -08:00
Ryan Neph
5bedf566d1 tests should use rtol unless special case (#82) 2020-11-08 17:25:11 -08:00
Ryan Neph
04b9312a34 Fix GPU Pooling bug at boundary + better Pooling test coverage (#81)
* fixed Pooling bug

* Clarify Pooling tests
2020-11-08 17:25:01 -08:00
Ryan Neph
b0c0c5d0d6 strided Pool funcs (#74)
* *Pool2D GPU forward supports stride

* kernel_size from ctx instead of saved_tensors

* *Pool2D CPU forward supports stride

* update ctx.stride properly
2020-11-08 11:45:55 -08:00
ziofil
db3eccc16b implemented backward for Pad2D & test (#73) 2020-11-07 21:58:42 -08:00
Ryan Neph
5265f6c578 add AvgPool2D backward pass on GPU (#68) 2020-11-07 12:27:29 -08:00
George Hotz
30442a086a some broadcasting, pool test is fail 2020-11-07 11:29:42 -08:00
George Hotz
94d44c97bf add pad2d on GPU 2020-11-07 10:46:36 -08:00
George Hotz
fbff6ab2e5 fix strided convs, GPU env var for enet 2020-11-07 10:26:37 -08:00