* allow for general broadcasting of binary operations. can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. if a tensor has fewer dimensions than the other, its shape is padded with 1s until both have the same number of dimensions. also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array
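A rough sketch of that shape rule (the `broadcast_shape` helper here is illustrative, not the actual tinygrad code):

```python
# illustrative helper, not the tinygrad implementation
def broadcast_shape(shape_x, shape_y):
    # pad the shorter shape with 1s on the left so both have the same rank
    n = max(len(shape_x), len(shape_y))
    shape_x = (1,) * (n - len(shape_x)) + tuple(shape_x)
    shape_y = (1,) * (n - len(shape_y)) + tuple(shape_y)
    out = []
    for a, b in zip(shape_x, shape_y):
        # dims must match, or at least one of them must be 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"shapes {shape_x} and {shape_y} do not broadcast")
        out.append(max(a, b))
    return tuple(out)

assert broadcast_shape((3, 1, 5), (4, 5)) == (3, 4, 5)
```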
* remove extra tabs
* messy loop unrolling
* fix loop unrolling bugs
* revert loop unrolling changes, new plan here
* binary_op(): avoid having a loop in the GPU C code, instead compute indices with nested expressions. simple broadcasts should have a similar level of performance to the simple-broadcast-specific code that was there before. broke out codegen and compilation into get_binop_prg(), which has a larger cache and depends only on the operation type and complist (this avoids doing a bunch of python string ops every time we want to compile something we've already compiled). the larger cache is needed since there will end up being quite a few possible types of broadcasts (sum_i^N 3**i is a loose upper bound, N being the maximum number of dimensions). I assumed 5 kinds of binary operations when sizing the cache here: +, -, *, /, and **. More may be needed in the future.
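A minimal sketch of that caching shape, assuming a get_binop_prg(cl_ctx, code, complist) signature and a simplified kernel body (the real codegen derives the broadcast index expressions from complist):

```python
import functools
import pyopencl as cl

# illustrative sketch; the real codegen builds per-dimension index
# expressions from complist, omitted here for brevity
@functools.lru_cache(maxsize=2 ** 12)  # room for many (op, broadcast pattern) pairs
def get_binop_prg(cl_ctx, code, complist):
    # python string work and compilation only happen on a cache miss
    src = """
    __kernel void binop(__global const float *x_g, __global const float *y_g, __global float *res_g) {
      int gid0 = get_global_id(0);
      float a = x_g[gid0];
      float b = y_g[gid0];
      res_g[gid0] = """ + code + """;
    }"""
    return cl.Program(cl_ctx, src).build()
```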
* add .cl to binop arguments
* solved edge case where len(dimlist)==0. still has problems when len(dimlist) > CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
* pyopencl can't handle more than 3 gids, so we just use 1 gid and compute the indices into the returned tensor in the kernel. this means more computation for the individual indices, but less for the index into the flattened tensor (last line of kernel), since it's just gid0
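In python terms, the index recovery the kernel does from that single gid looks roughly like this (illustrative, mirroring the arithmetic the generated kernel inlines as nested expressions):

```python
# illustrative: recover per-dimension indices from the single flat gid
def unflatten(gid0, shape):
    idxs = []
    for dim in reversed(shape):
        idxs.append(gid0 % dim)
        gid0 //= dim
    return tuple(reversed(idxs))

assert unflatten(7, (2, 3, 4)) == (0, 1, 3)  # 7 == (0*3 + 1)*4 + 3
```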
* trim some lines
Co-authored-by: phillip <phillip_bement@reedbement.com>
* number of categories for efficientnet
* need layer_init_uniform
* merge fail
* merge fail
* batchnorms
* needs work
* needs work: how to determine training mode
* pow
* needs work
* reshape was needed
* sum with axis
* sum with axis and tests
* broken
* works again
* clean up
* Update test_ops.py
* using sum
* don't always update running_stats
* space
* self
* default return running_stats
* passes test
* need to use mean
* merge
* testing
* fixing pow
* test_ops had a line dropped
* undo pow
* rebase
* Consistent GPU classes
Convert the existing GPU classes into one standard format.
Remove duplicated functions in `test_mnist` and create a TestMNISTGPU
class. This reduces line count and ensures consistency.
Use `@unittest.skipUnless(GPU, "Requires GPU")` instead of `if GPU:` to
skip GPU testing. This will ensure that skipped tests are displayed
accordingly in the pytest output.
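For reference, the skip pattern looks roughly like this (class body and import path are illustrative):

```python
import unittest
from tinygrad.tensor import GPU  # assumed location of the GPU availability flag

@unittest.skipUnless(GPU, "Requires GPU")
class TestMNISTGPU(unittest.TestCase):
    # skipped tests show up in the pytest output instead of silently disappearing
    def test_conv_gpu(self):
        ...
```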
* Optim Testing now supports GPU
* Tensor testing now supports GPU
jacobian and gradcheck are auto-skipped until GPU float64 support is added.
* GPU support for custom constructor methods
* Remove GPU flag from Model constructors
It was requested that the `gpu` kwarg be removed from the model
constructor. GPU conversion is now handled in the train function.
This also required converting the Optimizer parameters, since they are
constructed prior to execution of the `train` function and are dependent
on the model's GPU state.
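A minimal sketch of that flow, assuming the `get_parameters` helper mentioned below and a hypothetical in-place GPU conversion method:

```python
# illustrative only: move model and optimizer parameters to the GPU inside
# train(); get_parameters() usage and gpu_() are stand-ins, not the exact API
def train(model, optim, steps, gpu=False):
    if gpu:
        # the optimizer captured the model's parameters when it was constructed,
        # before train() ran, so its references need converting as well;
        # dedup by identity since both lists share tensor objects
        seen = {id(p): p for p in get_parameters(model) + get_parameters(optim)}
        for p in seen.values():
            p.gpu_()
    for _ in range(steps):
        ...  # forward pass, loss.backward(), optim.step()
```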
* Fix typo: float32->float64
* Clean `get_parameters` utility
Just a quick refactor w/ the new support for optimizers.
* Remove GPU kwarg from TinyNet
Remove `gpu` kwarg from TinyNet to match the test_mnist `train` function.