tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
David Hou	3378625773	name upcast variables (#3200 ) * name upcast variables * typing * unused	2024-01-22 11:37:28 -05:00
chenyu	827b7a3c64	cleanup pad_reflect and make_square_mask in hlb_cifar (#3206 ) removed some complicated looking stuff. no wall time difference	2024-01-22 11:30:46 -05:00
chenyu	99884f4c98	cifar flags for RANDOM_CROP, RANDOM_FLIP, and CUTMIX (#3204 ) experimenting with different setups, also would like to jit the data augmentation next	2024-01-22 01:12:51 -05:00
chenyu	53afec2841	add HALF to handcode_resnet50_opt.py (#3202 ) use this to study tensor cores on HIP	2024-01-21 23:03:59 -05:00
chenyu	836883fedc	comment out cutmix in hlb_cifar (#3201 ) it's no-op with multi gpu and less STEPS. also the patch was selected from the whole dataset, not from the same batch	2024-01-21 22:24:53 -05:00
chenyu	e6c71f1b26	fix device of Tensor.arange inside Tensor.one_hot (#3199 ) it should have the same device as self	2024-01-21 21:03:50 -05:00
chenyu	f7d1c42239	cleanup noop prefixes in _pool (#3198 ) * cleanup noop prefixes in _pool make expand dim=None as noop (in addition to -1). then slice, reshape, expand in _pool can share the same noop prefix * nit * something then reshape style * that's repeat	2024-01-21 20:03:32 -05:00
uuuvn	640e5c36ad	Fix metal tests broken by `3f56d1a` (#3196 ) * Remove from binary_operations before copying binary_operations into integer_binary_operations * Also remove lt and eq if running on METAL	2024-01-21 11:53:25 -05:00
chenyu	b9d27636aa	cleanup test_ops.py (#3192 ) - removed exact duplicated tests - only kept one function if torch_fxn is the same as tinygrad_fxn - used tensor method instead of class method style - replaced unneeded `lambda f: f(x)` with just `f` - re-enabled commented tests that work now - removed some forward_only now 0 shape tensor can backward	2024-01-20 20:08:56 -05:00
chenyu	3f56d1a5e8	add operator.lt and operator.eq to test_dtype_alu (#3191 ) * add operator.lt and operator.eq to test_dtype_alu those should pass now as we have broadcasted before passing to lt and eq. also updated the test skipping criteria to reuse test_dtype.is_dtype_supported * llvm lt nan is incorrect * enable truediv too * Revert "enable truediv too" This reverts commit `df703235fb`. * just that	2024-01-20 14:54:02 -05:00
chenyu	c4b5661146	fuzz length for multitensor reduce test case (#3190 ) so that the uneven case is not just with 0 length and can have other positive values	2024-01-20 00:44:38 -05:00
chenyu	fdb1c2b1d9	move reduce over 0 len axis logic to lazy.py (#3188 ) * move reduce over 0 len axis logic to lazy.py this fixed uneven shard reduce case if the uneven one has length 0 * fix interpreted backends * fix backwards for 0 shape tensors too	2024-01-20 00:13:03 -05:00
chenyu	485332935e	ring copy example (#3185 ) * ring copy example * use ones for init	2024-01-19 23:34:30 -05:00
George Hotz	254a7372fe	buffer copy refactor (#3187 )	2024-01-19 20:21:24 -08:00
chenyu	fb4bd2a57d	reenable padto to search action (#3183 )	2024-01-19 14:17:53 -05:00
chenyu	cb4cfc078a	parameterize multitensor tests for reduce (#3181 ) uneven shards reduce is incorrect now	2024-01-19 14:03:01 -05:00
nimlgen	5097d5b808	fix padto when with late reduce (#3180 ) * fix padto test * no long comment	2024-01-19 14:01:44 -05:00
George Hotz	729a01bf3e	complex PRs will not be merged	2024-01-19 10:58:47 -08:00
nimlgen	f87ecbb0f3	fuzzer validates outputs + (partially) oob accesses (#3178 ) * fuzzer validates outputs + (partially) oob accesses * +random * oob check only for compiled * type cmp fixes * fix zeroing * no prints * add seed	2024-01-19 13:34:51 -05:00
chenyu	b2571d586c	hypothesis.st -> hypothesis.strat (#3179 ) leave `st` for shapetracker	2024-01-19 11:55:26 -05:00
chenyu	c4faedebf3	add test cases for negative entry max allreduce (#3177 )	2024-01-18 22:26:51 -05:00
chenyu	ab1b7c4d09	fix allreduce for max (#3175 ) * test cases to show allreduce for max is incorrect * oh fixed	2024-01-18 20:25:35 -05:00
George Hotz	c51c90bcd4	more sync in transfer (#3174 )	2024-01-18 17:17:03 -08:00
chenyu	28dcbf0e00	test case sharded batchnorm has different ast on devices (#3172 )	2024-01-18 18:12:15 -05:00
chenyu	a60d50487d	disable padto, seems to have bug in gpt2 (#3173 )	2024-01-18 18:09:30 -05:00
George Hotz	c80884884e	event driven hip (#3160 ) * event driven hip * simpler, src makes copy * pass mypy	2024-01-18 14:35:18 -08:00
George Hotz	d2aab65958	remove unused expr node (#3170 ) * remove unused expr node * still works * simple expr_idxs * fixup typing	2024-01-18 14:18:43 -08:00
chenyu	097b1390ec	touchup test_indexing (#3169 )	2024-01-18 14:32:43 -05:00
George Hotz	a04e4d0442	inline clang renderer (#3168 )	2024-01-18 11:17:34 -08:00
geohotstan	efbe4788d1	indexing: Final cleanup (#3156 ) * init * feat: add _to_const_val to getitem * doc: changed docs * docs: updated more docs * merge: improved/fancy * better error msg, minor cleanups * feat: added index_put to test_indexing * clean: test_indexing * revert: gather changes lol * refactor: use dict for tracking tensor indexing, also asserts for type * oooooooooops * ugh * will revert this commit xD * fix: removed asserts * improvement: made in-line if statement clearer * improved err message and improved slice_int tests * fix: recover accidentally deleted line * finishing touches * reword some docs and del torch device tests in test_indexing * del some redundant tests * revert: gather asserts, do it in separate pr * fix some data_ptr stuff * done * done done	2024-01-18 14:08:03 -05:00
chenyu	e139ae550d	smaller limit_dims_to_max (#3167 ) same questionable logic, but less lines now	2024-01-18 13:02:20 -05:00
nimlgen	992067399e	clean up exceptions in __del__ everywhere (#3165 )	2024-01-18 08:34:09 -08:00
Max-We	0338903429	Update kits19.py (#3166 )	2024-01-18 08:33:50 -08:00
George Hotz	67bc2ddfd8	JIT cleanups (#3164 ) * move GraphException * factor out apply_graph_to_jit * that return was wrong	2024-01-17 23:39:57 -08:00
George Hotz	f0c178b7e9	move get_contraction to helpers (#3162 ) * move get_contraction to helpers * move simplify * lines * to_movement_ops is not generic	2024-01-17 19:13:11 -08:00
chenyu	e52a609240	make WINO a context var, and LATEWINO in hlb_cifar (#3161 )	2024-01-17 20:21:26 -05:00
George Hotz	ee83505fcc	fix test extra issue (#3159 )	2024-01-17 11:58:08 -08:00
George Hotz	9cc2577a08	use hip events (#3157 ) * use hip events * cleanup	2024-01-17 10:39:57 -08:00
chenyu	1b508e0f71	fix fuzz_linearizer toCPU to as_buffer (#3158 )	2024-01-17 13:18:46 -05:00
George Hotz	743b36f0ce	hotfix: copy size is in bytes	2024-01-17 16:44:15 +00:00
George Hotz	2e6162b281	graph cleanup (#3155 ) * simpler graph * unused functions	2024-01-16 20:57:31 -08:00
George Hotz	a72b1b6d65	sharding for llama (#3151 ) * shard llama * sharding works * simpler * simpler * consume option * disable that test * save a line --------- Co-authored-by: George Hotz <george@tinygrad.org>	2024-01-16 19:28:00 -08:00
chenyu	14c010958b	support for non-uniform sharding (#3154 ) * support for non-uniform sharding * bugfix and more tests --------- Co-authored-by: George Hotz <geohot@gmail.com>	2024-01-16 20:33:32 -05:00
nimlgen	81ae4ea179	compile cache for several devices (#3148 ) * compile cache for several devices * ops_gpu uses hash to not care about sql * hip rdna test with device * linter happy * no device passed where possible * arch is optional to compile_{hip|cuda}	2024-01-16 11:45:26 -08:00
chenyu	589c16756f	hlb_cifar multi gpu training (#3150 ) * cifar train with multi gpu * GPUS=1 is noop	2024-01-16 14:38:45 -05:00
George Hotz	cc0de99751	hotfix: multilazybuffer can have only one lazybuffer	2024-01-16 10:06:45 -08:00
George Hotz	228f30b96a	multitensor jit (#3149 ) * initial multitensor jit support and tests * Added graphs to multitensor jit and updated tests * update unbind api * fix set device, add TinyJit to resnet * update_stats includes device --------- Co-authored-by: ramenguy99 <ramenguy99@gmail.com>	2024-01-16 09:09:15 -08:00
chenyu	b9d470577c	gelu -> quick_gelu in hlb_cifar (#3147 ) 89 -> 86 seconds, same eval acc	2024-01-16 02:03:37 -05:00
chenyu	ec5a212b0a	modernize hlb_cifar (#3146 ) * modernize hlb_cifar do more things in Tensor space instead of numpy, clean up dtypes and use more Tensor methods. * eigens are float64	2024-01-16 01:35:11 -05:00
chenyu	2088937206	run full hlb_cifar training in tinybox ci (#3145 ) * run full hlb_cifar training in tinybox ci single gpu ~89 seconds * time that	2024-01-15 23:59:20 -05:00

3452 Commits