Commit Graph

177 Commits

George Hotz
47f9887ce4 hip events work (#3229)
* hip events work

* event
2024-01-24 11:49:53 -08:00
George Hotz
e2e4632aea LoadOps SYNC (#3223)
* LoadOps SYNC and WAIT

* no wait, only sync

* DEBUG >= 1

* track cross device
2024-01-23 21:59:18 -08:00
chenyu
c4b5661146 fuzz length for multitensor reduce test case (#3190)
so that the uneven case is not just length 0 and can have other positive values
2024-01-20 00:44:38 -05:00
chenyu
fdb1c2b1d9 move reduce over 0 len axis logic to lazy.py (#3188)
* move reduce over 0 len axis logic to lazy.py

this fixed the uneven shard reduce case where the uneven shard has length 0

* fix interpreted backends

* fix backwards for 0 shape tensors too
2024-01-20 00:13:03 -05:00
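The two commits above hinge on how a reduce behaves over a zero-length axis, which is exactly what a length-0 shard boils down to. A minimal numpy sketch of the intended semantics (illustrative only, not the tinygrad code path):

```python
import numpy as np

# A zero-length axis reduces to the operation's identity: summing an empty
# axis yields zeros, so a length-0 shard contributes nothing to the total.
empty = np.zeros((0, 4))
print(empty.sum(axis=0))                        # [0. 0. 0. 0.]

full = np.arange(12.0).reshape(3, 4)
shards = [full[:3], full[3:]]                   # second shard has length 0
combined = sum(s.sum(axis=0) for s in shards)
np.testing.assert_allclose(combined, full.sum(axis=0))
```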
George Hotz
254a7372fe buffer copy refactor (#3187) 2024-01-19 20:21:24 -08:00
chenyu
cb4cfc078a parameterize multitensor tests for reduce (#3181)
reduce over uneven shards is currently incorrect
2024-01-19 14:03:01 -05:00
chenyu
c4faedebf3 add test cases for negative entry max allreduce (#3177) 2024-01-18 22:26:51 -05:00
chenyu
ab1b7c4d09 fix allreduce for max (#3175)
* test cases to show allreduce for max is incorrect

* oh fixed
2024-01-18 20:25:35 -05:00
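The negative-entry tests above point at a classic pitfall for max reductions. As an assumption on my part rather than a reading of the diff, one plausible failure mode is combining shards through a buffer padded with 0, which is not the identity element for max; a small numpy sketch:

```python
import numpy as np

# Hypothetical failure mode: padding the shorter shard with 0 before the
# combine. 0 is not the identity for max, so all-negative inputs break.
shard_a = np.array([-5.0, -3.0, -7.0])
shard_b = np.array([-2.0, -9.0])                      # uneven shard

true_max = np.concatenate([shard_a, shard_b]).max()   # -2.0

padded = np.concatenate([shard_b, [0.0]])             # buggy padding value
print(np.stack([shard_a, padded]).max())              # 0.0, wrong

masked = np.concatenate([shard_b, [-np.inf]])         # identity for max
print(np.stack([shard_a, masked]).max(), true_max)    # -2.0 -2.0
```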
chenyu
28dcbf0e00 test case sharded batchnorm has different ast on devices (#3172) 2024-01-18 18:12:15 -05:00
George Hotz
ee83505fcc fix test extra issue (#3159) 2024-01-17 11:58:08 -08:00
George Hotz
a72b1b6d65 sharding for llama (#3151)
* shard llama

* sharding works

* simpler

* simpler

* consume option

* disable that test

* save a line

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2024-01-16 19:28:00 -08:00
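A hedged sketch of the tensor-parallel pattern this llama work relies on: activations replicated on every device, weight columns split per device. Device strings (built from Device.DEFAULT) and shapes are illustrative, and it assumes the Tensor.shard API from the multitensor commits below.

```python
from tinygrad import Tensor, Device

# Illustrative devices: two instances of the default backend.
GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

x = Tensor.randn(1, 256).shard(GPUS)             # replicated activations
w = Tensor.randn(256, 256).shard(GPUS, axis=1)   # column-sharded weight

y = x.matmul(w)        # each device computes its slice of the output
print(y.shape)         # (1, 256), the full logical shape
```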
chenyu
14c010958b support for non-uniform sharding (#3154)
* support for non-uniform sharding

* bugfix and more tests

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2024-01-16 20:33:32 -05:00
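A hedged sketch of what non-uniform sharding means in practice: an axis whose length does not divide evenly across the devices. Device strings are illustrative and the 3/2 split is an assumption about how the shards land; the correctness of reducing over such shards is what the later commits above address.

```python
from tinygrad import Tensor, Device
import numpy as np

devices = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

x = Tensor.arange(5).float()
xs = x.shard(devices, axis=0)   # e.g. 3 elements on one device, 2 on the other

# A reduce over the sharded axis should match the single-device result.
np.testing.assert_allclose(xs.sum().numpy(), x.numpy().sum())
```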
George Hotz
228f30b96a multitensor jit (#3149)
* initial multitensor jit support and tests

* Added graphs to multitensor jit and updated tests

* update unbind api

* fix set device, add TinyJit to resnet

* update_stats includes device

---------

Co-authored-by: ramenguy99 <ramenguy99@gmail.com>
2024-01-16 09:09:15 -08:00
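A hedged sketch of combining TinyJit with sharded inputs, as this commit enables. The function body, shapes, and device strings are illustrative; the early calls run eagerly and capture kernels, later calls replay them per device.

```python
from tinygrad import Tensor, TinyJit, Device

devices = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

@TinyJit
def step(x: Tensor) -> Tensor:
  # a stand-in for a real model step; JIT outputs must be realized
  return (x * 2 + 1).realize()

for _ in range(4):
  x = Tensor.randn(4, 4).shard(devices, axis=0).realize()
  out = step(x)
print(out.numpy())
```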
George Hotz
cec0a7bc37 use shard api to eval resnet fast (#3136)
* use shard api to eval resnet fast

* to supports shard

* test to in multitensor
2024-01-15 16:49:38 -08:00
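A hedged sketch of the data-parallel eval pattern this commit uses: weights replicated on every device, the input batch sharded along axis 0. A small nn.Linear stands in for the actual resnet, and device strings are illustrative.

```python
from tinygrad import Tensor, Device
from tinygrad.nn import Linear

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

model = Linear(64, 10)
model.weight = model.weight.shard(GPUS)           # replicated (no axis)
model.bias = model.bias.shard(GPUS)

batch = Tensor.randn(8, 64).shard(GPUS, axis=0)   # 4 samples per device
print(model(batch).shape)                         # (8, 10)
```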
Yixiang Gao
c13d51da1d add device options for tests in multigpu (#3121) 2024-01-14 15:17:47 -08:00
Yixiang Gao
13e872b53f add multigpu support for llama attention (#3064)
* add llama attention test for multigpu

* test fails

* kv cache trying to shrink on sharded axis

* mask None works for scaled dot product

* kv cache seems to be working but scaled dot product breaks

* scaled dot product works, but the last linear layer failed

* running into the reshape case where it could be wrong for multigpu

* making sure it was the reshape

* adding contiguous doesn't solve it

* need to shard more properly

* remove reshape test

* minor adjustment to scaled dot product attention test

* weights are sharded wrong

* continue fix new weight sharding

* clean up

* fix attention when start_pos is 0

* remove print

* add TODOs for the best multigpu interface
2024-01-11 16:31:02 -08:00
Yixiang Gao
adcc844755 cat works (#3086) 2024-01-11 08:25:20 -08:00
Yixiang Gao
6842476ca6 better test demonstration (#3077)
* a better test demonstration

* fix white space
2024-01-10 10:50:52 -08:00
George Hotz
ac3f246c11 cached size (#3060)
* cached size

* simplify simplify

* 0 doesn't have base

* fix test

* cleaner cache

* hmm, metal is flaky on this...might be real(ish) but useless as test

* short circuit reshape/expand properly

* better reshape bypass
2024-01-09 16:37:37 -08:00
Yixiang Gao
73b72b8de2 test scaled dot product attention (#3063)
* add test

* add initial test for scaled dot product attention

* test pass for scaled dot product attention
2024-01-09 14:30:57 -08:00
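A hedged sketch of the property these tests check: attention is computed independently per head, so sharding q, k and v on the head axis should reproduce the single-device result. Shapes, the tolerance, and device strings are illustrative.

```python
from tinygrad import Tensor, Device
import numpy as np

devices = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

q, k, v = [Tensor.randn(1, 4, 8, 16) for _ in range(3)]  # (batch, heads, seq, head_dim)
single = q.scaled_dot_product_attention(k, v).numpy()

qs, ks, vs = [t.shard(devices, axis=1) for t in (q, k, v)]  # split the heads
sharded = qs.scaled_dot_product_attention(ks, vs).numpy()

np.testing.assert_allclose(single, sharded, atol=1e-4)
```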
Yixiang Gao
259bf9bffc add multigpu test for RMSNorm (#3056)
* need all gather

* add two multigpu test scenarios for RMSNorm
2024-01-09 09:52:51 -08:00
Yixiang Gao
a686663657 make Embedding device aware for multigpu (#3051)
* make Embedding device aware for multigpu

* split line instead of ignore because that's cheating

* add test incomplete

* add test complete

* remove comment

* fix white space

* remove nn.Embedding
2024-01-08 20:09:26 -08:00
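A hedged sketch of a device-aware Embedding in a multigpu setting: the table replicated on every device, the token ids sharded along the batch axis so lookups stay local. Sizes and device strings are illustrative.

```python
from tinygrad import Tensor, Device
from tinygrad.nn import Embedding

devices = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

emb = Embedding(100, 32)                 # vocab_size=100, embed_size=32
emb.weight = emb.weight.shard(devices)   # replicate the table

tokens = Tensor([[1, 2, 3], [4, 5, 6]]).shard(devices, axis=0)
print(emb(tokens).shape)                 # (2, 3, 32)
```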
Yixiang Gao
8a63f26a0f make LR scheduler work with multigpu (#3011)
* add a failing test for LR scheduler when using multigpu

* fix calculation order and unnecessary tensor created for float

* min_lr is no longer tensor
2024-01-04 12:10:56 -08:00
chenyu
81b97cd2c6 canonicalize device in LazyBuffer constructor (#2991)
fixed the multitensor +1 then sum bug
2024-01-03 12:55:25 -05:00
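A hedged repro-style sketch of the "+1 then sum" case referenced by the commit above and the failing test recorded below: an elementwise op on a sharded tensor followed by a full reduce, checked against numpy. Device strings are illustrative.

```python
from tinygrad import Tensor, Device
import numpy as np

devices = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

x = Tensor.arange(8).float()
xs = x.shard(devices, axis=0)

np.testing.assert_allclose((xs + 1).sum().numpy(), (x.numpy() + 1).sum())
```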
chenyu
db525cf8c2 multitensor failed test case with +1 then sum on DEVICE:0 (#2990) 2024-01-03 12:17:11 -05:00
George Hotz
5dbaaa7061 hotfix: make multitensor shard contiguous 2024-01-03 08:48:30 -08:00
George Hotz
f494b9d463 simple multitensor API (#2903)
* simple multitensor API

* test multitensor

* mt work

* new api

* copies

* all but data parallel

* allreduce there

* works, but axis sharded

* fix all mt tests

* features/multi

* work

* backprop

* fix tests

* tests passing

* mt progress

* cleanups

* less lines

* tensor cleanup

* save more lines

* mypy passes

* fix tests

* skip for cuda too

* bump download cache
2024-01-02 17:49:44 -08:00
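A hedged sketch of the basic multitensor API this commit introduces: shard with no axis replicates a tensor across devices, shard with an axis splits it, and ordinary ops then behave like ops on the original tensor. Device strings built from Device.DEFAULT are illustrative.

```python
from tinygrad import Tensor, Device
import numpy as np

devices = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))

x = Tensor.randn(4, 4)
replicated = x.shard(devices)          # full copy on every device
split = x.shard(devices, axis=0)       # rows split across the devices

# Ops on a sharded tensor should match ops on the original tensor.
np.testing.assert_allclose((split * 2).numpy(), x.numpy() * 2, atol=1e-6)
print(replicated.shape, split.shape)   # both report the logical shape (4, 4)
```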