Commit Graph

95 Commits

Author SHA1 Message Date
Tobias Fischer
1a9e145388 Tensor Clone Function (#7154)
* implemented clone function

* cleanup linting, single func

* added tests, cleaned up grad cloning

* fixed whitespace
2024-11-01 12:24:43 +08:00
George Hotz
4812801aa6 try for canonical order (#7286)
* try for canonical order

* cmp better

* disable bad tests

* flip const order

* fix test

* fix tests

* different fix for NOOP

* metaclass here

* fix tests

* narrower scope
2024-10-25 16:04:54 +08:00
George Hotz
d726eb6f48 uop resolve [run_process_replay] (#6826)
* uop bool and int and stuff [run_process_replay]

* add ne support

* can't even be None anymore

* BinaryOps.AND support

* less compare
2024-10-01 13:11:42 +08:00
wozeparrot
c100f3d406 default threefry (#6116) 2024-09-25 17:45:13 +08:00
George Hotz
cb22ef379a truncate consts early (#6741)
* truncate consts early

* ptx still fails

* Update dtype.py
2024-09-25 16:49:51 +08:00
wozeparrot
2be0b26a1f rand only supports single device (#6682) 2024-09-24 16:07:44 +08:00
qazal
982086f54c UOps.VALID try 2 (#6623)
* make UOps.VALID compile

* fixable tests

* bufs dedup

* cleanup the CONST spec

* regenerate dataset with graph_rewrite

```py
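# rewrite a CONST that carries a SHAPETRACKER source into VALID(st).where(const, 0),
# so masked (invalid) positions read 0 instead of the constant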
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
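# match any CONST whose single source is a SHAPETRACKER and apply the rewrite above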
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```

* rm arg

* remove arg

* revert arg removal

This reverts commit 2c35c75c95.

* red test_pickle_define_var
2024-09-21 14:19:25 +08:00
George Hotz
dbd4536167 Revert "add UOps.VALID (#6387)" (#6441)
This reverts commit 8186e4e7d6.
2024-09-09 21:33:00 +08:00
George Hotz
8186e4e7d6 add UOps.VALID (#6387)
* uops valid

* broke full_shape

* fixup that st (hardcoded asts still red)

* fixup DEFINE_VAR

debug

more debug

* start moving stuff to ast_const

* move test_linearizer

* move test_linearizer_failures to ast_const

* fixup test_schedule

* small diff change

* regenerate dataset

* fixup test_multitensor

* regen dataset try 2

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-09 16:58:43 +08:00
chenyu
943ab97d24 fix Tensor.prod for multitensor (#6264) 2024-08-24 08:52:24 -04:00
qazal
28c75bf2a6 merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
c23d44c779 AST is UOp (#6030)
* most of the work from the uops2 branch

* schedule

* realize

* kernel

* lowerer

* search

* green

* merge uops with ops

* Revert "merge uops with ops"

This reverts commit 1408a59f12.

* fix benchmark

* remove extra dedup
2024-08-16 22:09:00 +03:00
Tobias Fischer
6e3eb50fd1 added fix and reg tests (#6060) 2024-08-12 21:00:48 -04:00
David Hou
eb91423cb4 MLB support reshape for uneven shards (#5804)
* cleaner uneven reshape

* update test
2024-08-01 02:36:03 -07:00
David Hou
492a696d14 allow specify splits in shard, handle multiple different splits in MLB.e (#5599)
* allow specify splits in shard, handle multiple different splits in MLB.e

* line width

* linter

* don't use Device in docstring

* specify size of shards instead of boundaries

* adjust docstring for specify size of shards instead of boundaries

* don't allow splits on symbolic axis?

* just allow sint in splits_to_bounds (see the sketch below)

* add message for assert

* bounds instead of splits to save lines

* fix types

* reduce diff

* fix

* tuple

* golf :(
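
For illustration, a minimal sketch of the sizes-to-bounds conversion these bullets describe; splits_to_bounds is named in the commit but not shown here, so this signature is an assumption:

```py
from itertools import accumulate

# hypothetical sketch: turn per-device shard sizes into (start, end) bounds on the sharded axis
def splits_to_bounds(splits:tuple[int, ...]) -> tuple[tuple[int, int], ...]:
  offsets = (0, *accumulate(splits))
  return tuple(zip(offsets[:-1], offsets[1:]))

assert splits_to_bounds((3, 3, 1)) == ((0, 3), (3, 6), (6, 7))
```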

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
George Hotz
e638b0084f smaller multitensor resnet test (#5450)
* minor improvements to matcher speed [run_process_replay]

* oh, put that back

* make fake images smaller for resnet test
2024-07-13 07:31:28 -07:00
George Hotz
6707c778d0 scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]

* fix tests

* fix op + fuzzers

* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
f6ef283e6a s/loadops/metaops [run_process_replay] (#5421) 2024-07-12 13:26:50 -07:00
George Hotz
3e40211e45 add UOP_IS_SYMBOLIC [run_process_replay] [no_assert] (#5386)
* cleanup a few things in uops [run_process_replay] [no_assert]

* add optional UOP_IS_SYMBOLIC
2024-07-11 10:48:45 -07:00
nimlgen
1678199b15 add update_copy to hcq spec (#5348)
* add update_copy to hcq spec

* fix amd
2024-07-09 20:44:44 +03:00
qazal
c1e166c08a fix dtype mismatch for bool ops in multi (#5299) 2024-07-06 11:36:40 +03:00
chenyu
b2c3a28a5e nn.RMSNorm (#5272)
the norm itself doesn't add significant value as a Tensor method, but we would want Tensor.normalize
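
A minimal sketch of the standard RMSNorm computation on the Tensor API; the class name is hypothetical, and tinygrad's actual nn.RMSNorm may differ in argument names and casting details:

```py
from tinygrad import Tensor

class RMSNormSketch:
  def __init__(self, dim:int, eps:float=1e-6):
    self.eps, self.weight = eps, Tensor.ones(dim)
  def __call__(self, x:Tensor) -> Tensor:
    # normalize by the root-mean-square over the last axis, then apply the learned scale
    return x * (x.square().mean(axis=-1, keepdim=True) + self.eps).rsqrt() * self.weight

y = RMSNormSketch(8)(Tensor.randn(2, 8))
```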
2024-07-02 21:39:01 -04:00
Roelof van Dijk
f88f71d73a ruff: unnecessary-comprehension (#5174)
* enable ruff C416 unnecessary-comprehension

* already a list
2024-06-27 07:45:29 -04:00
David Hou
666a9c1448 don't view origin buffer when sharding (#5122)
* make buffer view optional with a flag

* do not view when sharding to save memory
2024-06-25 20:19:09 -07:00
chenyu
7948b05738 fix uneven shard with shrink and pad args on sharded axis (#5131)
it's incorrect to assume the first (len(device)-1) shards all have the same size, e.g. size 2 sharded over 4 devices -> (1, 1, 0, 0)
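
A small sketch (hypothetical helper, not tinygrad code) showing how ceil-division sharding yields the sizes in that example, and why the first len(device)-1 shards cannot be assumed equal:

```py
def shard_sizes(size:int, n:int) -> tuple[int, ...]:
  step = -(-size // n)  # ceil(size / n): every shard spans this many indices, clipped to size
  return tuple(max(0, min(size, (i + 1) * step) - i * step) for i in range(n))

assert shard_sizes(2, 4) == (1, 1, 0, 0)  # the example from the commit message
assert shard_sizes(7, 3) == (3, 3, 1)     # uneven: the last shard is smaller
```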
2024-06-24 16:55:50 -04:00
chenyu
4a7d403777 cleanup test_multitensor (#5118)
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi-device tuples
2024-06-23 20:54:22 -04:00
chenyu
c0ba5e0dfb multi copy_to_device return the copy on same device if possible (#5117)
previously it always returned the copy from the first device
2024-06-23 20:25:56 -04:00
chenyu
b886d250fb improve test_dropout_on_shard (#4912)
tested some basic properties; also minor formatting for a few Tensor.training setups
2024-06-11 11:36:02 -04:00
George Hotz
35e53c0809 add sharded arange test (#4908) 2024-06-11 10:58:33 +02:00
chenyu
e33efd6a3d test cases for multitensor adds const (#4892)
Tested that const remains const in the AST. Also removed the TODO in _to_const_val.
2024-06-08 22:57:48 -04:00
nimlgen
e78a9bf3f2 support view in nv/amd (#4812)
* support view in nv/amd

* fix amd

* fix

* run test on nv/amd
2024-06-03 22:11:52 +03:00
qazal
637f482588 configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
nimlgen
eb9689336e nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugn, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
George Hotz
5ba611787d move image into tensor.py. delete features (#4603)
* move image into tensor.py

* change setup.py

* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
George Hotz
2f970a4fc2 all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
George Hotz
89e119bc58 move Allocator to buffer.py (#4502)
* move Allocator to buffer.py

* move those to realize

* memory file

* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
c9e84ed0da refactor to Program class (#4476)
* refactor to Program class

* switch to Program

* fix tests

* smaller diff

* self.p

* more tests

* fix metal test

* tests

* fix openpilot

* move that to linearizer

* p.launchdims
2024-05-09 17:29:07 -07:00
George Hotz
17faae091b optimizer shouldn't be run without training (#4460)
* optimizer shouldn't be run without training (see the sketch below)

* set training in relevant tests

* fix multitensor

* that too
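
A hedged usage sketch of the constraint, assuming the optimizer now asserts Tensor.training and using tinygrad's Tensor.train context manager:

```py
from tinygrad import Tensor
from tinygrad.nn.optim import SGD

w = Tensor.randn(4, 4, requires_grad=True)
opt = SGD([w], lr=0.1)
with Tensor.train():  # stepping the optimizer outside a training context would now assert
  loss = (w * w).sum()
  opt.zero_grad()
  loss.backward()
  opt.step()
```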
2024-05-06 15:34:12 -07:00
George Hotz
9fc4465557 subbuffer support (#4397)
* subbuffer support

* diskbuffer offset

* cuda subbuffer works

* use subbuffer

* more subbuffer tests

* consecutive

* cast

* consec

* offset

* view is a better name

* offset is in nbytes

* fix view + memory planner

* delete unused DiskRunner

* reverse order

* no subbuffers on unrealized consts

* only enabled for disk

* don't reverse memory

* view supported devices

* pickle buffer view

* ring jit

* support extra view inputs in jit

* fix JIT=2 issue

* test copy jit

* p2p isn't an option anymore

* fix dep tracking issue

* fix mypy

* fix pickle

* from_nv is contents now
2024-05-03 18:05:57 -07:00
George Hotz
c8a2047377 testing for all reduce (#4387) 2024-05-02 06:34:10 -07:00
chenyu
f363f39e83 fix dtype of const folded sum (#4349)
const-folded sum should return the same dtype as a regular sum, which can be different from the input dtype
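
A hedged illustration, assuming tinygrad's usual sum promotion (e.g. a bool input accumulates to int32): the const-folded path should report the same dtype as the regular one.

```py
from tinygrad import Tensor, dtypes

# both the regular and the const-foldable sum should land on the promoted dtype, not bool
assert Tensor([True, True, True]).sum().dtype == dtypes.int32
assert Tensor.ones(3, dtype=dtypes.bool).sum().dtype == dtypes.int32
```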
2024-04-29 11:40:45 -04:00
George Hotz
50e780a588 multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile

* type annotations

* fix tests

* outcount in reduce
2024-04-13 00:03:48 -07:00
uuuvn
2b81d9b334 Fix broken test (#4104) 2024-04-07 12:02:12 -04:00
uuuvn
bb7567b365 Fix metal (#4101) 2024-04-07 05:21:19 -07:00
George Hotz
a337922c44 more work on kfd (#4079)
* more work on kfd

* fix multitensor test on kfd

* stuff
2024-04-05 08:36:36 -07:00
chenyu
82440d3416 don't call contiguous for unpadded const into multi tensor (#4032)
* don't call contiguous for unpadded const into multi tensor

fixed multi const folding for sharded consts.
still WIP; need to be careful that this does not break the multi-device cache somewhere

* ehh need a memory test for that

* simple sharded memory test
2024-04-01 19:22:14 -04:00
George Hotz
9eef44521b ScheduleItem uses Buffer (#3995)
* schedule Buffer

* update

* update tests

* master

* works

* remove LoadOps.WAIT

* fix compile2

* bad test

* rename and note
2024-03-29 20:50:27 -07:00
George Hotz
68ca4d4276 split to schedule.py (#3949)
* split to schedule.py

* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76 create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00