tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-08 22:48:25 -05:00

Author	SHA1	Message	Date
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
qazal	79fb5c6470	hotfix: test_shard_no_recompile shouldn't rely on schedule order [pr] (#8928 )	2025-02-06 16:27:59 +02:00
chenyu	48349efdc1	copy is already contiguous (#8886 )	2025-02-04 17:53:33 -05:00
chenyu	836cf42c2e	fix rand_like for multi (#8880 )	2025-02-03 19:00:14 -05:00
chenyu	746d899dbd	move multi axis to property (#8879 ) also updated tests so that axis is known prior to realize	2025-02-03 16:02:09 -05:00
George Hotz	431a86615d	fix multi Ops.CONTIGUOUS_BACKWARD [pr] (#8843 )	2025-02-01 09:21:31 +08:00
George Hotz	62655e4999	move multi into engine [pr] (#8778 ) * move multi into engine [pr] * all runtime is one sz	2025-01-28 09:15:29 +09:00
George Hotz	0ffd572e1e	fix multi with no real srcs (#8749 )	2025-01-26 08:41:00 +09:00
chenyu	e2b380b743	make UOp.multi real a tuple instead of list [pr] (#8744 ) tuple is immutable. also updated test_rand_like_from_alu test	2025-01-24 20:47:27 -05:00
chenyu	e0e176efbc	failed test case for multi rand_like [pr] (#8740 ) new multi broke multi device dropout	2025-01-24 13:56:51 -05:00
George Hotz	e82ba1454b	MultiLazyBuffer is UOp [pr] (#8662 ) * MultiLazyBuffer is UOp [pr] * this is new mlb * this is the idea * progress * multitensor works * more movement ops * this * MultiLazyBuffer is UOp * cleanups * multi axis * fix more tests * work * not that * add multi grad and move shard to ops * mops not views * no double contig * sweet, all mt tests passing * port old logic * remove lbs * fix realized * whitespace * assign tweak * test_assign_kv_cache_multi passes * fix is_realized * fix JIT for multi * just a few more lines i'll pay them back soon i swear please bro just a few more * no split reduceop for multi	2025-01-24 13:28:55 +09:00
George Hotz	46a8c5e1e5	delete forced_realize (#8615 ) * delete forced_realize * put that back * expectedFailures * cleaner create_subbuffer * more comments --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-20 09:40:36 -08:00
George Hotz	8609b880bd	hotfix: test_backward_sum	2025-01-17 10:25:02 -08:00
chenyu	f8cc971c3b	raise RuntimeError for uneven shards in Tensor.shard [pr] (#8656 )	2025-01-17 12:48:39 -05:00
qazal	23f0ff0ed8	add bitcast to multi [pr] (#8652 )	2025-01-17 03:17:19 -05:00
qazal	2b7db9b45d	delete unused cast/bitcast lines from ops.py [pr] (#8651 ) * move cast and bitcast out * more deletion of bitcast arg * fix test_bitcast_fuses * update tests * work	2025-01-17 03:04:18 -05:00
George Hotz	f29d6f54b8	support multilb gradient [pr] (#8624 )	2025-01-14 18:33:33 -08:00
chenyu	0790d8059f	remove MultiLazyBuffer.from_sharded [pr] (#8620 ) it's equivalent to taking the lazydata from Tensor.split, then copy to devices	2025-01-14 18:00:49 -05:00
George Hotz	fdd46c9f28	delete view instant rule (#8616 ) * remove cast before view * greener * indexing * delete view instant rule * that passes too * openpilot too * ack * base on cast_before_view * add it as a rewrite rule * VIEW(DEVICE) is also fine * test_shard_memory depends on forced_realize removal * put that back, will go soon * UOp representations change once we don't instantly fold things * do not duplicate tests --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-14 16:15:13 -05:00
chenyu	393eec3201	raise RuntimeError for uneven shard [pr] (#8593 ) no 7B llama on 6 GPUs skip 70B	2025-01-14 14:51:48 -05:00
chenyu	d443e91d82	remove custom splits in Tensor.shard [pr] (#8602 ) towards even split only	2025-01-13 21:29:13 -05:00
qazal	866dfa1f23	create_schedule([x.lazydata]) -> x.schedule() in tests (#8449 )	2024-12-31 03:15:52 +08:00
qazal	34987a03af	const copy folding spec + multi.py behavior [pr] (#8436 ) * const copy folding spec + multi behavior [pr] * copy from clang, move multi test	2024-12-29 23:12:13 +08:00
George Hotz	074315ec08	hotfix: simpler test_mnist_model	2024-12-20 10:18:17 -08:00
qazal	9044b0746a	delete lazy [pr] (#7801 ) * LazyBuffer = UOp * try 4 at this diff * skip optimization tests p1 * raise kernel count expectations * BIND isn't the _only_ uop that can become a tensor * fix test_ones_sum on symbolic * bump openpilot, correctness first * offset on assign is fine * uop is immutable * what if this was higher * more optimization skips * instant fold const copy * test_multitensor shouldn't expect buffer for unrealized * move copy folder to upats * start BUFFER_VIEW * kinda BUFFER_VIEW * Revert "kinda BUFFER_VIEW" This reverts commit `94b4fe3040`. * BUFFER_VIEW try 2 * linter and missed _device * pylint * keep Ops.CONTIGUOUS * always BUFFER_VIEW disk * test * cpu isn't a real device * buffer references afte del * add that back * start bringing some of these back * more test updates * simpler simplify copy * subbufer everything * this is fine with buffer view * cleanup the diff in test/ 1 * copy is one thing * diff pruning * diff pruning 2 * oh bind unbinds way too early * extra * more diff pruning * more const folding * experiment with symbolic here * Revert "experiment with symbolic here" This reverts commit `cb87d61f7a`. * Revert "more const folding" This reverts commit `2a7d258a2b`. * Revert VALID early folding This reverts commit `4074f52317`. * storing const is fine * fix test_prefer_half_buffer * iterate on test_real_world * this fixes test_train_mnist memory, breaks everything else * Revert "this fixes test_train_mnist memory, breaks everything else" This reverts commit `dccfcbe068`. * always expect buffer to exist here * temp debug: something is mutating lazydata in compile3 * Revert "temp debug: something is mutating lazydata in compile3" This reverts commit `71400f0d55`. 
* everything back to normal * compile3 * compile3 test * start captured jit work, that test passes * finalized memory skip set * linter err * back to base here * tiny metaop cleanup * print tensor * 4th type this unbind got me * green pickle * tensor_variable sanity * cast sanity * link from the reds * COPY sanity + minor repr change * you can exist * enable test_winograd * bye bye nbytes * danger, uop is mutating * real become * delete those from uop init * put it in buffer init * buffer inits with so much stuff * buffer pickle try 2 * toposort can't be a cached property * fix test_schedule_gc_with_inputs * remove all @unittest.skip(gc) * Revert "remove all @unittest.skip(gc)" This reverts commit `9d8d92dd85`. * reenable real world + test_schedule_gc * test: RUN_PROCESS_REPLAY=0 * fix pickle jit * test changes * reenable test_lru_alloc and TestTrain * fix imagedtype * bring pr back * reenable 3 gc tests * test_schedule better diff * disable SPLIT_REDUCEOP * test_save_all_dtypes looks fixed * fix metadata * skip that one * fix viz by not pickling buffers * simple test for const folding * bring split reduceop back * add simplify_alu * simplify_binop fixes a test * fix cast folding * disable that test * that test looks fine * changes from delete_lazy pruning p1 * cast folding and children base * test: cast folding from pruning branch * green test_sgd_4convs_fuse_conv_bw * enable some indexing folding * test_complex_backward is fixed * prune more, 295 -> 233 * fix test_multi_const_folding_literal * fix double copy * early become test * ooooops * clean up ctx in all big_graph * fix openpilot 208 kernels * train_cifar is fine now * fix CAST_BEFORE_VIEW * ever faker const * back to 13 * mark expectedFailure * fine don't create them * test_multi_const_folding_tensor --------- Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-12 05:05:19 +08:00
chenyu	5eadae204b	test multi device rand with manual_seed (#8164 )	2024-12-11 13:11:31 -05:00
Ahmed Harmouche	a8cfdc70ed	Run more webgpu tests (#8142 )	2024-12-10 23:20:04 +01:00
qazal	df84dc6444	unrelated test fixups from delete_lazy [pr] (#8088 ) * unrelated test fixups from delete_lazy [pr] * fine if it's scheduled later	2024-12-06 17:31:02 +02:00
chenyu	66d7d5af50	fix Tensor(MultiLazyBuffer) with different dtype should fail (#7757 ) similar to Tensor(LazyBuffer) as we don't cast implicitly	2024-11-17 21:05:45 -05:00
ignaciosica	597a239e28	Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725 ) * remove unaryops * remove ternaryops * remove metaops * hotfix * remove binaryops * hotfix: test_pattern_matcher --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-16 20:56:56 +08:00
chenyu	d1dfd598a2	assert specifying device to rand_like a multi tensor (#7678 ) * assert specifying device to rand_like a multi tensor raise RuntimeError instead of dropping it silently * fix that	2024-11-13 10:24:40 -05:00
chenyu	51432bfbff	add rand_like test case with device specified (#7663 ) in single device or copied multi case, device is applied. but for sharded case the device is silently ignored now. maybe similar to rand we just don't allow tuple device in rand_like	2024-11-13 09:32:55 -05:00
qazal	e84d089ef1	delete ReduceOps, only use REDUCE_AXIS (#7667 )	2024-11-13 19:04:27 +08:00
uuuvn	c846dd70b2	Increase test tolerance for probabilistic test (#7580 )	2024-11-07 09:35:11 -05:00
George Hotz	205befa788	move is_dtype_supported to device [pr] (#7575 )	2024-11-07 20:38:03 +08:00
George Hotz	99bd4372a5	Ops.ALU is no more, the arg is just an op (#7525 ) * op arg alu [pr] * more * more passing * fix more tests * more tests passing * fix single failing test * so much cleaner * noop to not have process replay trigger * fix ptx	2024-11-05 00:22:22 +08:00
George Hotz	0c19b6298b	rename ops to have unique names (#7522 )	2024-11-04 17:09:45 +08:00
George Hotz	c8bf09b7d4	s/UOps/Ops (#7500 ) * s/UOps/Ops [pr] * fix	2024-11-03 11:26:10 +08:00
chenyu	18e159c9ac	comment about multi real and more tests [pr] (#7467 )	2024-11-01 11:49:11 -04:00
Tobias Fischer	1a9e145388	Tensor Clone Function (#7154 ) * implemented clone function * cleanup linting, single func * added tests, cleaned up grad cloning * fixed whitespace	2024-11-01 12:24:43 +08:00
George Hotz	4812801aa6	try for canonical order (#7286 ) * try for canonical order * cmp better * disable bad tests * flip const order * fix test * fix tests * different fix for NOOP * metaclass here * fix tests * narrower scope	2024-10-25 16:04:54 +08:00
George Hotz	d726eb6f48	uop resolve [run_process_replay] (#6826 ) * uop bool and int and stuff [run_process_replay] * add ne support * can't even be None anymore * BinaryOps.AND support * less compare	2024-10-01 13:11:42 +08:00
wozeparrot	c100f3d406	default threefry (#6116 )	2024-09-25 17:45:13 +08:00
George Hotz	cb22ef379a	truncate consts early (#6741 ) * truncate consts early * ptx still fails * Update dtype.py	2024-09-25 16:49:51 +08:00
wozeparrot	2be0b26a1f	rand only supports single device (#6682 )	2024-09-24 16:07:44 +08:00
qazal	982086f54c	UOps.VALID try 2 (#6623 ) * make UOps.VALID compile * fixable tests * bufs dedup * cleanup the CONST spec * regenerate dataset with graph_rewrite ```py def rewrite_const(const:UOp, st_src:UOp) -> UOp: st: ShapeTracker = st_src.arg return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0)) pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)]) ``` * rm arg * remove arg * revert arg removal This reverts commit `2c35c75c95`. * red test_pickle_define_var	2024-09-21 14:19:25 +08:00
George Hotz	dbd4536167	Revert "add UOps.VALID (#6387 )" (#6441 ) This reverts commit `8186e4e7d6`.	2024-09-09 21:33:00 +08:00
George Hotz	8186e4e7d6	add UOps.VALID (#6387 ) * uops valid * broke full_shape * fixup that st (hardcoded asts still red) * fixup DEFINE_VAR debug more debug * start moving stuff to ast_const * move test_linearizer * move test_linearizer_failures to ast_const * fixup test_schedule * small diff change * regenerate dataset * fixup test_multitensor * regen dataset try 2 --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-09-09 16:58:43 +08:00
chenyu	943ab97d24	fix Tensor.prod for multitensor (#6264 )	2024-08-24 08:52:24 -04:00
qazal	28c75bf2a6	merge uops with ops (#6111 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-08-16 18:17:57 -04:00


134 Commits