George Hotz
b8342fb085
independent lowerer [run_process_replay] (#5434)
* independent lowerer [run_process_replay]
* don't relinearize PTX
* fix ptx
* Revert "fix ptx"
This reverts commit f4e8e059c0.
* Revert "don't relinearize PTX"
This reverts commit f6c12c506c.
* parents is fine, no need for linearization
* remove loop local idxs
* recover stupid loop_idxs
2024-07-12 18:08:43 -07:00
chenyu
9a187e6102
fix handcode_opt script (#5435)
* fix handcode_opt script
* run in ci
* real run in ci
* HALF=0
2024-07-12 20:52:28 -04:00
wozeparrot
b80fd7d23c
allow benchmarking forward only (#5436)
2024-07-12 17:37:49 -07:00
chenyu
00813a92a0
update Tensor.eye api to match torch (#5433)
* update Tensor.eye api to match torch
input is n for the number of rows and optional m for the number of columns
* space
* fix onnx
2024-07-12 20:25:12 -04:00
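The eye change above adopts the torch.eye(n, m=None) signature: n rows, optional m columns defaulting to n. A minimal pure-Python sketch of those semantics (a hypothetical helper for illustration, not the tinygrad implementation):

```python
def eye(n, m=None):
    # torch-style semantics: n rows; m columns, defaulting to n (square)
    if m is None:
        m = n
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(n)]

print(eye(2, 3))  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```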
George Hotz
cddfd8e25d
bugfix: group for reduce should check all dimensions (#5431)
2024-07-12 17:02:40 -07:00
George Hotz
fbaf040baf
compute full_shape from LazyOp [run_process_replay] (#5429)
* compute full_shape from LazyOp
* put KernelInfo in the sink
* wrong but pass
2024-07-12 16:47:08 -07:00
George Hotz
870dc8c350
s/Linearizer/Lowerer [run_process_replay] (#5428)
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
chenyu
4cd1de038a
smaller reshape_and_permute arg in shift_to (#5426)
adding tuples directly
[run_process_replay]
2024-07-12 17:46:48 -04:00
George Hotz
94599c0637
fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424)
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]
* fix tests
* fix more tests
2024-07-12 14:01:03 -07:00
George Hotz
b055ece550
hotfix: bump to cache gpuocelot
2024-07-12 13:54:14 -07:00
chenyu
d37056f3b1
pass Renderer.global_max / local_max into get_grouped_dims (#5423)
[run_process_replay]
2024-07-12 16:49:27 -04:00
George Hotz
4aefb1595d
MetaOps.SINK [run_process_replay] (#5422)
* s/loadops/metaops [run_process_replay]
* add metaops.sink [run_process_replay]
2024-07-12 13:37:30 -07:00
George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] (#5421)
2024-07-12 13:26:50 -07:00
nimlgen
f4944ced09
tiny amd cleanups (#5420)
2024-07-12 22:54:42 +03:00
chenyu
b17e4adb3a
add -c advice.detachedHead=false to process replay git checkout (#5419)
removes the noisy `Note: switching to 'origin/master'. You are in 'detached HEAD' state. You can look around, make experimental changes...` from the log
2024-07-12 15:13:26 -04:00
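The flag in the change above is standard git config; silencing the detached-HEAD advice for a single invocation looks like this (a sketch of the pattern; the actual process replay script differs):

```shell
# -c sets a one-off config value: suppress the detached-HEAD advice
# for just this checkout, without touching the repo's config
git -c advice.detachedHead=false checkout origin/master
```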
wozeparrot
d1cbd6bb95
unify handcode_resnet_opt and handcode_bert_opt (#5418)
2024-07-12 12:05:01 -07:00
chenyu
a0dbe20dbd
skip some redundant and slow tests in ci (#5416)
2024-07-12 14:43:13 -04:00
chenyu
76125c07be
make some grouped_dim tests work (#5415)
next: need to support a max size per dim, splitting, and a correct way to reverse or arbitrarily permute global dims
2024-07-12 14:22:50 -04:00
wozeparrot
b7cc75a9df
usage summary in handcode opt (#5414)
2024-07-12 11:21:18 -07:00
uuuvn
3cb94a0a15
Rename tinygrad/runtime/driver to support (#5413)
2024-07-12 11:06:42 -07:00
nimlgen
6604d2b2c3
amd/nv respect visible devs (#5409)
* nv/amd respect visible devices
* linter
* sort amd gpus
* env docs
2024-07-12 20:02:12 +03:00
Roelof van Dijk
b18aa00bba
refactor: consolidate replace [run_process_replay] (#5403)
2024-07-12 07:36:57 -07:00
chenyu
497274f663
add float64 to test_dtype_alu dtypes_float (#5410)
* add float64 to test_dtype_alu dtypes_float
* CUDACPU float64 crashes
* real NV failed
2024-07-12 10:21:32 -04:00
qazal
31fcc516dc
more process replay tooling (#5407)
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00
Roelof van Dijk
6ec7dbc287
ci: parallelize uops tests (#5405)
2024-07-12 11:22:41 +03:00
qazal
e22b377839
generalize FUSE_AS_ONE_KERNEL in the scheduler (#5397)
* test: use const
* hotfix: base
* asserts
* dont push through reshape
* cleanup
* dont need the cache
* test_reduceop_reshape_dont_push and test_index_fused are next
2024-07-12 10:23:16 +03:00
chenyu
6e0a523078
repro slow resnet kernel with 4 global dims (#5402)
* repro slow resnet kernel with 4 global dims
* fix ruff
2024-07-11 23:31:15 -04:00
George Hotz
8390feb7b9
optim.OptimizerGroup in hlb_cifar (#5401)
2024-07-11 20:14:36 -07:00
George Hotz
01fbd18209
metal compile fail
2024-07-11 19:27:05 -07:00
George Hotz
3a2b5a75d2
improve single kernel indexing (#5398)
* improve single kernel indexing
* metadata in graph (#5399)
* indexing is O(1)
* add failing test
* ugh, that all needs to be replaced with symbolic
* broken on ptx, it's fine
---------
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-11 19:00:57 -07:00
wozeparrot
c24d495ef9
metadata in handcode_opt (#5400)
2024-07-11 17:45:34 -07:00
wozeparrot
c60838594c
metadata in graph (#5399)
2024-07-11 17:02:12 -07:00
George Hotz
c2da4454cd
indexing getting better (#5389)
* indexing getting better [run_process_replay] [no_assert]
* fix test
* test_arange_2_reduce is a simpler test
* put that print back, NOOPT
* don't merge reduces (they could be different reduces)
* FUSE_AS_ONE_KERNEL
* fix tests
* fix test_var_multireduce
* w/e put that there
* fails on others too
* fix test, revert UNMUL change
* in case order matters
* one kernel indexing works
* one kernel indexing works (test other)
2024-07-11 16:41:51 -07:00
qazal
9712d9ffb6
pass lowering errors if not asserting process replay (#5395)
* pass lowering errors if not asserting process replay
* ProcessReplayError
2024-07-11 19:09:12 -04:00
wozeparrot
a02b38c0ac
download openimages by running it (#5396)
2024-07-11 16:06:13 -07:00
qazal
0421f5d83e
hotfix: compare test_var_multireduce against numpy (#5394)
2024-07-11 18:57:08 -04:00
qazal
b91a0ccdc3
make [run_process_replay] [no_assert] the default (#5390)
2024-07-11 22:36:59 +03:00
George Hotz
e8191479a3
add bigint type for indexing [run_process_replay] (#5387)
2024-07-11 11:37:10 -07:00
George Hotz
5232e405ce
hotfix: add BS to beautiful_mnist
2024-07-11 10:55:05 -07:00
George Hotz
3e40211e45
add UOP_IS_SYMBOLIC [run_process_replay] [no_assert] (#5386)
* cleanup a few things in uops [run_process_replay] [no_assert]
* add optional UOP_IS_SYMBOLIC
2024-07-11 10:48:45 -07:00
nimlgen
b3790b759b
nv cleanup gpfifo setup (#5382)
* nv cleanup gpfifo setup
* save lines
2024-07-11 17:50:52 +03:00
chenyu
416f838a1a
hotfix: tqdm respects total=0 if set (#5380)
if you explicitly pass total=0, it should use 0 instead of inferring the length from the iterable; matches tqdm's behavior
2024-07-11 10:30:12 -04:00
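The total=0 fix above amounts to treating an explicitly passed total, even 0, as authoritative over the iterable's length. A hedged sketch of that resolution logic (hypothetical helper, not tinygrad's actual tqdm code):

```python
def resolve_total(iterable, total=None):
    # an explicit total (including 0) wins over len(iterable)
    if total is not None:
        return total
    try:
        return len(iterable)
    except TypeError:
        return None  # unsized iterable: no total known

print(resolve_total([1, 2, 3], total=0))  # 0, not 3
```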
nimlgen
2ba96d4c29
nv use mv_address (#5381)
* nv use mv_address
* unused import
2024-07-11 16:45:03 +03:00
nimlgen
bd77efda2f
add HWCommandQueue base class for hcq devices (#5303)
* add HWCommandQueue as base queue for hcq devices
* try this
* fixes
* comments
* linter
* linter2
* linter
* linter
* fixed
* revert this
2024-07-11 16:19:13 +03:00
qazal
dc3ea78560
hotfix: faster UOps.END* insert [run_process_replay] (#5377)
* is this faster
* p2
* don't waste lines
2024-07-11 13:20:19 +03:00
qazal
004366b193
context aware process replay [run_process_replay] (#5378)
* test tc as ctx var
* remove from opts
* process replay
* pop variable
* B -> Variable
* fix re-assign
* pop temp vars
* move TRANSCENDENTAL=2
2024-07-11 13:07:28 +03:00
qazal
45e1b9d5e3
use TC options as ContextVars [run_process_replay] (#5379)
* delete from renderer
* move to ctx
2024-07-11 12:01:36 +03:00
qazal
289fd2e940
Lowerer cleanup 2 [run_process_replay] (#5376)
* test outbufs delete
* comments
* valid is bool
2024-07-11 10:56:53 +03:00
qazal
9ca2d96b6b
delete extra check in DEFINE_ACC [run_process_replay] (#5375)
2024-07-11 10:49:03 +03:00