tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 14:28:09 -05:00

Author	SHA1	Message	Date
Francis Lam	2d53abb04a	test/external/fuzz_linearizer: fix for new AST changes (#5519 ) * test/external/fuzz_linearizer: fix for new AST changes also add beautiful_mnist failures * add CLANG and LLVM to test_failure_35 failed_platforms * fix test_linearizer_failure names	2024-07-17 00:08:07 -04:00
Edward Wang	9a7d5a148e	move colorize_float to helpers.py (#5490 ) * add colorize_float to helpers.py * update references	2024-07-15 11:29:03 -07:00
qazal	ae4cb7994e	run process replay with DEBUG=0 (#5491 ) * process replay with DEBUG=0 * graceful shutdown * use and	2024-07-15 16:30:57 +03:00
qazal	3c378efcb6	process replay docs improvements (#5481 ) * minor cleanups * docs and logs * shorter * comma * s/print/logging.info [run_process_replay] * use logging.warn * process name is noise * revert lowerer change [run_process_replay]	2024-07-15 00:09:28 +03:00
qazal	671779f280	limit process replay diff to ~20% of kernels (#5480 ) * render lidx starting with 0 changed from ``` int gidx0 = gid.x; /* 4096 */ int lidx4 = lid.x; /* 8 */ int gidx1 = gid.y; /* 7 */ int lidx5 = lid.y; /* 8 */ int gidx2 = gid.z; /* 7 */ int lidx6 = lid.z; /* 2 */ ``` to ``` int gidx0 = gid.x; /* 4096 */ int lidx0 = lid.x; /* 8 */ int gidx1 = gid.y; /* 7 */ int lidx1 = lid.y; /* 8 */ int gidx2 = gid.z; /* 7 */ int lidx2 = lid.z; /* 2 */ ``` the existing one started from pre-limited global dims which skip number if there are more than 3 global dims don't need start_dim * add changed * env var * more early exit * simpler? * Revert "Merge branch 'lidx0' into process_replay_limit" This reverts commit `cbadcfa5e9`, reversing changes made to `fc9bf37ee7`. * minor cleanup --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-14 23:10:08 +03:00
qazal	0b3a34e3b1	vectorize folding [run_process_replay] (#5470 ) * test_gep_vec_fold * remove that * fix process replay * lint	2024-07-14 09:41:48 +03:00
chenyu	28972418c4	s/get_linearizer/get_kernel [run_process_replay] (#5467 )	2024-07-13 20:32:22 -04:00
Francis Lata	0345577032	UNet3D dataloader shared memory fix (#5465 ) * create separate SharedMemory between inputs and labels * update path check for shared mem * clean up unit test for dataset	2024-07-13 20:26:00 -04:00
qazal	487ceff825	hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist (#5456 )	2024-07-13 21:15:40 +03:00
qazal	40ec9410f9	simpler process replay (#5452 ) * remove check_process_replay * that can go to the top * add assert back * [run_process_replay] * checkout code [run_process_replay] * temp [run_process_replay] * revert temp [run_process_replay] * ahh this is why [run_process_replay] * revert temp [run_process_replay]	2024-07-13 19:55:06 +03:00
qazal	23b907efbb	restore process replay runs by their id (#5453 )	2024-07-13 19:32:34 +03:00
qazal	bb1a9ebf78	run process replay in parallel (#5443 )	2024-07-13 11:29:36 +03:00
George Hotz	fb3011ac61	improve matcher speed [run_process_replay] (#5438 ) * improve matcher speed [run_process_replay] * don't use arg set in ptx	2024-07-12 20:02:19 -07:00
George Hotz	03c2dc8bd7	lowerer is kernel [run_process_replay] (#5437 )	2024-07-12 18:50:55 -07:00
wozeparrot	b80fd7d23c	allow benchmarking forward only (#5436 )	2024-07-12 17:37:49 -07:00
George Hotz	870dc8c350	s/Linearizer/Lowerer [run_process_replay] (#5428 )	2024-07-12 15:54:07 -07:00
George Hotz	6707c778d0	scheduleitem is not Tuple [run_process_replay] (#5425 ) * scheduleitem is not Tuple [run_process_replay] * fix tests * fix op + fuzzers * fix mop test	2024-07-12 15:13:19 -07:00
George Hotz	94599c0637	fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424 ) * fixup ast in kernel to be MetaOps.SINK [run_process_replay] * fix tests * fix more tests	2024-07-12 14:01:03 -07:00
George Hotz	f6ef283e6a	s/loadops/metaops [run_process_replay] (#5421 )	2024-07-12 13:26:50 -07:00
uuuvn	3cb94a0a15	Rename tinygrad/runtime/driver to support (#5413 )	2024-07-12 11:06:42 -07:00
qazal	31fcc516dc	more process replay tooling (#5407 ) * replays * what's in there * can it be up there * sha is enough * insert sha as the key * fix str * update reset utils * that nested try/except was terrible * github_context can go	2024-07-12 13:11:34 +03:00
chenyu	6e0a523078	repro slow resnet kernel with 4 global dims (#5402 ) * repro slow resnet kernel with 4 global dims * fix ruff	2024-07-11 23:31:15 -04:00
George Hotz	01fbd18209	metal compile fail	2024-07-11 19:27:05 -07:00
qazal	9712d9ffb6	pass lowering errors if not asserting process replay (#5395 ) * pass lowering errors if not asserting process replay * ProcessReplayError	2024-07-11 19:09:12 -04:00
qazal	004366b193	context aware process replay [run_process_replay] (#5378 ) * test tc as ctx var * remove from opts * process replay * pop variable * B -> Variable * fix re-assign * pop temp vars * move TRANSCENDENTAL=2	2024-07-11 13:07:28 +03:00
George Hotz	d13654a820	move uopgraph to file [run_process_replay] (#5364 ) * move uopgraph to file [run_process_replay] * fix print tree test	2024-07-10 17:34:50 -07:00
Elias Wahl	097268fab3	Add layerwise performance bench for bert (#5349 ) * add bert bench * dont disable by defauöt * remove lr * linter	2024-07-09 15:03:25 -04:00
qazal	bee96a19ff	fuzz uop schedules (#5345 ) * basic blocks + cleanups * fixups * elif is better for future me * fuzz_schedule_max_paths * fix linter	2024-07-09 15:24:56 +03:00
qazal	d813617742	prescheduling refactor (#5300 ) * p1 * refactor tuple	2024-07-06 12:04:03 +03:00
qazal	b369e75ed0	refactor schedule creation (#5297 )	2024-07-05 21:14:38 +03:00
chenyu	f1ff65e763	remove "no-nans-fp-math"="true" for LLVM (#5282 ) fixed isnan for llvm (still have issue with < nan)	2024-07-03 17:52:50 -04:00
nimlgen	7be776f9af	add _alloc_signal/_free_signal to hcq (#5264 ) * add _alloc_signal/_free_signal api * oops, revert this * linter	2024-07-02 23:35:39 +03:00
Tobias Fischer	8c9c1cf62f	Pulled CLIP and UNet into Seperate Files (#5253 ) * pulled clip and unet into seperate files * reference cleanup, lru cache fix * better pool indexing	2024-07-01 22:33:01 -04:00
Roelof van Dijk	975b811ad9	names shadowing builtins (#5179 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-06-27 08:15:01 -04:00
Roelof van Dijk	f88f71d73a	ruff: unnecessary-comprehension (#5174 ) * enable ruff C416 unnecessary-comprehension * already a list	2024-06-27 07:45:29 -04:00
qazal	6ca7b13ed1	limit pickled objects [run_process_replay] (#5154 ) * limit pickled objects * delete uop from the list * debug metal * need self.opts for TC * dont need device * [run_process_replay] * minor	2024-06-26 13:51:32 +03:00
George Hotz	63ba2d05d1	uops dfs cleanup (#5147 ) * uops dfs cleanup * Update uops.py	2024-06-25 18:51:42 -07:00
chenyu	e356807696	tinytqdm.set_description and tinytrange (#5101 )	2024-06-22 14:45:06 -04:00
chenyu	8080298739	s/tinytqdm/tqdm (#5103 ) except in unit test where tqdm is imported	2024-06-22 14:18:26 -04:00
nimlgen	f1e758bacb	graph fuzzer (#5082 ) * graph fuzzer * more options * mypy * no underscores for funcs	2024-06-21 18:47:23 +03:00
qazal	8aa786232d	docs for running process replay locally (#5083 )	2024-06-21 09:55:08 -04:00
George Hotz	6f6b3b10c9	import from uops, not linearizer (#5064 )	2024-06-20 08:08:44 -07:00
qazal	ee01e464e3	use process replay as a diff creator (#4903 ) * add no_assert option [run_process_replay] [no_assert] * test [run_process_replay] [no_assert] * [run_process_replay] * back to normal [run_process_replay] * remove the log	2024-06-19 18:17:31 +03:00
chenyu	a3ed4176c8	use tinytqdm in active tests and examples (#5038 ) * use tinytqdm in active tests and examples stress test this before 0.9.1 * no set_description	2024-06-18 16:01:19 -04:00
kormann	7c3b877216	rename uop [run_process_replay] (#5031 ) * rename * fix unittests * rename vin * fix test * fix type [run_process_replay] * rm pre commit hook change	2024-06-18 21:34:05 +03:00
nimlgen	794acefbf3	hcq update waits and signals in place (#4984 ) * hcq update waits and signals in place * start amd * amd works * prettier * test * normal messages * linetr * linter 2	2024-06-17 17:19:07 +03:00
qazal	71aad183fd	check Program from HEAD [run_process_replay] (#4996 ) * use the same prg [run_process_replay] * put var back	2024-06-16 20:12:30 +03:00
chenyu	67e8df4969	remove numpy from dtype (#4969 ) replaced all dtype.np with _to_np_dtype defined in tensor.py. after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer	2024-06-14 15:38:45 -04:00
George Hotz	14189bca68	graph_dedup function [run_process_replay] (#4955 )	2024-06-14 04:24:37 -07:00
George Hotz	63a8add2c2	move uops add logic to linearize (#4952 ) * move logic to linearize * idk how this should work * empty	2024-06-14 03:52:37 -07:00

411 Commits