George Hotz
7f46bfa587
hotfix: docs touchup
2024-06-28 14:36:20 -07:00
nimlgen
c941a58581
amd refactor queue creation ( #5216 )
...
* amd refactor queue creation
* fixes
* use data64_le
* fix linter
2024-06-28 23:24:49 +03:00
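An aside on the `use data64_le` bullet: judging by the name, the helper splits a 64-bit value into (low, high) 32-bit words in the little-endian order AMD queue packets expect. A minimal sketch under that assumption, not the verified tinygrad source:

    # Assumed data64_le-style helper: split a 64-bit value into
    # (low, high) 32-bit words for little-endian packet fields.
    def data64_le(data: int) -> tuple[int, int]:
      return (data & 0xFFFFFFFF, (data >> 32) & 0xFFFFFFFF)

    lo, hi = data64_le(0x1122334455667788)
    assert (lo, hi) == (0x55667788, 0x11223344)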
chenyu
7ba4938510
simplify View.permute arg check [run_process_replay] ( #5218 )
...
It checks that `axis` is a valid permutation, which is equivalent to `sorted(axis) == list(range(len(self.shape)))` (see the sketch below).
2024-06-28 16:18:46 -04:00
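A standalone sketch of that check, illustrative rather than the actual View.permute code:

    # A valid permutation of n axes is exactly a reordering of 0..n-1,
    # which the sorted-equals-range comparison captures in one line.
    def is_valid_permutation(axis: tuple[int, ...], ndim: int) -> bool:
      return sorted(axis) == list(range(ndim))

    assert is_valid_permutation((2, 0, 1), 3)
    assert not is_valid_permutation((0, 0, 1), 3)  # duplicate axis
    assert not is_valid_permutation((0, 1), 3)     # missing axis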
George Hotz
80ac21200b
hotfix: linearizer test fixup
2024-06-28 10:52:25 -07:00
George Hotz
c9714dfcf4
rename graph to children [run_process_replay] ( #5215 )
2024-06-28 09:53:52 -07:00
kormann
6c456b6d66
remove uopgraph dedup + slight speedup ( #5199 )
...
* rm dedup
* rm dedup
* tests
* reduce diff
* oops
* reduce diff
* rm UOp.tuple
2024-06-28 09:26:32 -07:00
nimlgen
9b08a9397c
amd inline bf16 funcs ( #5212 )
2024-06-28 18:45:00 +03:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
63fa4e2a0e
fix seed = 0 in sdxl ( #5209 )
...
also removed a few unneeded realize and contiguous calls
2024-06-28 08:48:59 -04:00
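The usual culprit in a `seed = 0` bug is a truthiness fallback: `seed or default` silently discards 0 because it is falsy. A hedged sketch of the pattern, not the actual sdxl diff:

    import time

    # Buggy: `or` treats seed=0 as "no seed given" and substitutes the fallback.
    def get_seed_buggy(seed=None): return seed or int(time.time())

    # Fixed: only fall back when the caller passed nothing at all.
    def get_seed_fixed(seed=None): return seed if seed is not None else int(time.time())

    assert get_seed_fixed(0) == 0  # seed=0 is now respected
    # get_seed_buggy(0) would return the current time instead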
Tobias Fischer
4688f97d48
Add SDXL Inference to Examples ( #5206 )
...
* added sdxl inference code
* fixed trailing whitespace
* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
qazal
3e56c8422c
remu err handling ( #5208 )
...
* add error handling
* use pre-release
* minor
* works
2024-06-28 13:15:18 +03:00
nimlgen
7f7fa26e03
allow hugepage failure in memadvise ( #5207 )
2024-06-28 11:41:10 +03:00
chenyu
73395b998b
better error msg for TinyJit inside TinyJit ( #5202 )
...
It's possible to support TinyJit inside TinyJit, but there are edge cases, like two TinyJit functions sharing another TinyJit function, so just give a more precise error for now (a guard of this shape is sketched below).
2024-06-27 18:09:19 -04:00
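A minimal sketch of such a guard, with hypothetical names rather than the tinygrad implementation: a module-level flag marks that capture is in progress, so re-entry fails with a precise message instead of producing a confusing one later.

    _capturing = False  # set while a jitted function is being captured

    class TinyJitSketch:
      def __init__(self, fn): self.fn = fn
      def __call__(self, *args, **kwargs):
        global _capturing
        if _capturing: raise RuntimeError("TinyJit inside TinyJit is not supported")
        _capturing = True
        try: return self.fn(*args, **kwargs)
        finally: _capturing = False

    @TinyJitSketch
    def outer(): return inner()
    @TinyJitSketch
    def inner(): return 1

    try: outer()
    except RuntimeError as e: print(e)  # precise error, not a miscompile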
nimlgen
ac748cccdb
nv apply relocs ( #5165 )
...
* nv do reloc
* a bit cleaner
2024-06-27 23:54:16 +03:00
Roelof van Dijk
540ebdf47c
missing init files ( #5196 )
2024-06-27 15:30:02 -04:00
chenyu
d8dc43ad06
remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark ( #5198 )
...
this no longer helps
2024-06-27 15:20:34 -04:00
George Hotz
345bcc2099
move graph_dedup out of class [run_process_replay] ( #5197 )
2024-06-27 12:04:00 -07:00
George Hotz
d094a6828f
single pass rewrite ( #5159 )
...
* single pass rewrite
* claude cleanups
* claude cleanups
* skip those tests
* restrict that to ints
* comment
* asserts I don't expect to fail do fail
* simplest...rewrite...ever
* simplest...rewrite...ever
* add that rule back
* tests pass?
* only collapse reduce loops
* second SHL/SHR arg must be 4 bytes
* fix verify
* no SHL/SHR in ptx
* put that back
* skip them in PTX...bad tests
2024-06-27 11:36:05 -07:00
Roelof van Dijk
1ff9bbaa61
ruff: close file handle ( #5180 )
...
* close file handle
* some more open file handles
* must stay open
* remove this close, stays open
2024-06-27 11:29:47 -07:00
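The rule (ruff's SIM115 family) flags `open()` calls whose handles are never closed; the fix is a with-block, except where a handle legitimately has to outlive the call, as the last two bullets note. For example:

    import os, tempfile

    path = os.path.join(tempfile.mkdtemp(), "config.txt")
    with open(path, "w") as f: f.write("hello")

    # Flagged: the handle is only closed whenever the GC gets to it.
    data = open(path).read()

    # Fixed: the with-block closes the file deterministically, even on error.
    with open(path) as f:
      data = f.read()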
chenyu
83da8b3558
use NV instead of CUDA in benchmark ( #5192 )
...
also re-enabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b
CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark ( #5191 )
...
Ignoring the beam cache while keeping the compile cache should be fine, and it saves some benchmark time.
Also updated `beam_search` to check the flag value before accessing the diskcache (sketched below).
2024-06-27 13:15:18 -04:00
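A sketch of that gating with stand-in cache accessors; the names are illustrative, not the actual beam_search code:

    import os

    _cache: dict = {}  # stand-in for the on-disk cache
    def diskcache_get(table, key): return _cache.get((table, key))
    def diskcache_put(table, key, val): _cache[(table, key)] = val

    def beam_search_cached(key, search_fn):
      # IGNORE_BEAM_CACHE=1 skips only the beam results; compiled kernels
      # still hit the compile cache, so just the search itself is redone.
      ignore = int(os.getenv("IGNORE_BEAM_CACHE", "0"))
      if not ignore:
        hit = diskcache_get("beam", key)
        if hit is not None: return hit
      result = search_fn()
      if not ignore: diskcache_put("beam", key, result)
      return result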
chenyu
c12de4f47d
benchmark use JITBEAM for llama and gpt2 ( #5189 )
2024-06-27 12:56:02 -04:00
chenyu
ad91962dcf
CACHECOLLECTING -> CAPTURING and don't capture clear_l2 ( #5190 )
...
fixed first-time BEAM slowness
2024-06-27 12:32:28 -04:00
Roelof van Dijk
01e8838b65
ruff: suppressible-exception ( #5182 )
...
* fix: use contextlib to suppress errors
* enable rule SIM105
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-27 08:23:44 -07:00
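SIM105 rewrites a try/except/pass that ignores one exception into contextlib.suppress, which states the intent in a single line:

    import contextlib, os

    # Before: the intent (ignore a missing file) is buried in boilerplate.
    try:
      os.remove("stale.lock")
    except FileNotFoundError:
      pass

    # After: same behavior, one line of intent.
    with contextlib.suppress(FileNotFoundError):
      os.remove("stale.lock")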
Roelof van Dijk
9704c7d4d4
ruff rule if-exp-instead-of-or-operator (FURB110) ( #5178 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:22:19 -07:00
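FURB110 flags a conditional expression that falls back on its own test value; the `or` operator says the same thing more directly:

    def pick_branch(name: str, default: str = "main") -> str:
      # flagged form: `name if name else default`
      return name or default

    assert pick_branch("dev") == "dev"
    assert pick_branch("") == "main"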
chenyu
5b8fda3c65
fix: JIT=0 means no JIT ( #5188 )
2024-06-27 10:31:37 -04:00
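The pitfall behind a fix like this is usually a gate that conflates "explicitly disabled" with a default. A sketch of the intended behavior, not the tinygrad diff:

    import os

    def jit_enabled() -> bool:
      # JIT defaults to on, but an explicit JIT=0 must really disable it.
      return int(os.getenv("JIT", "1")) != 0

    os.environ["JIT"] = "0"
    assert not jit_enabled()  # JIT=0 means no JIT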
qazal
3af17849bf
safely parse quoted titles [run_process_replay] ( #5183 )
2024-06-27 16:39:48 +03:00
Roelof van Dijk
975b811ad9
names shadowing builtins ( #5179 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
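Renaming names that shadow builtins (ruff's flake8-builtins checks) keeps the originals reachable in the same scope; for instance:

    # `input` would shadow the builtin for the whole function body; a
    # trailing underscore is the conventional fix.
    def scale(input_: list[float], factor: float) -> list[float]:  # was: input
      return [x * factor for x in input_]

    assert scale([1.0, 2.0], 3.0) == [3.0, 6.0]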
Roelof van Dijk
26e254c42b
ruff: else-raise and else-return ( #5175 )
...
* ruff: enable else-raise and else-return
* ruff: add error names
* fix order
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 07:54:59 -04:00
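Both rules (RET505/RET506 in ruff) drop an `else` that follows a `return` or `raise`, since that branch can never fall through:

    def validate(x: int) -> int:
      if x < 0:
        raise ValueError("negative")
      return x  # was nested under `else:`

    assert validate(3) == 3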
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension ( #5174 )
...
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
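C416 flags a comprehension that only copies its input; the constructor is clearer, and as the second bullet notes, sometimes the value is already a list and needs no wrapping at all:

    pairs = [(1, "a"), (2, "b")]
    copied = list(pairs)   # instead of [p for p in pairs]
    keyed = dict(pairs)    # instead of {k: v for k, v in pairs}
    assert copied == pairs and keyed == {1: "a", 2: "b"}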
reddyn12
f1c7944c44
Fix batchnorm shapes for resnet.load_pretrained ( #5167 )
...
* Fix batchnorm shapes
* make it a general reshape
2024-06-26 18:44:10 -04:00
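A hedged sketch of the "general reshape" idea: when a loaded tensor holds the right number of elements but the wrong shape (batchnorm stats saved as (C,) where the model expects, say, (1, C, 1, 1)), reshape to the target instead of special-casing batchnorm. Illustrative, not the resnet.load_pretrained code:

    import numpy as np

    def load_param(target_shape: tuple[int, ...], loaded: np.ndarray) -> np.ndarray:
      # Reshape whenever the element counts match; error out otherwise.
      if loaded.shape != target_shape:
        assert loaded.size == np.prod(target_shape), "incompatible parameter"
        loaded = loaded.reshape(target_shape)
      return loaded

    w = load_param((1, 64, 1, 1), np.ones(64))
    assert w.shape == (1, 64, 1, 1)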
George Hotz
396ce6cfc9
clean up graph dedup function [run_process_replay] ( #5169 )
2024-06-26 15:07:34 -07:00
kormann
3a04e518ec
print_tree UPat +fix ( #5132 )
...
* fix and extend print_tree
* typing
* typing
* fix upat
* fix none
* ws
* rm prefix
* mv luop dag
* typo
* test print_tree
2024-06-26 15:02:19 -07:00
chenyu
0ba093dea0
hotfix: only validate stable diffusion when using threefry ( #5166 )
2024-06-26 16:50:38 -04:00
chenyu
e4a5870b36
validate stable_diffusion output ( #5163 )
...
changed default steps but forgot to update the validation
2024-06-26 16:42:21 -04:00
nimlgen
21b225ac45
llama3 download works ( #5160 )
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes ( #5157 )
2024-06-26 11:50:57 -07:00
Roelof van Dijk
294bd1a9ff
refactor: name check [run_process_replay] ( #5158 )
2024-06-26 11:39:41 -07:00
Roelof van Dijk
2c80583e14
perf: cache const UOp creation [run_process_replay] ( #5156 )
2024-06-26 11:13:14 -07:00
George Hotz
eda2824cd8
freeze uop [run_process_replay] ( #5155 )
2024-06-26 10:18:15 -07:00
Elias Wahl
e267f3161d
Add MLLogger ( #5125 )
...
* add MLPerf logger
* eval steps
* start with step 1
* compliance for 3.1.0 and 4.0.0
* more compliance
* assert, comment and contiguous
2024-06-26 12:23:56 -04:00
nimlgen
16405b973a
fix hcq sync ( #5062 )
...
* fix hcq sync
* rewrite
* linter + comment
* fix profiler
* no default dict
* correct sync of unjitted transfer
* fix test
2024-06-26 17:50:37 +03:00
David Hou
3604642847
Llama shard axis 0 sometimes ( #5123 )
...
* make buffer view optional with a flag [run_process_replay]
* do not view when sharding to save memory [run_process_replay]
* llama shard axis=0 sometimes
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-26 10:35:25 -04:00
nimlgen
fd27f19e92
graph tests ( #5153 )
...
* graph tests
* add test
* cleanup
2024-06-26 16:31:20 +03:00
George Hotz
7b709c3ccd
switch tensorcoreoptions to tuple [run_process_replay] ( #5143 )
...
* switch tensorcoreoptions to tuple [run_process_replay]
* localbuffer can stay namedtuple for now
* freeze LocalBuffer
* remove NamedTuple
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-26 14:12:53 +03:00
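The pattern the bullets describe: replace list fields and NamedTuple with a frozen dataclass holding tuples, so the options are immutable and hashable, which pickling and dedup can rely on. Field names below are illustrative, not the real TensorCoreOptions:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TensorCoreOptionsSketch:
      axes: tuple[int, ...]          # was a list in the NamedTuple version
      axes_exist: tuple[bool, ...]

    opts = TensorCoreOptionsSketch(axes=(0, 1), axes_exist=(True, True))
    assert opts == TensorCoreOptionsSketch((0, 1), (True, True))
    hash(opts)  # frozen + tuple fields => hashable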
qazal
6ca7b13ed1
limit pickled objects [run_process_replay] ( #5154 )
...
* limit pickled objects
* delete uop from the list
* debug metal
* need self.opts for TC
* dont need device
* [run_process_replay]
* minor
2024-06-26 13:51:32 +03:00
George Hotz
ee4f080a14
rewrite div const [run_process_replay] [no_assert] ( #5151 )
...
* rewrite div const [run_process_replay] [no_assert]
* Update uops.py
2024-06-25 20:23:14 -07:00
David Hou
666a9c1448
don't view origin buffer when sharding ( #5122 )
...
* make buffer view optional with a flag
* do not view when sharding to save memory
2024-06-25 20:19:09 -07:00
George Hotz
89e106686a
simpler unmatch [run_process_replay] ( #5149 )
2024-06-25 19:57:40 -07:00
George Hotz
c98ca23cb9
test pickle variable ( #5150 )
...
* test pickle variable
* fix process replay
2024-06-25 19:49:21 -07:00