Commit Graph

4897 Commits

Author SHA1 Message Date
hikettei
ad1ca7da64 [Feature] Added BinaryOps.AND/BinaryOps.OR (#5223)
* [Feature] Added BinaryOps.AND/BinaryOps.OR

* Add: __rand__, __ror__
2024-06-29 17:20:25 -07:00
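
A minimal sketch of what this adds at the Tensor level, assuming the usual `from tinygrad import Tensor` entry point; `&`/`|` map to the new BinaryOps.AND/BinaryOps.OR and `__rand__`/`__ror__` handle the reflected forms (exact dtype handling may differ):

```python
from tinygrad import Tensor

a = Tensor([True, True, False, False])
b = Tensor([True, False, True, False])

print((a & b).tolist())     # elementwise AND -> [True, False, False, False]
print((a | b).tolist())     # elementwise OR  -> [True, True, True, False]
print((True | a).tolist())  # reflected form, dispatched through __ror__
```
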
chenyu
50b05dd3f4 tqdm minor cleanup (#5229)
combined some if branches
2024-06-29 18:58:24 -04:00
chenyu
b2ea610df8 fix tqdm unit_scale and support hours in time (#5227)
* fix tqdm unit_scale and support hours in time

previously it only supported MM:SS.
added more characters to unit scales, stripped trailing "." and " " in formatting, and added more tests

* simpler
2024-06-29 14:48:51 -04:00
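
A hypothetical formatter illustrating the MM:SS to HH:MM:SS behavior this fix describes (not the actual `tinygrad.helpers.tqdm` code):

```python
def fmt_time(seconds: float) -> str:
  # roll minutes over into hours only once the elapsed time crosses an hour
  m, s = divmod(int(seconds), 60)
  h, m = divmod(m, 60)
  return f"{h:02d}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"

assert fmt_time(75) == "01:15"
assert fmt_time(3725) == "01:02:05"
```
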
qazal
f374fb77af assert bool dtype for valid [run_process_replay] (#5214)
* valid is always bool

* prevent NumNode to begin with

* part 2

* test: disable pattern matchers, asserts should pass

* test: store without cast

* test: if (0)

* cleanup time

* only pattern match bool literal

* better for upstream debug
2024-06-29 21:20:32 +03:00
qazal
3f4eeb8b54 late UOps.IF generation [run_process_replay] [no_assert] (#5027)
* find all places

* test gates

* test

* gate based on depths

* add ctx

* that cache was so wrong

* delete useless things

* dont double write if

* self.if_cond

* move UOps.IF to gated store

* test_padto_where_multioutput

* test_padto_group

* minor cleanup

* hmm this actually works?

* need a good barrier

* merge 2

* delete ctx

* p1

* maybe p2

* p3

* minor fixup

* fixup 2

* smart thing from the Lowerer branch

* refactoring

* refactoring 2

* maybe before graph_rewrite

* slightly more acceptable Linearizer diff

* more correct

* [run_process_replay] [no_assert]
2024-06-29 12:22:14 -04:00
chenyu
42d1f92fc1 simpler tqdm (#5221)
can do more, but many cases are not tested
2024-06-29 07:41:46 -04:00
nimlgen
dd7eef7d71 libc defs to autogen (#5217)
* libc defs to autogen

* amd import libc

* linter

* better a bit

* remove comment, check this

* not hardcoded path
2024-06-29 14:37:33 +03:00
nimlgen
6b08cb5e38 ptx runs on nv in benchmarks (#5224) 2024-06-29 11:06:44 +03:00
nimlgen
b4c49ae3fa remove cudacpu in favour of mockgpu (#5225)
* remove cudacpu in favour of mockgpu

* remove unused import

* not used as well
2024-06-29 11:05:16 +03:00
nimlgen
ee02dcb98e nv supports PTX=1 (#5222)
* nv supports PTX=1

* not needed

* split nv compiler into nvrtc autogen

* remove to_c_array

* test

* Revert "test"

This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
wozeparrot
7bcb74ab23 feat: tag 0.9.1 (#5220) v0.9.1 2024-06-28 20:16:14 -07:00
George Hotz
7f46bfa587 hotfix: docs touchup 2024-06-28 14:36:20 -07:00
nimlgen
c941a58581 amd refactor queue creation (#5216)
* amd refactor queue creation

* fixes

* use data64_le

* fix linter
2024-06-28 23:24:49 +03:00
chenyu
7ba4938510 simplify View.permute arg check [run_process_replay] (#5218)
it checks if `axis` is a valid permutation, which is the same as `sorted(axis) == list(range(len(self.shape)))`
2024-06-28 16:18:46 -04:00
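
A plain-Python illustration of the equivalence this simplification relies on (not the actual View.permute source):

```python
def is_valid_permutation(axis: tuple, ndim: int) -> bool:
  # axis is a permutation of the dims iff sorting it yields 0..ndim-1
  return sorted(axis) == list(range(ndim))

assert is_valid_permutation((2, 0, 1), 3)
assert not is_valid_permutation((0, 0, 1), 3)  # duplicates rejected
assert not is_valid_permutation((0, 1, 3), 3)  # out-of-range index rejected
```
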
George Hotz
80ac21200b hotfix: linearizer test fixup 2024-06-28 10:52:25 -07:00
George Hotz
c9714dfcf4 rename graph to children [run_process_replay] (#5215) 2024-06-28 09:53:52 -07:00
kormann
6c456b6d66 remove uopgraph dedup + slight speedup (#5199)
* rm dedup

* rm dedup

* tests

* reduce diff

* oups

* reduce diff

* rm UOp.tuple
2024-06-28 09:26:32 -07:00
nimlgen
9b08a9397c amd inline bf16 funcs (#5212) 2024-06-28 18:45:00 +03:00
chenyu
7090eac8cb validate sdxl output and put it in benchmark (#5211)
* validate sdxl output and put it in benchmark

* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
63fa4e2a0e fix seed = 0 in sdxl (#5209)
also removed a few unneeded realize and contiguous calls
2024-06-28 08:48:59 -04:00
Tobias Fischer
4688f97d48 Add SDXL Inference to Examples (#5206)
* added sdxl inference code

* fixed trailing whitespace

* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
qazal
3e56c8422c remu err handling (#5208)
* add error handling

* use pre release

* minor

* works
2024-06-28 13:15:18 +03:00
nimlgen
7f7fa26e03 allow hugepage failure in memadvise (#5207) 2024-06-28 11:41:10 +03:00
chenyu
73395b998b better error msg for TinyJit inside TinyJit (#5202)
it's possible to support TinyJit inside TinyJit, but there are edge cases like two TinyJit functions sharing another TinyJit function, so just give a more precise error for now
2024-06-27 18:09:19 -04:00
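
A minimal sketch of the pattern the new error targets, assuming the usual `TinyJit` decorator usage (the exact message and capture behavior may differ):

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def inner(x: Tensor) -> Tensor:
  return (x * 2).realize()

@TinyJit
def outer(x: Tensor) -> Tensor:
  # a jitted function calling another jitted function: the case that now
  # raises a clearer error instead of failing in a confusing way
  return (inner(x) + 1).realize()
```
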
nimlgen
ac748cccdb nv apply relocs (#5165)
* nv do reloc

* a bit cleaner
2024-06-27 23:54:16 +03:00
Roelof van Dijk
540ebdf47c missing init files (#5196) 2024-06-27 15:30:02 -04:00
chenyu
d8dc43ad06 remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark (#5198)
this no longer helps
2024-06-27 15:20:34 -04:00
George Hotz
345bcc2099 move graph_dedup out of class [run_process_replay] (#5197) 2024-06-27 12:04:00 -07:00
George Hotz
d094a6828f single pass rewrite (#5159)
* single pass rewrite

* claude cleanups

* claude cleanups

* skip those tests

* restrict that to ints

* comment

* asserts i don't expect to fail do fail

* simplest...rewrite...ever

* simplest...rewrite...ever

* add that rule back

* tests pass?

* only collapse reduce loops

* second SHL/SHR arg must be 4 bytes

* fix verify

* no SHL/SHR in ptx

* put that back

* skip them in PTX...bad tests
2024-06-27 11:36:05 -07:00
Roelof van Dijk
1ff9bbaa61 ruff: close file handle (#5180)
* close file handle

* some more open file handles

* must stay open

* remove this close, stays open
2024-06-27 11:29:47 -07:00
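
A generic before/after of the pattern this lint pass enforces (illustrative file name, not the actual call sites):

```python
with open("config.json", "w") as f:
  f.write("{}")  # create a small file so the example below actually runs

# before: the handle is left for the garbage collector to close
data = open("config.json").read()

# after: the context manager closes the file deterministically
with open("config.json") as f:
  data = f.read()
```
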
chenyu
83da8b3558 use NV instead of CUDA in benchmark (#5192)
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark (#5191)
ignoring the beam cache while still using the compile cache should be fine, and it saves some benchmark time.

also updated `beam_search` to check flag value before accessing diskcache
2024-06-27 13:15:18 -04:00
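
A hedged sketch of the described control flow (illustrative names, not the real `beam_search` internals); `getenv` is from `tinygrad.helpers`, and the cache read here is a stub:

```python
from tinygrad.helpers import getenv

def _diskcache_lookup(key):  # stand-in for the real disk-cache read
  return None

def lookup_beam_cache(key):
  # consult the flag before touching the disk cache at all, so BEAM results are
  # recomputed while the compiled-kernel cache keeps working
  if getenv("IGNORE_BEAM_CACHE"):
    return None
  return _diskcache_lookup(key)
```
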
chenyu
c12de4f47d benchmark use JITBEAM for llama and gpt2 (#5189) 2024-06-27 12:56:02 -04:00
chenyu
ad91962dcf CACHECOLLECTING -> CAPTURING and don't capture clear_l2 (#5190)
fixed first-time BEAM slowness
2024-06-27 12:32:28 -04:00
Roelof van Dijk
01e8838b65 ruff: suppressible-exception (#5182)
* fix: use contextlib to suppress errors

* enable rule SIM105

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-27 08:23:44 -07:00
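
The shape of the SIM105 rewrite, with an illustrative call site:

```python
import contextlib, os

# before (what SIM105 flags): an except clause that only passes
try:
  os.remove("stale.lock")
except FileNotFoundError:
  pass

# after: contextlib.suppress states the intent directly
with contextlib.suppress(FileNotFoundError):
  os.remove("stale.lock")
```
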
Roelof van Dijk
9704c7d4d4 ruff rule if-exp-instead-of-or-operator (FURB110) (#5178)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:22:19 -07:00
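
The shape of the FURB110 rewrite, with illustrative names:

```python
explicit, default = None, "fallback"

# before: a conditional expression that only provides a fallback
value = explicit if explicit else default

# after: the or-operator says the same thing
value = explicit or default

assert value == "fallback"
```
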
chenyu
5b8fda3c65 fix: JIT=0 means no JIT (#5188) 2024-06-27 10:31:37 -04:00
qazal
3af17849bf safely parse quoted titles [run_process_replay] (#5183) 2024-06-27 16:39:48 +03:00
Roelof van Dijk
975b811ad9 names shadowing builtins (#5179)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
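
An illustrative rename of the kind this change makes (hypothetical function, not an actual call site):

```python
# before: `input` and `type` shadow Python builtins inside the function
def parse(input, type):
  return type(input)

# after: renamed parameters keep the builtins reachable and read less ambiguously
def parse(raw: str, target_type: type):
  return target_type(raw)

assert parse("42", int) == 42
```
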
Roelof van Dijk
26e254c42b ruff: else-raise and else-return (#5175)
* ruff: enable else-raise and else-return

* ruff: add error names

* fix order

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 07:54:59 -04:00
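
The shape of the else-return/else-raise rewrite, on a made-up helper:

```python
# before: the else branch is flagged because the if branch already returns
def device_or_default(device):
  if device is not None:
    return device
  else:
    raise ValueError("no device given")

# after: dropping the redundant else flattens the function
def device_or_default(device):
  if device is not None: return device
  raise ValueError("no device given")

assert device_or_default("CUDA") == "CUDA"
```
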
Roelof van Dijk
f88f71d73a ruff: unnecessary-comprehension (#5174)
* enable ruff C416 unnecessary-comprehension

* already a list
2024-06-27 07:45:29 -04:00
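
The shape of the C416 rewrite, with an illustrative iterable:

```python
items = range(5)

# before: the comprehension only re-iterates its input
copied = [x for x in items]

# after: the constructor expresses the copy directly
copied = list(items)

assert copied == [0, 1, 2, 3, 4]
```
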
reddyn12
f1c7944c44 Fix batchnorm shapes for resnet.load_pretrained (#5167)
* Fix batchnorm shapes

* make it general reshape
2024-06-26 18:44:10 -04:00
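
A hedged sketch of the "general reshape" idea (hypothetical helper, not the actual `load_pretrained` code): when a pretrained tensor and the model parameter hold the same number of elements but different shapes, e.g. a batchnorm weight stored as (1, 64, 1, 1) versus the model's (64,), reshape rather than error:

```python
from math import prod
from tinygrad import Tensor

def load_param(param: Tensor, loaded: Tensor) -> Tensor:
  if param.shape != loaded.shape:
    assert prod(param.shape) == prod(loaded.shape), "element count mismatch"
    loaded = loaded.reshape(param.shape)  # general reshape to the destination shape
  return param.assign(loaded)
```
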
George Hotz
396ce6cfc9 clean up graph dedup function [run_process_replay] (#5169) 2024-06-26 15:07:34 -07:00
kormann
3a04e518ec print_tree UPat +fix (#5132)
* fix and extend print_tree

* typing

* typing

* fix upat

* fix none

* ws

* rm prefix

* mv luop dag

* typo

* test print_tree
2024-06-26 15:02:19 -07:00
chenyu
0ba093dea0 hotfix: only validate stable diffusion when using threefry (#5166) 2024-06-26 16:50:38 -04:00
chenyu
e4a5870b36 validate stable_diffusion output (#5163)
changed default steps, forgot to update validation
2024-06-26 16:42:21 -04:00
nimlgen
21b225ac45 llama3 download works (#5160) 2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079 shard llama3 on 0 sometimes (#5157) 2024-06-26 11:50:57 -07:00
Roelof van Dijk
294bd1a9ff refactor: name check [run_process_replay] (#5158) 2024-06-26 11:39:41 -07:00
Roelof van Dijk
2c80583e14 perf: cache const UOp creation [run_process_replay] (#5156) 2024-06-26 11:13:14 -07:00