Commit Graph

4923 Commits

Vyacheslav Pachkov
d3e4e21759 add return type for HCQCompatAllocator _alloc (#5267)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-07-03 10:25:44 +03:00
chenyu
191463a919 add timing to SDXL (#5273) 2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e nn.RMSNorm (#5272)
the norm itself does not add significant value as a Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
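For reference, a minimal sketch of the standard RMSNorm computation this module presumably implements (the function signature, parameter names, and eps default here are illustrative, not necessarily tinygrad's):

```python
from tinygrad import Tensor

def rms_norm(x: Tensor, weight: Tensor, eps: float = 1e-6) -> Tensor:
  # normalize by the root mean square over the last axis, then apply the learned scale
  return x * (x.square().mean(axis=-1, keepdim=True) + eps).rsqrt() * weight

out = rms_norm(Tensor.randn(2, 8), Tensor.ones(8))
```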
chenyu
9a2a82a77f test stable diffusion unet in ci (#5268)
unet is parameterized now, so CI can test a smaller one
2024-07-02 21:37:52 -04:00
chenyu
ce52b10f6f add a flag DISABLE_LOOP_COLLAPSE (#5270)
workaround if a user encounters an UNMUL error
2024-07-02 20:01:11 -04:00
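Like other tinygrad flags, this is presumably read from the environment, so a run that hits the UNMUL error could be retried along these lines (illustrative usage, not from the commit):

```python
import os
os.environ["DISABLE_LOOP_COLLAPSE"] = "1"  # set before tinygrad reads the flag
from tinygrad import Tensor
# ... then run the offending workload as usual
```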
George Hotz
e53b164e1a small changes from lowerer (#5266) 2024-07-02 15:03:54 -07:00
nimlgen
7be776f9af add _alloc_signal/_free_signal to hcq (#5264)
* add _alloc_signal/_free_signal api

* oops, revert this

* linter
2024-07-02 23:35:39 +03:00
Tobias Fischer
9a25ee0b9a fixed unet call params (#5262) 2024-07-02 12:40:27 -04:00
qazal
59bc837ad1 refactor gated load rendering [run_process_replay] (#5259)
* refactor gated load rendering [run_process_replay]

* hotfix: extra line

* remove llvm diff
2024-07-02 15:13:10 +03:00
nimlgen
e050603b4b nv close fds after mapping (#5246) 2024-07-02 13:57:46 +03:00
qazal
d3cfb6c2e3 refactor UOps.LOAD barrier [run_process_replay] (#5258) 2024-07-02 13:48:47 +03:00
qazal
a1044e6063 iterate over scoped uops once [run_process_replay] (#5255) 2024-07-02 09:21:09 +03:00
wozeparrot
dfbee4f0f5 feat: add blobfile to testing (#5254) 2024-07-01 19:33:58 -07:00
Tobias Fischer
8c9c1cf62f Pulled CLIP and UNet into Separate Files (#5253)
* pulled clip and unet into separate files

* reference cleanup, lru cache fix

* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
5808c37302 hotfix disable flaky llama3 beam benchmark on green (#5249) 2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf revert stable diffusion validation with threefry (#5248)
* Revert "use threefry in stable diffusion benchmark (#4988)"

This reverts commit 44dfa37c70.

* sdxl and validation fix

* relax threshold
2024-07-01 14:43:47 -04:00
nimlgen
57e89645cd hcq spec test (#5226)
* start hcq spec test

* more test

* fixes

* run on amd as well

* test amdgpu exec

* fix amd

* amd mockgpu support sdma timestamp
2024-07-01 17:36:37 +03:00
Carson Powers
d7839fdc5f Add x!=0 -> (bool)x pattern [run_process_replay] [no_assert] (#5237)
* x!=0 -> (bool)x pattern

* bool != bool pattern

* redundant upat
2024-06-30 17:48:45 -07:00
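The equivalences behind these rewrites can be sanity-checked in plain Python (this illustrates the rules themselves, not the actual UPat definitions):

```python
# for any integer x, (x != 0) is the same as casting x to bool
assert all((x != 0) == bool(x) for x in range(-3, 4))
# for booleans, (a != b) behaves like XOR, so a separate pattern covers bool != bool
assert all((a != b) == (a ^ b) for a in (False, True) for b in (False, True))
```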
George Hotz
14980f79dd hotfix: unbreak llama 2024-06-30 15:27:54 -07:00
George Hotz
146eb3a811 hotfix: add repeat_interleave docs 2024-06-30 15:25:18 -07:00
George Hotz
3df47bc21e OpenELM + repeat_interleave (#5234)
* start writing openelm

* progress...hit bug

* repeat_interleave support

* gqa

* add rotary embedding

* spp

* i think it runs correctly

* broken

* output is good now

* cleanups

* no io_uring on android
2024-06-30 15:18:39 -07:00
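The new repeat_interleave presumably mirrors the torch op of the same name; a small usage example under that assumption:

```python
from tinygrad import Tensor

t = Tensor([1, 2, 3])
# each element is repeated in place before moving to the next
print(t.repeat_interleave(2).tolist())  # [1, 1, 2, 2, 3, 3]
```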
nimlgen
7b7b751513 simple hip backend for debugging (#5201)
* hip backend

* fix mypy

* shorter

* fixes

* tiny changes
2024-06-30 23:00:11 +03:00
chenyu
88763eb9ff fix stable_diffusion with fp16 (#5239) 2024-06-30 12:59:31 -04:00
chenyu
649641a2f2 fix tqdm with generator without __len__ (#5238)
it should be treated as total = 0 (just show the iteration count).
Also removed a duplicated ": " in fetch and fixed unit scale with total = 0.
2024-06-30 12:20:59 -04:00
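A minimal sketch of the described fallback, assuming total defaults to 0 when the iterable has no `__len__`:

```python
def get_total(iterable) -> int:
  # generators have no __len__, so fall back to 0 (just show the iteration count)
  return len(iterable) if hasattr(iterable, "__len__") else 0

assert get_total([1, 2, 3]) == 3
assert get_total(x for x in range(3)) == 0
```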
chenyu
fd53b6d901 tqdm supports fractional blocks (#5233)
enabled progress bar matching in tests; it matches perfectly now
2024-06-29 22:30:18 -04:00
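A sketch of the idea, assuming the usual approach of full blocks plus one eighth-width partial block for the remainder (not the exact implementation):

```python
def render_bar(frac: float, width: int = 10) -> str:
  filled = frac * width
  full, rem = int(filled), filled - int(filled)
  # pick a partial block from the eighth-width characters for the fractional part
  partial = " ▏▎▍▌▋▊▉"[int(rem * 8)] if full < width else ""
  return (chr(0x2588) * full + partial).ljust(width)

print(render_bar(0.37))  # '███▋' followed by padding
```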
chenyu
ae10ae4722 simplify tqdm scale math (#5231)
expanded the log-of-log expressions
2024-06-29 21:17:40 -04:00
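For context, unit scaling typically picks an SI prefix from a log base 1000 of the value; a standalone sketch of that math (not the commit's exact code):

```python
import math

def scale(n: float) -> str:
  if n <= 0: return f"{n:.2f}"
  # 0 -> no prefix, 1 -> k, 2 -> M, 3 -> G
  k = max(0, min(int(math.log(n, 1000)), 3))
  return f"{n / 1000**k:.2f}{('', 'k', 'M', 'G')[k]}"

assert scale(1_500_000) == "1.50M"
```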
hikettei
ad1ca7da64 [Feature] Added BinaryOps.AND/BinaryOps.OR (#5223)
* [Feature] Added BinaryOps.AND/BinaryOps.OR

* Add: __rand__, __ror__
2024-06-29 17:20:25 -07:00
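A usage sketch under the assumption that these behave like the other elementwise ops, with `__rand__`/`__ror__` covering the reflected case:

```python
from tinygrad import Tensor

a, b = Tensor([True, True, False]), Tensor([True, False, False])
print((a & b).tolist())     # [True, False, False]
print((a | b).tolist())     # [True, True, False]
print((True & a).tolist())  # __rand__ handles a plain bool on the left: [True, True, False]
```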
chenyu
50b05dd3f4 tqdm minor cleanup (#5229)
combined some if branches
2024-06-29 18:58:24 -04:00
chenyu
b2ea610df8 fix tqdm unit_scale and support hours in time (#5227)
* fix tqdm unit_scale and support hours in time

previously it only supported MM:SS.
Added more chars to unit scales, stripped trailing "." and " " in formatting, and added more tests.

* simpler
2024-06-29 14:48:51 -04:00
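A sketch of the extended formatting under the stated behavior (MM:SS before, hours added now); the exact field widths are assumptions:

```python
def fmt_time(t: float) -> str:
  m, s = divmod(int(t), 60)
  h, m = divmod(m, 60)
  # show hours only once elapsed time passes an hour
  return f"{h:d}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"

assert fmt_time(75) == "01:15"
assert fmt_time(3 * 3600 + 62) == "3:01:02"
```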
qazal
f374fb77af assert bool dtype for valid [run_process_replay] (#5214)
* valid is always bool

* prevent NumNode to begin with

* part 2

* test: disable pattern matchers, asserts should pass

* test: store without cast

* test: if (0)

* cleanup time

* only pattern match bool literal

* better for upstream debug
2024-06-29 21:20:32 +03:00
qazal
3f4eeb8b54 late UOps.IF generation [run_process_replay] [no_assert] (#5027)
* find all places

* test gates

* test

* gate based on depths

* add ctx

* that cache was so wrong

* delete useless things

* dont double write if

* self.if_cond

* move UOps.IF to gated store

* test_padto_where_multioutput

* test_padto_group

* minor cleanup

* hmm this actually works?

* need a good barrier

* merge 2

* delete ctx

* p1

* maybe p2

* p3

* minor fixup

* fixup 2

* smart thing from the Lowerer branch

* refactoring

* refactoring 2

* maybe before graph_rewrite

* slightly more acceptable Linearizer diff

* more correct

* [run_process_replay] [no_assert]
2024-06-29 12:22:14 -04:00
chenyu
42d1f92fc1 simpler tqdm (#5221)
can do more, but many cases are not tested
2024-06-29 07:41:46 -04:00
nimlgen
dd7eef7d71 libc defs to autogen (#5217)
* libc defs to autogen

* amd import libc

* linter

* better a bit

* remove comment, check this

* not hardcoded path
2024-06-29 14:37:33 +03:00
nimlgen
6b08cb5e38 ptx runs on nv in benchmarks (#5224) 2024-06-29 11:06:44 +03:00
nimlgen
b4c49ae3fa remove cudacpu in favour of mockgpu (#5225)
* remove cudacpu in favour of mockgpu

* remove unused import

* not used as well
2024-06-29 11:05:16 +03:00
nimlgen
ee02dcb98e nv supports PTX=1 (#5222)
* nv supports PTX=1

* not needed

* split nv compiler into nvrtc autogen

* remove to_c_array

* test

* Revert "test"

This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
wozeparrot
7bcb74ab23 feat: tag 0.9.1 (#5220) v0.9.1 2024-06-28 20:16:14 -07:00
George Hotz
7f46bfa587 hotfix: docs touchup 2024-06-28 14:36:20 -07:00
nimlgen
c941a58581 amd refactor queue creation (#5216)
* amd refactor queue creation

* fixes

* use data64_le

* fix linter
2024-06-28 23:24:49 +03:00
chenyu
7ba4938510 simplify View.permute arg check [run_process_replay] (#5218)
it checks if `axis` is a valid permutation, which is the same as `sorted(axis) == list(range(len(self.shape)))`
2024-06-28 16:18:46 -04:00
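A standalone check mirroring the equivalence stated above:

```python
def is_valid_permutation(axis: tuple, shape: tuple) -> bool:
  # axis must be a rearrangement of (0, 1, ..., len(shape) - 1)
  return sorted(axis) == list(range(len(shape)))

assert is_valid_permutation((2, 0, 1), (4, 5, 6))
assert not is_valid_permutation((0, 0, 1), (4, 5, 6))
```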
George Hotz
80ac21200b hotfix: linearizer test fixup 2024-06-28 10:52:25 -07:00
George Hotz
c9714dfcf4 rename graph to children [run_process_replay] (#5215) 2024-06-28 09:53:52 -07:00
kormann
6c456b6d66 remove uopgraph dedup + slight speedup (#5199)
* rm dedup

* rm dedup

* tests

* reduce diff

* oups

* reduce diff

* rm UOp.tuple
2024-06-28 09:26:32 -07:00
nimlgen
9b08a9397c amd inline bf16 funcs (#5212) 2024-06-28 18:45:00 +03:00
chenyu
7090eac8cb validate sdxl output and put it in benchmark (#5211)
* validate sdxl output and put it in benchmark

* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
63fa4e2a0e fix seed = 0 in sdxl (#5209)
removed a few unneeded realize and contiguous calls too
2024-06-28 08:48:59 -04:00
Tobias Fischer
4688f97d48 Add SDXL Inference to Examples (#5206)
* added sdxl inference code

* fixed trailing whitespace

* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
qazal
3e56c8422c remu err handling (#5208)
* add error handling

* use pre release

* minor

* works
2024-06-28 13:15:18 +03:00
nimlgen
7f7fa26e03 allow hugepage failure in memadvise (#5207) 2024-06-28 11:41:10 +03:00
chenyu
73395b998b better error msg for TinyJit inside TinyJit (#5202)
it's possible to support TinyJit inside TinyJit, but there are edge cases, like two TinyJit functions sharing another TinyJit function, so just give a more precise error for now
2024-06-27 18:09:19 -04:00
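A hypothetical sketch of the nesting this error now reports (the function names and shapes are illustrative):

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def inner(x: Tensor) -> Tensor: return (x * 2).realize()

@TinyJit
def outer(x: Tensor) -> Tensor: return (inner(x) + 1).realize()

# outer(Tensor.randn(4))  # would raise the more precise "TinyJit inside TinyJit" error
```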