Commit Graph

4946 Commits

Author SHA1 Message Date
chenyu
fc03fc025e enable sin on METAL in test_dtype_alu (#5298) 2024-07-05 14:52:09 -04:00
qazal
b369e75ed0 refactor schedule creation (#5297) 2024-07-05 21:14:38 +03:00
qazal
5292d37db6 LoadOps.VIEW in the scheduler spec (#5296)
* refactor to allow_buffer_view

* tests

* fix multi
2024-07-05 19:43:50 +03:00
hikettei
1ab7a4cff0 Handling Multiple UnaryOps.BITCAST in Function for Proper Kernel Fusion [run_process_replay] (#5172)
* [Patch] added an option not to ignore view replacing when doing bitcast

* added the testcase

* [Add] reproduced bitcast cannot be fused into a single kernel in the unittest

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-05 19:16:44 +03:00
chenyu
43c3f73fbc handcode_bert_opt.py (#5295)
similar to handcode_resnet50_opt.py, one file to check bert kernels without dataset.
2024-07-05 11:01:20 -04:00
nimlgen
d7835a705c hotfix: fix metal with vars (#5294)
* hotfix: fix metal with vars

* one more place
2024-07-05 16:53:40 +03:00
nimlgen
8a548b0b6e metal support offset (#5293) 2024-07-05 16:13:05 +03:00
qazal
1cefbb33ab uop graph tests + type_verify cleanup (#5292)
* test_cast_alu_fold

* test_double_cast_fold + these should assert
2024-07-05 13:00:01 +03:00
qazal
341c4a29d1 hotfix: use dtype.scalar() for rendering cast [run_process_replay] [no_assert] (#5290) 2024-07-05 11:29:35 +03:00
chenyu
87d27c45ec minor _broadcast cleanup (#5286)
`any(x==0 for x in y)` is `0 in y`.

also `get_args(ConstType)` instead of hard coded `float, int, bool`
2024-07-04 14:25:24 -04:00
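Both simplifications in this commit message are plain-Python facts and can be checked directly. `ConstType` is a tinygrad type alias not shown in the log; it is stood in for here with a `Union` of the same three hard-coded types the message mentions, as an assumption:

```python
from typing import Union, get_args

# Stand-in for tinygrad's ConstType (assumed here to be a Union of these three).
ConstType = Union[float, int, bool]

y = (3, 0, 5)
# `any(x == 0 for x in y)` is just a membership test, so `0 in y` is equivalent.
assert any(x == 0 for x in y) == (0 in y)

# get_args recovers the member types, avoiding a hard-coded (float, int, bool) tuple.
assert get_args(ConstType) == (float, int, bool)
```

Both checks use `==` under the hood, so the membership form is a pure readability win.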
SnakeOnex
8c03816ae9 fix README example (#5284)
* fixed README example

* README test

* changed py -> python markdown code flags in README
2024-07-04 11:15:07 -04:00
nimlgen
2778b6046c new memory scheduler (#5278)
* new memory schedule algo

* works

* fix

* fix

* linter

* tiny fixes

* do not optimize copy buffers

* more comments

* tiny cleanups
2024-07-04 18:06:04 +03:00
nimlgen
84b3e3bb6f hcq exec no embedded signal (#5142) 2024-07-04 13:29:21 +03:00
Tobias Fischer
0c3a35e5c2 Stable Diffusion v2 Inference (#5283)
* model implementation

* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
e5ba385f03 remove first contiguous in multi from_sharded (#5121)
second contiguous guarantees lbs are contiguous going into MultiLazyBuffer, don't need the first contiguous
2024-07-03 19:42:56 -04:00
chenyu
f1ff65e763 remove "no-nans-fp-math"="true" for LLVM (#5282)
fixed isnan for llvm (still have issue with < nan)
2024-07-03 17:52:50 -04:00
chenyu
3929a9dc94 fix UOp.cmp_tuple for ALU (#5280)
* fix UOp.cmp_tuple for ALU

for ALU, use self.arg instead of self.op to compare

* skip that?
2024-07-03 14:59:05 -04:00
qazal
a9d6a6c339 verify_lazyop with multi reduce (#5276)
* outsource the assert to the implicit movement op check

* tests
2024-07-03 20:15:42 +03:00
George Hotz
16e3b8b013 uops work from lowerer [run_process_replay] (#5279) 2024-07-03 09:40:00 -07:00
chenyu
622b7bd556 simpler TinyJit inside TinyJit detection (#5219)
* simpler TinyJit inside TinyJit detection

suggested in 73395b998b (commitcomment-143660402)

* cannot repro...

* clear the way out

* finally clear
2024-07-03 12:28:53 -04:00
gip
04ef0fd328 fix: message when applegpu tools missing (#5236) 2024-07-03 09:07:09 -07:00
reddyn12
d3e244d8b7 prev speed improvements (#5252)
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-07-03 09:06:01 -07:00
nimlgen
21d41f06a2 nv follows HCQCompatAllocRes protocol (#5275)
* nv follows HCQCompatAllocRes protocol

* fix amd
2024-07-03 11:34:10 +03:00
Vyacheslav Pachkov
d3e4e21759 add return type for HCQCompatAllocator _alloc (#5267)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-07-03 10:25:44 +03:00
chenyu
191463a919 add timing to SDXL (#5273) 2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e nn.RMSNorm (#5272)
the norm itself has no significant value to add as a Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
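RMSNorm normalizes by the root-mean-square of the input rather than by mean and variance. The actual `nn.RMSNorm` signature and defaults are not shown in the log, so this is only a minimal pure-Python sketch of the computation itself, with an assumed optional elementwise weight:

```python
import math

def rms_norm(x, weight=None, eps=1e-6):
    """RMS normalization: x / sqrt(mean(x^2) + eps), optionally scaled elementwise."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    out = [v / rms for v in x]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    return out

# For inputs well above eps, the output has RMS ~1.
y = rms_norm([3.0, 4.0])
```

Unlike LayerNorm, no mean is subtracted, which is why it is a normalization rather than a standardization.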
chenyu
9a2a82a77f test stable diffusion unet in ci (#5268)
unet is parameterized now so we can test a smaller one in CI
2024-07-02 21:37:52 -04:00
chenyu
ce52b10f6f add a flag DISABLE_LOOP_COLLAPSE (#5270)
workaround if user encountered UNMUL error
2024-07-02 20:01:11 -04:00
George Hotz
e53b164e1a small changes from lowerer (#5266) 2024-07-02 15:03:54 -07:00
nimlgen
7be776f9af add _alloc_signal/_free_signal to hcq (#5264)
* add _alloc_signal/_free_signal api

* oops, revert this

* linter
2024-07-02 23:35:39 +03:00
Tobias Fischer
9a25ee0b9a fixed unet call params (#5262) 2024-07-02 12:40:27 -04:00
qazal
59bc837ad1 refactor gated load rendering [run_process_replay] (#5259)
* refactor gated load rendering [run_process_replay]

* hotfix: extra line

* remove llvm diff
2024-07-02 15:13:10 +03:00
nimlgen
e050603b4b nv close fds after mapping (#5246) 2024-07-02 13:57:46 +03:00
qazal
d3cfb6c2e3 refactor UOps.LOAD barrier [run_process_replay] (#5258) 2024-07-02 13:48:47 +03:00
qazal
a1044e6063 iterate over scoped uops once [run_process_replay] (#5255) 2024-07-02 09:21:09 +03:00
wozeparrot
dfbee4f0f5 feat: add blobfile to testing (#5254) 2024-07-01 19:33:58 -07:00
Tobias Fischer
8c9c1cf62f Pulled CLIP and UNet into Separate Files (#5253)
* pulled clip and unet into separate files

* reference cleanup, lru cache fix

* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
5808c37302 hotfix disable flaky llama3 beam benchmark on green (#5249) 2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf revert stable diffusion validation with threefry (#5248)
* Revert "use threefry in stable diffusion benchmark (#4988)"

This reverts commit 44dfa37c70.

* sdxl and validation fix

* relax threshold
2024-07-01 14:43:47 -04:00
nimlgen
57e89645cd hcq spec test (#5226)
* start hcq spec test

* more test

* fixes

* run on amd as well

* test amdgpu exec

* fix amd

* amd mockgpu support sdma timestamp
2024-07-01 17:36:37 +03:00
Carson Powers
d7839fdc5f Add x!=0 -> (bool)x pattern [run_process_replay] [no_assert] (#5237)
* x!=0 -> (bool)x pattern

* bool != bool pattern

* redundant upat
2024-06-30 17:48:45 -07:00
George Hotz
14980f79dd hotfix: unbreak llama 2024-06-30 15:27:54 -07:00
George Hotz
146eb3a811 hotfix: add repeat_interleave docs 2024-06-30 15:25:18 -07:00
George Hotz
3df47bc21e OpenELM + repeat_interleave (#5234)
* start writing openelm

* progress...hit bug

* repeat_interleave support

* gqa

* add rotary embedding

* spp

* i think it runs correctly

* broken

* output is good now

* cleanups

* no io_uring on android
2024-06-30 15:18:39 -07:00
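The `repeat_interleave` support added here is not shown in the log; assuming it follows the semantics of the identically named PyTorch op, each element is repeated consecutively, which a list-based sketch makes concrete:

```python
def repeat_interleave(xs, repeats):
    """Repeat each element of xs `repeats` times consecutively, keeping order."""
    return [x for x in xs for _ in range(repeats)]

# [1, 2, 3] with repeats=2 -> [1, 1, 2, 2, 3, 3]
assert repeat_interleave([1, 2, 3], 2) == [1, 1, 2, 2, 3, 3]
```

This differs from tiling (`[1, 2, 3, 1, 2, 3]`): interleaved repeats keep copies of each element adjacent, which is what grouped-query attention needs when expanding KV heads.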
nimlgen
7b7b751513 simple hip backend for debugging (#5201)
* hip backend

* fix mypy

* shorter

* fixes

* tiny changes
2024-06-30 23:00:11 +03:00
chenyu
88763eb9ff fix stable_diffusion with fp16 (#5239) 2024-06-30 12:59:31 -04:00
chenyu
649641a2f2 fix tqdm with generator without __len__ (#5238)
it should be treated as total = 0 (just show iteration count).
also removed duplicated ": " in fetch and fixed unit scale with total = 0
2024-06-30 12:20:59 -04:00
chenyu
fd53b6d901 tqdm supports fractional blocks (#5233)
enabled progress bar match in test; it matches perfectly now
2024-06-29 22:30:18 -04:00
chenyu
ae10ae4722 simplify tqdm scale math (#5231)
expand the log of log stuff
2024-06-29 21:17:40 -04:00
hikettei
ad1ca7da64 [Feature] Added BinaryOps.AND/BinaryOps.OR (#5223)
* [Feature] Added BinaryOps.AND/BinaryOps.OR

* Add: __rand__, __ror__
2024-06-29 17:20:25 -07:00