tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
George Hotz	6972a2569f	Linearizer -> Lowerer (#4957) * st to uops function * lowerer * uops reduce * uops reduce * acc_number correct * reduce unroll * complete unroll * do upcasts * handle multioutput * define_accs * fix valid * get grouped dims * revert lin * minor * fixup_ast * group for reduce * group works now * all forwards pass * all ops tests pass * fix clang * mypy * lil cleanups, no image yet * ugh, variables everywhere * bugfix * counters and name fix * use symbolic, not uops * cleanups * Fix tests * linearizer tests * expands * float4 expand load * tests pass * woooo, float4 test * test ops works again * one more lin test * more lin tests * bypass * fix tests * something like this * const in defineacc * uops get_reduce_acc * move around * allow consts in the LOAD/STORE * each axis should only appear once, 21 failures * 16 failures * fix some image * optional float4 * onnx tests * gate the stores * add reorder * fix terrible skip function * tc work * opt add/mul merge * fix float4 tests * tiny tweak, 9 failing * 7 test failures * start tc, but i don't think this will work * progress on tensorcores * note * fix ops tests * closer on tc * weeee...one tensor core works * still works, more generic * large WMMA works * tc test passes * use WMMA as accumulator * basic tc tests passing * small gemm padded works * 4 failures * 3 tests failing * super barrier * now two tests failing * one test failing * cleanpus, add reduce to UopGraph * remove the linearizer * remove unused * lil cleanups * Lowerer everywhere * remove test that doesn't exist now * image indexing * llvm fix * fix metal * fix image * fix images * might fix ptx * fix image type mismatch * more tests pass * CAST -> VECTORIZE * forgot that one * fix TestOps.test_flip_eye_crash * locals shouldn't be image dtype * change less files * test fix * fix recursive expands * touches * MULACC support in python * delete unneeded * alu before contract * bug fixes * tests * no var multireduce * simpler tc * metal works in new style * working on AMD and METAL * fix amd * shot in the dark, fix amd * something for CUDA * CUDA WORKS from the docs * comment * correct merge * cleanups + ptx fix + get_reduce_acc * local alias isn't used anymore * add store sanity check * fix for AMD * cleanups and single expand pass * more correct with acc_cache * tests should pass * block on WMMA * tests pass * merge contract and reduce * contractor fixes issue * multicontract * pre expand wmma (same as a reduce) * expand wmma and only take one * all expands * comments and whitespace	2024-07-10 15:07:42 -07:00
chenyu	322c37e621	use helpers.JIT in llama and gpt2 examples (#5350) * use helpers.JIT in llama and gpt2 examples replaced getenv("JIT"), effectively made gpt2 default jit * fix test_gpt2	2024-07-09 15:04:43 -04:00
Elias Wahl	097268fab3	Add layerwise performance bench for bert (#5349) * add bert bench * dont disable by defauöt * remove lr * linter	2024-07-09 15:03:25 -04:00
nimlgen	1678199b15	add update_copy to hcq spec (#5348) * add update_copy to hcq spec * fix amd	2024-07-09 20:44:44 +03:00
qazal	1f5de80eba	multi reduce Tensor.var passing verify_lazyop (#5346) * what about this * reset late gate	2024-07-09 17:20:17 +03:00
kormann	3d452195e4	[bug fix] nested commutative pattern _match [run_process_replay] [no_assert] (#5340) * deep pat test * lint * min diff * min lines * nothing * is res extra * cleanup2 * add res back * reduce lines * type anno --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-07-09 16:38:39 +03:00
qazal	bee96a19ff	fuzz uop schedules (#5345) * basic blocks + cleanups * fixups * elif is better for future me * fuzz_schedule_max_paths * fix linter	2024-07-09 15:24:56 +03:00
George Hotz	c13da83f12	tests from lowerer branch (#5339) * tests from lowerer branch * Update test_image_dtype.py * Update test_image_dtype.py * Update test_image_dtype.py	2024-07-08 21:23:19 -07:00
chenyu	4ceab5d2b1	fix PTX match rule for gated LOAD (#5338) * test padto sum with bool tensor and bool acc dtype make sure bool tensor acc with gate is handled correctly * broken in PTX * fix ptx	2024-07-08 22:25:03 -04:00
chenyu	a80f2df1bd	fix some PTX tests (#5337) fix broken PTX tests in test_linearizer and test_uops. there are tests that were skipped and broken because it runs only with CUDA=1 and we run PTX with NV=1 now	2024-07-08 21:33:05 -04:00
wozeparrot	9150a6be7a	tensor metadata (#5271)	2024-07-08 17:45:40 -07:00
chenyu	0f0940225a	fix Tensor.all and Tensor.any for PTX (#5335) supported boolean acc and boolean phi. and rewrite boolean max to uint8 max	2024-07-08 18:15:04 -04:00
kormann	2349d837fb	Fix scope order in graph toposort [run_process_replay] (#5330) * fix * test * nothing	2024-07-08 11:46:15 -07:00
Timmy	bb7746985f	multireduce scheduler tests (#5141) Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-07-08 20:28:55 +03:00
chenyu	6856f915d6	Tensor.any and Tensor.all (#5320) does not work in ptx yet due to how boolean tensor is handled	2024-07-07 14:36:00 -04:00
chenyu	2029cb7047	support passing None to Tensor.clip (#5319) passing None for no upper bound or no lower bound	2024-07-07 13:04:22 -04:00
chenyu	c1e330f302	Tensor.int and Tensor.bool (#5317)	2024-07-07 11:52:58 -04:00
qazal	ae10e936e7	UOps.VECTORIZE cleanups [run_process_replay] (#5314) * still render_cast * one extra line ok * these are all just vectorize * save space * behavior change can go in a different diff	2024-07-07 10:49:08 +03:00
greg-niemeyer	77b2ce9fc9	Add UOps.VECTORIZE [run_process_replay] (#5289) * Add UOps.VECTORIZE to core * Update vectorized cast tests * Addresses code review comments - Removes VECTORIZE from LLVMRenderer - Add line breaks to unduly long lines - Add noop CAST rule back - Update asserts and add render_vectorize in CSytleLanguage renderer * Add missing const folding rule for VECTORIZE Also adds corresponding test * Fixes test_const_vectorize_fold and add assert - Use sane types with VECTORIZE in test_const_vectorize_fold - Add assert that sanity checks the types for VECTORIZE * Rename test_cast_vectorized_fold Renames test_cast_vectorized_fold to test_noop_vectorize_fold because the test targets a very specific rule and there are other tests for VECTORIZE. * Revert unrelated changes --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: qazal <qazal.software@gmail.com>	2024-07-07 09:59:57 +03:00
qazal	8a99514462	generalize the uops toposort spec to ptx (#5309) * generalize spec to ptx * redundant assert * extra print	2024-07-07 00:06:30 +03:00
chenyu	ca0ef1700b	use precise::sin in metal (#5307)	2024-07-06 12:47:27 -04:00
qazal	d813617742	prescheduling refactor (#5300) * p1 * refactor tuple	2024-07-06 12:04:03 +03:00
qazal	c1e166c08a	fix dtype mismatch for bool ops in multi (#5299)	2024-07-06 11:36:40 +03:00
chenyu	fc03fc025e	enable sin on METAL in test_dtype_alu (#5298)	2024-07-05 14:52:09 -04:00
qazal	b369e75ed0	refactor schedule creation (#5297)	2024-07-05 21:14:38 +03:00
qazal	5292d37db6	LoadOps.VIEW in the scheduler spec (#5296) * refactor to allow_buffer_view * tests * fix multi	2024-07-05 19:43:50 +03:00
hikettei	1ab7a4cff0	Handling Multiple UnaryOps.BITCAST in Function for Proper Kernel Fusion [run_process_replay] (#5172) * [Patch] added an option not to ignore view replacing when doing bitcast * added the testcase * [Add] reproduced bitcast cannot be fused into a single kernel in the unittest --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-07-05 19:16:44 +03:00
qazal	1cefbb33ab	uop graph tests + type_verify cleanup (#5292) * test_cast_alu_fold * test_double_cast_fold + these should assert	2024-07-05 13:00:01 +03:00
chenyu	f1ff65e763	remove "no-nans-fp-math"="true" for LLVM (#5282) fixed isnan for llvm (still have issue with < nan)	2024-07-03 17:52:50 -04:00
chenyu	3929a9dc94	fix UOp.cmp_tuple for ALU (#5280) * fix UOp.cmp_tuple for ALU for ALU, use self.arg instead of self.op to compare * skip that?	2024-07-03 14:59:05 -04:00
qazal	a9d6a6c339	verify_lazyop with multi reduce (#5276) * outsource the assert to the implicit movement op check * tests	2024-07-03 20:15:42 +03:00
chenyu	622b7bd556	simpler TinyJit inside TinyJit detection (#5219) * simpler TinyJit inside TinyJit detection suggested in `73395b998b (commitcomment-143660402)` * cannot repro... * clear the way out * finally clear	2024-07-03 12:28:53 -04:00
chenyu	b2c3a28a5e	nn.RMSNorm (#5272) the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize	2024-07-02 21:39:01 -04:00
chenyu	9a2a82a77f	test stable diffusion unet in ci (#5268) unet is parameterized now so can test a smaller one is ci	2024-07-02 21:37:52 -04:00
George Hotz	e53b164e1a	small changes from lowerer (#5266)	2024-07-02 15:03:54 -07:00
nimlgen	7be776f9af	add _alloc_signal/_free_signal to hcq (#5264) * add _alloc_signal/_free_signal api * oops, revert this * linter	2024-07-02 23:35:39 +03:00
Tobias Fischer	9a25ee0b9a	pixed unet call params (#5262)	2024-07-02 12:40:27 -04:00
Tobias Fischer	8c9c1cf62f	Pulled CLIP and UNet into Seperate Files (#5253) * pulled clip and unet into seperate files * reference cleanup, lru cache fix * better pool indexing	2024-07-01 22:33:01 -04:00
nimlgen	57e89645cd	hcq spec test (#5226) * start hcq spec test * more test * fixes * run on amd as well * test amdgpu exec * fix amd * amd mockgpu support sdma timestamp	2024-07-01 17:36:37 +03:00
George Hotz	3df47bc21e	OpenELM + repeat_interleave (#5234) * start writing openelm * progress...hit bug * repeat_interleave support * gqa * add rotary embedding * spp * i think it runs correctly * broken * output is good now * cleanups * no io_uring on android	2024-06-30 15:18:39 -07:00
chenyu	649641a2f2	fix tqdm with generator without `__len__` (#5238) it should be treated as total = 0 (just show iteration count). also removed duplicated ": " in fetch and fixed unit scale with total = 0	2024-06-30 12:20:59 -04:00
chenyu	fd53b6d901	tqdm supports fractional blocks (#5233) enabled progress bar match in test, it matched perfectly now	2024-06-29 22:30:18 -04:00
chenyu	ae10ae4722	simplify tqdm scale math (#5231) expand the log of log stuff	2024-06-29 21:17:40 -04:00
hikettei	ad1ca7da64	[Feature] Added BinaryOps.AND/BinaryOps.OR (#5223) * [Feature] Added BinaryOps.AND/BinaryOps.OR * Add: __rand__, __ror__	2024-06-29 17:20:25 -07:00
chenyu	b2ea610df8	fix tqdm unit_scale and support hours in time (#5227) * fix tqdm unit_scale and support hours in time previously it only supports MM:SS. more chars to unitscales, strip trailing "." and " " in formatting, and more tests * simpler	2024-06-29 14:48:51 -04:00
qazal	f374fb77af	assert bool dtype for valid [run_process_replay] (#5214) * valid is always bool * prevent NumNode to begin with * part 2 * test: disable pattern matchers, asserts should pass * test: store without cast * test: if (0) * cleanup time * only pattern match bool literal * better for upstream debug	2024-06-29 21:20:32 +03:00
qazal	3f4eeb8b54	late UOps.IF generation [run_process_replay] [no_assert] (#5027) * find all places * test gates * test * gate based on depths * add ctx * that cache was so wrong * delete useless things * dont double write if * self.if_cond * move UOps.IF to gated store * test_padto_where_multioutput * test_padto_group * minor cleanup * hmm this actually works? * need a good barrier * merge 2 * delete ctx * p1 * maybe p2 * p3 * minor fixup * fixup 2 * smart thing from the Lowerer branch * refactoring * refactoring 2 * maybe before graph_rewrite * slightly more acceptable Linearizer diff * more correct * [run_process_replay] [no_assert]	2024-06-29 12:22:14 -04:00
chenyu	42d1f92fc1	simpler tqdm (#5221) can do more, but many cases are not tested	2024-06-29 07:41:46 -04:00
George Hotz	80ac21200b	hotfix: linearizer test fixup	2024-06-28 10:52:25 -07:00
kormann	6c456b6d66	remove uopgraph dedup + slight speedup (#5199) * rm dedup * rm dedup * tests * reduce diff * oups * reduce diff * rm UOp.tuple	2024-06-28 09:26:32 -07:00


2026 Commits