tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
Ahmed Harmouche	2d11765295	Fix WebGPU atomic store (#7954 )	2024-11-29 19:31:25 +08:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
chenyu	3b26e51fce	Tensor.cummax (#7854 ) generalized the existing cumsum and take Ops.MAX in addition to Ops.ADD	2024-11-22 15:55:02 -05:00
wozeparrot	c100f3d406	default threefry (#6116 )	2024-09-25 17:45:13 +08:00
chenyu	a37e92081a	fix unrolled arange folding (#6606 ) * fix unrolled arange folding also added flop test to test_arange to make sure it's 0 flop * skip PTX	2024-09-19 09:03:01 -04:00
chenyu	15c4d4f406	fold unrolled arange div pattern (#6465 )	2024-09-10 22:35:52 -04:00
Roelof van Dijk	ad4b3b457f	bump limit for test_llama_embedding_opt (#6332 )	2024-08-31 10:03:43 -04:00
George Hotz	16f420f7a7	split full_graph_rewrite and linearize_uop [run_process_replay] (#6215 ) * split full_graph_rewrite and linearize_uop * fix tests * graph rewrite in test uops * add types	2024-08-20 20:12:33 -07:00
George Hotz	a5d79688db	fix indexing out of bounds (#6208 ) * fix indeing out of bounds * 5 ops per access is fine	2024-08-20 11:34:56 -07:00
chenyu	4451bcaf95	update test_arange test_llama_embedding_opt (#6207 ) non CI uses larger embedding, still same orders of magnitude	2024-08-20 13:58:43 -04:00
George Hotz	cf7d3c1eb8	fix tests locally on metal (#6025 ) * remove contiguous child, it was breaking tests locally * hmm, it's still needed * include NOOPT in method cache key	2024-08-10 12:36:22 -07:00
qazal	45b1761175	smaller test_llama_embedding + assert correctness (#5986 ) * smaller test_llama_embedding in CI * test correctness	2024-08-08 22:11:29 +03:00
George Hotz	6d1fdcfce2	don't reduce the same thing in a vector (#5950 ) * don't reduce the same thing over and over * cleaner way to write it that doesn't loop	2024-08-06 16:59:15 -07:00
George Hotz	3e1336957d	test arange with all opts (#5923 ) * test arange with all opts * Update test_arange.py * Update test_arange.py * Update test_arange.py * Update test_arange.py * Update test_arange.py	2024-08-05 18:38:25 -07:00
George Hotz	5d17f54e3c	fast mnist indexing (#5921 ) * fast mnist indexing * more tests * remove those tests, new indexing rule	2024-08-05 13:55:15 -07:00
George Hotz	e81c18f494	make the arange test check correctness [run_process_replay] (#5920 )	2024-08-05 13:41:06 -07:00
George Hotz	42f599870c	unroll arange is broken (#5918 ) * unroll arange is broken * fix unrolled arange * one more test	2024-08-05 12:15:07 -07:00
qazal	65fa86901a	indexing fusion 2 (#5888 ) * arange fusion * kernels that fuse * tests	2024-08-03 13:13:39 +03:00
qazal	4e070a2c89	start work on indexing fusion (#5590 ) * start base * the views add up base reduceop st: ShapeTracker(views=(View(shape=(60000, 1), strides=(1, 0), offset=0, mask=None, contiguous=True),)) top st: ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False))) merged buf.st+st: ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False))) * p1 * some cleanups * more cleanups * one kernel * more * late fuse arange * less lines * more work * fix st strides 1 * update test_schedule, start argmax * test_tiny_argmax * add FUSE_ARANGE * more cleanup * add utils * reduce merging * fix axis and fold if needed * more fusion * need to figure this out * now fixing all of these * todos+save a line * ready for p1	2024-07-25 13:23:38 +03:00
George Hotz	dc21e63bd2	test: put conv in one reduce (#4441 ) * test: put conv in one reduce * put reduce at the end * more expand * generic, and that expand was breaking things * ratio * don't undo the expand * arg 1 * strides * warning, for resnet * warning removed * disable cast * handle cast * op * err, that's right * fixup * fix that * a test to play with * add double_reduces * working up to final reshape * fold the last reshape * moved to schedule * fix axis * ci, need to bring arange back * FUSE_CONV_BW maybe * valid in 3.9 * test_expand_reduce_is_folded_on_different_axes * add FUSE_CONV_BW=1 * test_fold_batchnorm_backward * test_sgd_4convs_fuse --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-07-22 12:16:13 +03:00
George Hotz	d1a7279605	indexing fold with casted bool (#5551 ) * cast bool is where * universal transform is wrong	2024-07-18 10:02:29 -07:00
qazal	0ad1672d5f	fuse indexing (LazyOp creation) (#5506 ) * bring FUSE_AS_ONE_KERNEL back * operands need reshape? * fused but arange didnt fold * something deeply wrong * yay, fused * derive broadcasts * s/input/reduce_input * _fixup_ones proved a point * this is what it takes * down to 3 required reshapes: 1. output_shape 2. the second reduce merge dims 3. remove dims for above reshape * start real reshapes * resolve shape in the edges pre lazyop * outputs are the same shape * rewrite1: just the reduce * more correct * fuse_as_one_kernel * closer * this passes * dont rerun info * dont need these * not needed	2024-07-18 14:09:17 +03:00
qazal	e22b377839	generalize FUSE_AS_ONE_KERNEL in the scheduler (#5397 ) * test: use const * hotfix: base * asserts * dont push through reshape * cleanup * dont need the cache * test_reduceop_reshape_dont_push and test_index_fused are next	2024-07-12 10:23:16 +03:00
George Hotz	3a2b5a75d2	improve single kernel indexing (#5398 ) * improve single kernel indexing * metadata in graph (#5399) * indexing is O(1) * add failing test * ugh, that all needs to be replaced with symbolic * broken on ptx, it's fine --------- Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2024-07-11 19:00:57 -07:00
George Hotz	c2da4454cd	indexing getting better (#5389 ) * indexing getting better [run_process_replay] [no_assert] * fix test * test_arange_2_reduce is a simpler test * put that print back, NOOPT * don't merge reduces (they could be different reduces) * FUSE_AS_ONE_KERNEL * fix tests * fix test_var_multireduce * w/e put that there * fails on others too * fix test, revert UNMUL change * in case order matters * one kernel indexing works * one kernel indexing works (test other)	2024-07-11 16:41:51 -07:00
chenyu	f1bf916b8a	apply NOOPT in test_arange complexity (#4774 ) with hcopt, arange(2560) uses less ops than arange(256)	2024-05-29 23:12:35 -04:00
George-the-1st	0627e26140	Added missing unittest execution code (#4400 ) same code as on every other test file, just missing from this one for some reason.	2024-05-02 22:34:30 -04:00
chenyu	a6ed2ae3c6	use old cumsum optimization for arange (#3813 ) revert to old cumsum opt while phi simplification is disabled. added a flops complexity test for this	2024-03-18 20:01:03 -04:00

28 Commits
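Commit 3b26e51fce describes Tensor.cummax as a generalization of the existing cumsum that accepts Ops.MAX in addition to Ops.ADD. The plain-Python sketch below illustrates one way such a generalization can work, assuming a pad-then-window-reduce formulation: left-pad the input with the reduction's identity element (0 for add, -inf for max) and reduce every length-n window. The names cum_reduce, IDENTITY and REDUCE are hypothetical; this is not tinygrad's API or its actual implementation.

```python
import math
from typing import List, Sequence

# Identity element per reduction: padding with it never changes a window's result.
IDENTITY = {"add": 0.0, "max": -math.inf}
REDUCE = {"add": sum, "max": max}

def cum_reduce(xs: Sequence[float], op: str) -> List[float]:
    """Inclusive scan of xs under op ("add" -> cumsum, "max" -> cummax)."""
    n = len(xs)
    # Left-pad with n-1 identity values so every output position sees a full window.
    padded = [IDENTITY[op]] * (n - 1) + list(xs)
    # Window i is padded[i:i+n]: (n-1-i) identity values followed by xs[0..i].
    return [REDUCE[op](padded[i:i + n]) for i in range(n)]

print(cum_reduce([1, 2, 3], "add"))
print(cum_reduce([3, 1, 4, 1, 5], "max"))
```

Only the identity element and the reduce function differ between the two scans, which is what makes the generalization cheap. Evaluated naively, each window costs O(n) work, so the whole scan is O(n^2) for an n-element input.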
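Commit a6ed2ae3c6 (#3813) ties arange to a cumsum optimization. One way to see the connection, assuming arange is built from a cumulative sum of a constant step (a formulation used here purely for illustration): element i equals start + i*step, which is the sum of i+1 copies of step plus (start - step). The function name arange_via_cumsum is hypothetical and uses Python's itertools.accumulate as the cumsum.

```python
import math
from itertools import accumulate
from typing import List

def arange_via_cumsum(start: float, stop: float, step: float = 1) -> List[float]:
    """Hypothetical helper: arange(start, stop, step) expressed as a cumsum."""
    n = max(math.ceil((stop - start) / step), 0)  # number of elements, never negative
    # accumulate([step] * n) yields step, 2*step, ..., n*step;
    # shifting by (start - step) turns that into start, start+step, ..., start+(n-1)*step.
    return [v + (start - step) for v in accumulate([step] * n)]

print(arange_via_cumsum(0, 5))
print(arange_via_cumsum(5, 0, -2))
```

Written this way, arange becomes a reduction over a constant tensor, so how efficiently the compiler folds that reduction determines how many ops arange costs.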
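The WebGPU commit 10618aba98 mentions emulating ulong for threefry and simplifying the shift emulation in WGSL, before later dropping long emulation. Core WGSL has no 64-bit integer type, so a 64-bit value has to be carried as two u32 halves. The Python sketch below shows the general technique for logical shifts on such (hi, lo) pairs. The names shl64 and shr64 are hypothetical, and this is not the code that commit renders.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF

def shl64(hi: int, lo: int, s: int) -> Tuple[int, int]:
    """Logical left shift of the 64-bit value (hi << 32 | lo) by s, using 32-bit halves."""
    s &= 63  # this sketch only supports shift amounts 0..63
    if s == 0:
        # handled separately: the general branch would need lo >> 32, which is not
        # a valid shift on real 32-bit hardware
        return hi, lo
    if s >= 32:
        # the low word moves entirely into the high word
        return (lo << (s - 32)) & MASK32, 0
    # bits shifted out of the top of lo carry into the bottom of hi
    return ((hi << s) | (lo >> (32 - s))) & MASK32, (lo << s) & MASK32

def shr64(hi: int, lo: int, s: int) -> Tuple[int, int]:
    """Logical right shift of the 64-bit value (hi << 32 | lo) by s, using 32-bit halves."""
    s &= 63
    if s == 0:
        return hi, lo
    if s >= 32:
        # the high word moves entirely into the low word
        return 0, hi >> (s - 32)
    # bits shifted out of the bottom of hi carry into the top of lo
    return hi >> s, ((lo >> s) | (hi << (32 - s))) & MASK32
```

Both halves are assumed to already be in the range 0..2^32-1. The masking with MASK32 stands in for the wraparound that a real u32 would perform.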