tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 07:18:40 -05:00

Author	SHA1	Message	Date
qazal	345457f518	webgpu cache packages (#7911 ) * webgpu -n=auto * fix webgpu ci cache	2024-11-27 00:17:36 +08:00
qazal	6102e3159c	webgpu -n=auto (#7910 )	2024-11-26 21:13:12 +08:00
George Hotz	4e5bf9dc7a	test assignment in jit (#7906 ) * test assignment in jit * don't waste lines * skip broken test in webgpu	2024-11-26 17:37:00 +08:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
chenyu	46aa23539f	generate and print mypy lineprecision report (#7809 )	2024-11-20 16:53:17 -05:00
chenyu	d5f76462c8	fix CI beautiful_mnist dir (#7790 ) fixed `fatal: not a git repository (or any of the parent directories): .git` because $HOME is not $GITHUB_WORKSPACE	2024-11-19 09:59:02 -05:00
George Hotz	fbb4099b3c	add test for compile3 [pr] (#7783 ) Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-19 19:26:51 +08:00
chenyu	9fb396f660	test_ops maxpool2d -> max_pool2d (#7696 ) and avgpool2d -> avg_pool2d for better grepping the tests	2024-11-14 10:39:12 -05:00
George Hotz	d40673505f	new cloud is cloudy [pr] (#7631 ) * new cloud is cloudy [pr] * waste lines to add security * safety, with speed and less lines * timing and del * lines * cleanups * restore CloudSession * bump to 3.10 * quotes * renderer security	2024-11-11 20:18:04 +08:00
chenyu	e7b18cf5c0	fix load_worlds filter_novariable (#7564 ) filter based on "DEFINE_VAR" instead of "Variable". also added a unit test to make sure dataset includes image and variable kernels	2024-11-05 16:06:39 -05:00
chenyu	207bca6cea	set PAGE_SIZE=1 and generate new dataset (#7559 ) 13080 rows in total. both generating and loading this are pretty broken now. filters are wrong for example	2024-11-05 11:25:01 -05:00
George Hotz	72a9ac27e9	support image dtype in cloud [pr] (#7482 ) * support image dtype in cloud [pr] * remove outdated osx hack * unused imports	2024-11-02 23:54:27 +08:00
George Hotz	133fe81cc5	Revert "Revert "move up migrate + new gated fold (#7403 )" (#7406 )" (#7407 ) * Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)" This reverts commit `ea5654a9bc`. * test padded in emulation too * bring back early folding	2024-10-30 23:25:45 +08:00
George Hotz	d9d4dd6756	faster ci [pr] (#7348 )	2024-10-29 14:01:44 +08:00
George Hotz	a5e0f59e41	move autogen to different CI runner [pr] (#7346 ) * move autogen to different CI runner [pr] * balance a bit * readme back there * compile enet in autogen	2024-10-29 13:35:22 +08:00
George Hotz	f55c3dcff8	hotfix: bump ocelot	2024-10-29 12:46:24 +08:00
qazal	4cf7cca91a	delete fuzz_schedule [pr] (#7144 )	2024-10-18 15:09:39 +03:00
chenyu	d12c87dc8e	use ubuntu-22.04 in CI (#7068 ) ubuntu-latest points to 24.04 now, maybe it's this?	2024-10-15 09:44:59 -04:00
chenyu	fbaab30fe3	add timing to fuzz_linearizer (#7056 ) and applied smaller FUZZ_MAX_SIZE. this is getting quite slow in CI	2024-10-14 11:57:41 -04:00
George Hotz	f50d0e0ee0	cloud device [pr] (#6964 ) * first try at cloud device [pr] * real separation * we're free * clang works * unhappy with timeout * better timeouts and free * unrelated * use http verbs + add test * lines + better test * fix DELETE * shorter cloud * split key * fix sending renderer * PTXRenderer serialization * add sessions * http.client * minor timeout bump * fix keep-alive * inc server timeout * real fix timeout * that one too	2024-10-11 12:24:06 +08:00
qazal	3724a66716	move test_viz to test/, prereq for tinygrad/viz [pr] (#6972 )	2024-10-10 11:40:46 +03:00
George Hotz	0d6216aba1	bump the download cache (#6896 )	2024-10-05 10:23:18 +08:00
George Hotz	0f28e93224	add pickle support for pattern matchers [run_process_replay] (#6816 ) * add pickle support for pattern matchers [run_process_replay] * cleaner and all * no closures * fix tests * revert that * final * cleaner * python 3.8 fix * add round trip back * this * waste lines on this. that's the final line count * max print better * more targetted fix * regrettably add 3.8 support	2024-09-30 21:54:46 +08:00
wozeparrot	2b899164c6	no numpy (#6751 )	2024-09-26 16:40:18 +08:00
wozeparrot	c100f3d406	default threefry (#6116 )	2024-09-25 17:45:13 +08:00
George Hotz	dd575da7ee	real minimum cstyle change (#6709 ) * real minimum cstyle change * make it match * bring back DEFINE_GLOBAL store marking writable * bump line count to 9800 * closer * precompute don't render * cast/bitcast too * smem_align * vectorize * more pr match * remove that test * less PR diff	2024-09-25 12:40:46 +08:00
George Hotz	b0ffe2452b	bump line count to 9800	2024-09-25 09:15:30 +08:00
chenyu	26ebb7cab4	don't use div_folding in lt_folding (#6666 ) * don't use div_folding in lt_folding valids 35 -> 13 * fails the same as before	2024-09-23 01:50:18 -04:00
chenyu	da5b741656	removed valid in openpilot conv (#6619 ) 35 valids left	2024-09-23 00:30:18 -04:00
chenyu	1923932339	canonicalize simplex lt (#6658 ) (X := a0x0 + a1x1 + ...) > 0 is equivalent to x0 + x1 + ... > 0 if xi >= 0 and ai > 0 for ints	2024-09-22 23:04:47 -04:00
chenyu	5707503048	x//a<b -> x <a*b for positive a (#6622 ) openpilot valids 47 -> 37	2024-09-20 04:38:47 -04:00
chenyu	b14c1bc417	UOps.RANGE is_increasing (#6615 ) * UOps.RANGE is_increasing 283 -> 47 valids * test	2024-09-20 03:14:52 -04:00
chenyu	036c2f5b26	validhack use the new style ge for upper bound valid (#6612 ) also relaxed the bound check to check vmin/vmax instead just const. valids 482 -> 283	2024-09-19 23:45:42 -04:00
George Hotz	a1a882b006	arange folding with new ge (#6604 ) * arange folding with new ge * bump allowed gated * bump allowed speed	2024-09-19 18:01:28 +08:00
chenyu	d148a62f8d	more generic simplify_valid_image_load (#6603 ) use graph_rewrite to simplify the expression with narrowed variables, and check boundry conditions on monotonically increasing function to drop valid.	2024-09-19 05:33:37 -04:00
chenyu	162ead02a9	remove LOAD where valid is an empty set (#6579 ) 356 -> 354 valids	2024-09-18 03:49:41 -04:00
chenyu	a72d51e277	brute force VALIDHACK matching (#6575 ) * brute force VALIDHACK matching * cleanup * 9700	2024-09-18 01:59:50 -04:00
qazal	d8e5d5c663	move VIZ=1 tests to fuzzers (#6574 )	2024-09-18 12:12:03 +08:00
George Hotz	67a03e72bb	remove expr_idxs [run_process_replay] (#6567 ) * remove expr_idxs [run_process_replay] * goodbye that test	2024-09-17 18:34:51 +08:00
chenyu	5fb877c78c	generic valid match criteria of #6552 (#6558 ) 455 -> 364 valids. generalize `idx < image bound` to `idx < image bound + c` for some `c`	2024-09-17 02:40:36 -04:00
George Hotz	0ab06d5840	push geps through wmma (#6559 ) * push geps through wmma * update tests	2024-09-17 14:38:40 +08:00
chenyu	7c942418a1	other side of simple out of bound valid case (#6552 ) 462 -> 455	2024-09-16 23:57:15 -04:00
chenyu	aeaf7894a7	more generic version of #6548 (#6549 ) x(-1)<0 can be generalized to x(-1)<c, 473 -> 462 valids	2024-09-16 23:17:16 -04:00
chenyu	596f41eb46	simple drop image valid case (#6548 ) * simple drop image valid case started unit test, 530 -> 473 valids * cleanup	2024-09-16 22:54:07 -04:00
chenyu	798be6bb74	add gated read_image count in openpilot compile2 (#6546 ) 530 to go	2024-09-16 21:17:00 -04:00
George Hotz	cd90092f14	graph rewrite tests (#6519 ) * more graph rewrite tests * more complex test cases * more tests * more tests * cleanups * 9600 lines * cleanups	2024-09-15 17:29:16 +08:00
qazal	c5bae55ec8	new generate_dataset.sh (#6423 ) * new generate_dataset.sh * keep those there * test: rm expected failures * rename to extract	2024-09-09 15:13:07 +08:00
George Hotz	4b128da525	hotfix: line count to 9500	2024-09-06 09:10:03 +08:00
ignaciosica	c15506fc35	[WIP] amx support as TC (#5693 ) * almost working with relu, even hackable... but acc size is wrong, fix needed * upcast based on threads, change thread size to 4x4 * revert wrongfully commented assert * fix tc load indexing * modify for size 8 * fix bug for size 8 * Revert "fix bug for size 8" This reverts commit `cdb3f5df85`. * Revert "modify for size 8" This reverts commit `3ef0904bd9`. * good kernel with changes in lowerer * revert "good kernel with changes in lowerer" This reverts commit `975e2b5a4e`. * good kernel for relu! * refactor lowerer changes * add amx context var to helper * clean up amx flag * improve lowerer changes readability * improve check for amx * revert lowerer if * add float4 type rendering for clang * add amx definitions * enable indexing for clang if amx * working amx example, wrong because of dims * almost works for float 16, need to spot using double load in amx * cleaner render_kernel * revert chages in simple_matmul and delete env * add new var upcast_offset to get_optimized_ast * change axis for axes * invert if in rendering phi * fix some bugs * fix linearizer tests * fix vec/get pat for amx * remove clang tc if amx is disabled * add ops_python support * refactor into one complementary function in ops_python * add job for EMUALTE_AMX * improve checking for AMX in UPCAST and TC extra ops * fix lint issue * commit before refactor into autocontained AMX * start refactor by removing special rendering for AMX * all ready for amx handcoded kernel * working poc, most straightforward amx support * avoid local opts for tc if amx * fix merge bugs * skip test for clang * skip tc hand-coded opts if amx * remove hardcoded ops_python values * remove hardcoded sizes for amx kernel * fix ops_python bug where dim was hard-coded * change contract for vectorize * working without changes in lowerer * revert changes in gep rendering * fix ops_python * modify comment * skip test if clang for different type accumulation * move rename and bug for seperate pr * fix wrong path for test * addmm not implemented in torch for cpu * change struct for vector; equally slow but cleaner * revert modified test * simply wmma rendering * minor change * noqa:501 * add length 16 for AMX * fix vectorized half issue * fix error * remove comment * change set for dedup * split test of tensor_core_extra_ops so that cases that dont require locals run for AMX * add amx reference * load acc into amx registers * fix dtype rendering and remove noqa * moved tests change into another pr * add real AMX job for CI and fix bug * fix ops_python bug * fix test class * remove real AMX tests and fix uops_stats test * remove wrong test * acc folding * hotfix: bug * fix float4 tests for amx * hack for fixing flops counting * hotfix: mypy * add flop counts test for amx * improve test_float4_multidim_amx * improve test_float4_multidim_amx * improve test_float4_multidim_unaligned_load_amx * nits tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-06 09:01:10 +08:00
nimlgen	8e2a3fc165	raise lines count to 9300 for qcom (#6336 )	2024-09-02 18:57:57 +03:00
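
Two of the commit messages above state integer identities: 5707503048 ("x//a<b -> x <a*b for positive a") and 1923932339 ("canonicalize simplex lt"). The sketch below brute-forces both over small ranges as a sanity check. It is not tinygrad code: the function names are hypothetical, and it assumes Python's floor-division semantics for `//`, which may not match the division semantics used inside the project for negative operands.

```python
from itertools import product

def div_lt_rewrite_holds(x: int, a: int, b: int) -> bool:
    """True if (x // a < b) and (x < a * b) agree for these values.

    Uses Python floor division; the commit message only claims the
    rewrite for positive a.
    """
    return (x // a < b) == (x < a * b)

def simplex_lt_rewrite_holds(coeffs, xs) -> bool:
    """True if (sum(a_i * x_i) > 0) and (sum(x_i) > 0) agree.

    The commit message claims this for integers with x_i >= 0 and a_i > 0.
    """
    weighted = sum(c * x for c, x in zip(coeffs, xs))
    return (weighted > 0) == (sum(xs) > 0)

def check_all(limit: int = 6) -> bool:
    """Exhaustively check both identities over small integer ranges."""
    div_ok = all(
        div_lt_rewrite_holds(x, a, b)
        for x in range(-limit, limit + 1)
        for a in range(1, limit + 1)          # positive a only
        for b in range(-limit, limit + 1)
    )
    simplex_ok = all(
        simplex_lt_rewrite_holds(coeffs, xs)
        for coeffs in product(range(1, 4), repeat=3)  # a_i > 0
        for xs in product(range(0, 4), repeat=3)      # x_i >= 0
    )
    return div_ok and simplex_ok

if __name__ == "__main__":
    print(check_all())
```

The positivity condition on `a` matters: with `x=1, a=-1, b=0` the two sides of the first identity disagree, so `div_lt_rewrite_holds(1, -1, 0)` is false.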


441 Commits