tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-21 12:58:00 -05:00

Author	SHA1	Message	Date
chenyu	0061dc7447	fix benchmark allreduce and add to ci [pr] (#8521 )	2025-01-07 00:37:59 -05:00
nimlgen	9bc317d5d2	mockcuda (#8503 ) * init mockcuda * run gpu ocelot * fix * sfixes * disable broken tests * linter * these fails as well * pylint * myypy * this fails on real platforms as well * mypy please	2025-01-05 01:23:57 +03:00
uuuvn	5ffc50d58c	Clang JIT (#8481 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-03 11:12:55 -05:00
George Hotz	803a47494e	Revert "Clang JIT (#8312 )" (#8452 ) This reverts commit `b6266c8e41`.	2024-12-30 17:49:35 -05:00
uuuvn	b6266c8e41	Clang JIT (#8312 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-30 17:37:53 -05:00
George Hotz	0addbad36d	Happy New Year! Let's get AM merged	2024-12-30 13:15:10 -05:00
qazal	9defbc7d54	add symbolic_simple to the scheduler [pr] (#8419 )	2024-12-26 20:05:08 +08:00
George Hotz	9f62c80f68	hotfix: this is a loan	2024-12-20 14:47:04 -08:00
qazal	d78e75f710	hotfix: use ubuntu-22.04 ci from 8249 (#8251 )	2024-12-15 02:23:00 +02:00
George Hotz	8a50868264	touchup function.py [pr] (#8220 ) * touchup function.py [pr] * remove ALLOWED_READ_IMAGE * eh, keep it, just change it	2024-12-13 13:07:00 -08:00
ignaciosica	0a00187dce	add real AMX tests to benchmark (#8216 ) * add real amx to benchmark * add debug=2 to check tc are triggered	2024-12-13 14:03:41 -05:00
George Hotz	d9a0880d33	delete fuzz uops (not tested) [pr] (#8181 )	2024-12-12 01:41:27 -08:00
chenyu	26e049ab40	add ALLOWED_READ_IMAGE=2131 to openpilot (#8166 ) added as exact number check now as it's not clear if more/less than allowed is any better	2024-12-11 12:14:17 -08:00
Ahmed Harmouche	a73e3677d0	Test linearizer on webgpu (#8159 ) * Test linearizer on wgpu * Skip tests due to exceeded dims	2024-12-11 17:03:26 +01:00
chenyu	d462f8ace0	use HALF in cifar wino benchmarks (#8153 ) more representative as it hits tensor cores on tinyboxes	2024-12-10 20:21:00 -05:00
Ahmed Harmouche	a8cfdc70ed	Run more webgpu tests (#8142 )	2024-12-10 23:20:04 +01:00
Ahmed Harmouche	ed7318a3f5	Fix puppeteer install (#8148 ) Clean npm cache before puppeteer install	2024-12-10 23:06:33 +01:00
Ahmed Harmouche	71dd222f66	Fix setitem on wgpu (#8144 )	2024-12-10 19:34:25 +01:00
George Hotz	f83d715f41	move checks into compile3, delete compile2 [pr] (#8127 ) * move checks into compile3 [pr] * test_vs_onnx * test v torch works * float16 won't compile on compile3 * actually delete compile2	2024-12-09 14:21:42 -08:00
George Hotz	87c360c4b5	hotfix: add --size 8B to llama3	2024-12-09 07:53:20 -08:00
chenyu	e9692de42b	don't FUZZ_ALL_ACTIONS in fuzz_linearizer.py (#8096 ) mostly for speed, this is just making sure the script runs	2024-12-06 17:22:17 -05:00
Ahmed Harmouche	ce72fe1411	u32 to f16 in tinygrad (#8074 ) * f16 decompression in tinygrad * Typing and cleanup	2024-12-06 12:00:13 +01:00
Ahmed Harmouche	ff9a89f714	Proper dtypes for input/output of exported WebGPU model (#8053 ) * Respect input/output dtypes in exported WebGPU model * Add some comments about skipped dtypes	2024-12-05 10:38:05 +01:00
George Hotz	83aecbdc70	do gpuocelot copy manually [pr] (#8050 )	2024-12-05 11:51:20 +08:00
George Hotz	4a208bfb28	bump download cache version	2024-12-05 11:42:34 +08:00
Ahmed Harmouche	13eedd373b	Run WebGPU tests on ubuntu (#8033 )	2024-12-04 12:42:04 +01:00
Ahmed Harmouche	db330a3110	Remove WebGL (#8012 )	2024-12-03 16:02:53 +01:00
George Hotz	dddfb494d7	don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000 ) * don't mutate the uop/lazybuffer, just the Buffer [pr] * fix red test * try different fix * that * that's the right fix * test for fixed behavior * bump to 3.12	2024-12-03 19:03:51 +08:00
chenyu	17d5719a38	add process replay to webgpu tests (#7998 )	2024-12-02 20:27:29 -05:00
chenyu	3c8c98253a	BEAM_DEBUG=1 in speed_v_theoretical (#7942 ) * DEBUG=3 in speed_v_theoretical * BEAM_DEBUG=1	2024-11-28 08:30:55 -05:00
chenyu	a6171cbe71	add stable diffusion v2 to mac benchmark (#7917 ) this caught #7902	2024-11-26 22:09:43 -05:00
qazal	345457f518	webgpu cache packages (#7911 ) * webgpu -n=auto * fix webgpu ci cache	2024-11-27 00:17:36 +08:00
qazal	6102e3159c	webgpu -n=auto (#7910 )	2024-11-26 21:13:12 +08:00
George Hotz	4e5bf9dc7a	test assignment in jit (#7906 ) * test assignment in jit * don't waste lines * skip broken test in webgpu	2024-11-26 17:37:00 +08:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
chenyu	ac57d82a13	test_tiny on real NV/CUDA/AMD/HIP (#7886 ) simple tests that run on real CUDA and HIP	2024-11-24 16:34:54 -05:00
chenyu	5c5b1b994c	less flaky benchmarks (#7855 ) JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830	2024-11-22 16:39:39 -05:00
chenyu	d5c9fafff5	default run stable diffusion benchmark with fp16 (#7831 ) and keep the non-fp16 one in mac	2024-11-21 15:58:17 -05:00
chenyu	46aa23539f	generate and print mypy lineprecision report (#7809 )	2024-11-20 16:53:17 -05:00
chenyu	c815d7b56e	run bfloat16 tensor core in metal benchmark (#7808 ) * run bfloat16 tensor core in metal benchmark * separate task	2024-11-20 15:34:07 -05:00
chenyu	d5f76462c8	fix CI beautiful_mnist dir (#7790 ) fixed `fatal: not a git repository (or any of the parent directories): .git` because $HOME is not $GITHUB_WORKSPACE	2024-11-19 09:59:02 -05:00
George Hotz	fbb4099b3c	add test for compile3 [pr] (#7783 ) Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-19 19:26:51 +08:00
chenyu	9fb396f660	test_ops maxpool2d -> max_pool2d (#7696 ) and avgpool2d -> avg_pool2d for better grepping the tests	2024-11-14 10:39:12 -05:00
chenyu	e6cfaaa496	metal benchmark JIT=2 -> JIT=1 (#7661 )	2024-11-12 22:55:27 -05:00
chenyu	1884f021e3	add conv3x3 to speed_v_theoretical (#7658 ) * add conv3x3 to speed_v_theoretical * show test duration	2024-11-12 16:41:56 -05:00
chenyu	a88a15c7e8	setup perflevel in red CI (#7645 ) runs v4.1 bert setup. `rocm-smi --setprofile compute`, `rocm-smi --setmclk 3`, `rocm-smi --setperflevel high`	2024-11-11 18:44:55 -05:00
chenyu	773d5b60bf	beam benchmark tests (#7638 ) * beam benchmark tests * lower AMD number somehow * less flaky	2024-11-11 18:11:18 -05:00
chenyu	bfab03288d	fix HALF=1 in test_speed_v_torch (#7642 ) * fix HALF=1 in test_speed_v_torch "operation cache defeats" adds 1 to all arg, which were centered around 0. adding 1 makes big matmul and matvec go inf. fixed by subtract 1 after and bumpped tolerance for half input * bigger tol for BIG=2, update CI too * bigger tol	2024-11-11 14:29:37 -05:00
George Hotz	b4cb6b89f9	hotfix: CI mac uses python 3.11	2024-11-11 23:42:35 +08:00
George Hotz	9648372ee6	hotfix: mac uses python 3.12	2024-11-11 23:23:48 +08:00


1088 Commits