Commit Graph

1003 Commits

Author SHA1 Message Date
chenyu
ac57d82a13 test_tiny on real NV/CUDA/AMD/HIP (#7886)
simple tests that run on real CUDA and HIP
2024-11-24 16:34:54 -05:00
chenyu
5c5b1b994c less flaky benchmarks (#7855)
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
d5c9fafff5 default run stable diffusion benchmark with fp16 (#7831)
and keep the non-fp16 one on mac
2024-11-21 15:58:17 -05:00
chenyu
46aa23539f generate and print mypy lineprecision report (#7809) 2024-11-20 16:53:17 -05:00
chenyu
c815d7b56e run bfloat16 tensor core in metal benchmark (#7808)
* run bfloat16 tensor core in metal benchmark

* separate task
2024-11-20 15:34:07 -05:00
chenyu
d5f76462c8 fix CI beautiful_mnist dir (#7790)
fixed `fatal: not a git repository (or any of the parent directories): .git` because $HOME is not $GITHUB_WORKSPACE
2024-11-19 09:59:02 -05:00
George Hotz
fbb4099b3c add test for compile3 [pr] (#7783)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
chenyu
9fb396f660 test_ops maxpool2d -> max_pool2d (#7696)
and avgpool2d -> avg_pool2d, to make the tests easier to grep
2024-11-14 10:39:12 -05:00
chenyu
e6cfaaa496 metal benchmark JIT=2 -> JIT=1 (#7661) 2024-11-12 22:55:27 -05:00
chenyu
1884f021e3 add conv3x3 to speed_v_theoretical (#7658)
* add conv3x3 to speed_v_theoretical

* show test duration
2024-11-12 16:41:56 -05:00
chenyu
a88a15c7e8 setup perflevel in red CI (#7645)
runs v4.1 bert setup.
```
rocm-smi --setprofile compute
rocm-smi --setmclk 3
rocm-smi --setperflevel high
```
2024-11-11 18:44:55 -05:00
chenyu
773d5b60bf beam benchmark tests (#7638)
* beam benchmark tests

* lower AMD number somehow

* less flaky
2024-11-11 18:11:18 -05:00
chenyu
bfab03288d fix HALF=1 in test_speed_v_torch (#7642)
* fix HALF=1 in test_speed_v_torch

"operation cache defeats" adds 1 to all arg, which were centered around 0. adding 1 makes big matmul and matvec go inf.

fixed by subtract 1 after and bumpped tolerance for half input

* bigger tol for BIG=2, update CI too

* bigger tol
2024-11-11 14:29:37 -05:00
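The overflow described in that fix is easy to reproduce outside tinygrad. A minimal numpy sketch (not part of the commit) showing why shifting half-precision inputs away from zero breaks large reductions: fp16's largest finite value is 65504, so partial sums over many values near 1.0 can blow up, while zero-centered inputs keep them small.

```python
import numpy as np

# fp16 (half precision) has a maximum finite value of 65504
assert np.finfo(np.float16).max == 65504

# any single add past that range overflows to inf
big = np.float16(60000) + np.float16(60000)
assert np.isinf(big)

# a large reduction over values centered around 1 (e.g. after the
# "add 1 to all args" cache-defeat) heads toward 2^20 and can overflow
# when accumulated in fp16:
s = np.ones(1 << 20, dtype=np.float16).sum()
print(s)  # far outside fp16 range

# zero-centered values keep the running sum small, which is why
# subtracting the 1 back out before the op avoids the inf
```

This is why the fix recenters the args after defeating the cache instead of leaving the +1 shift in place.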
George Hotz
b4cb6b89f9 hotfix: CI mac uses python 3.11 2024-11-11 23:42:35 +08:00
George Hotz
9648372ee6 hotfix: mac uses python 3.12 2024-11-11 23:23:48 +08:00
George Hotz
d40673505f new cloud is cloudy [pr] (#7631)
* new cloud is cloudy [pr]

* waste lines to add security

* safety, with speed and less lines

* timing and del

* lines

* cleanups

* restore CloudSession

* bump to 3.10

* quotes

* renderer security
2024-11-11 20:18:04 +08:00
chenyu
e7b18cf5c0 fix load_worlds filter_novariable (#7564)
filter based on "DEFINE_VAR" instead of "Variable". also added a unit test to make sure dataset includes image and variable kernels
2024-11-05 16:06:39 -05:00
chenyu
207bca6cea set PAGE_SIZE=1 and generate new dataset (#7559)
13080 rows in total. both generating and loading this are pretty broken now; the filters are wrong, for example
2024-11-05 11:25:01 -05:00
George Hotz
6f93e91deb hotfix: lower mnist threshold for non determinism 2024-11-03 11:05:12 +08:00
George Hotz
72a9ac27e9 support image dtype in cloud [pr] (#7482)
* support image dtype in cloud [pr]

* remove outdated osx hack

* unused imports
2024-11-02 23:54:27 +08:00
George Hotz
133fe81cc5 Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)" (#7407)
* Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)"

This reverts commit ea5654a9bc.

* test padded in emulation too

* bring back early folding
2024-10-30 23:25:45 +08:00
George Hotz
d9d4dd6756 faster ci [pr] (#7348) 2024-10-29 14:01:44 +08:00
George Hotz
a5e0f59e41 move autogen to different CI runner [pr] (#7346)
* move autogen to different CI runner [pr]

* balance a bit

* readme back there

* compile enet in autogen
2024-10-29 13:35:22 +08:00
George Hotz
f55c3dcff8 hotfix: bump ocelot 2024-10-29 12:46:24 +08:00
George Hotz
4fed358511 hotfix: timeouts to 20 minutes. better no stats update than a red x 2024-10-25 16:31:52 +08:00
chenyu
d4c94d0d32 disable llama 1 4gpu and 6gpu benchmark (#7276)
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402 RUN_PROCESS_REPLAY=0 on llama 70B and resnet training (#7272)
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training

also added a 15-minute total timeout; this cannot grow indefinitely

* add a few more

* a few more just for NV
2024-10-24 12:09:54 -04:00
qazal
4cf7cca91a delete fuzz_schedule [pr] (#7144) 2024-10-18 15:09:39 +03:00
George Hotz
9f4ca88218 hotfix: relax target pct for beautiful_mnist 2024-10-17 12:36:07 +08:00
chenyu
d12c87dc8e use ubuntu-22.04 in CI (#7068)
ubuntu-latest points to 24.04 now, maybe it's this?
2024-10-15 09:44:59 -04:00
chenyu
fbaab30fe3 add timing to fuzz_linearizer (#7056)
and applied smaller FUZZ_MAX_SIZE. this is getting quite slow in CI
2024-10-14 11:57:41 -04:00
nimlgen
feb0bcb58b qcom bench bind to perf cluster (#6996) 2024-10-11 12:21:52 +03:00
George Hotz
f50d0e0ee0 cloud device [pr] (#6964)
* first try at cloud device [pr]

* real separation

* we're free

* clang works

* unhappy with timeout

* better timeouts and free

* unrelated

* use http verbs + add test

* lines + better test

* fix DELETE

* shorter cloud

* split key

* fix sending renderer

* PTXRenderer serialization

* add sessions

* http.client

* minor timeout bump

* fix keep-alive

* inc server timeout

* real fix timeout

* that one too
2024-10-11 12:24:06 +08:00
nimlgen
f9d454aed5 correct kernargs alignment (#6984) 2024-10-11 00:06:28 +03:00
qazal
3724a66716 move test_viz to test/, prereq for tinygrad/viz [pr] (#6972) 2024-10-10 11:40:46 +03:00
qazal
b82023c97e process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]

* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
George Hotz
0d6216aba1 bump the download cache (#6896) 2024-10-05 10:23:18 +08:00
George Hotz
0f28e93224 add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]

* cleaner and all

* no closures

* fix tests

* revert that

* final

* cleaner

* python 3.8 fix

* add round trip back

* this

* waste lines on this. that's the final line count

* max print better

* more targetted fix

* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
wozeparrot
2b899164c6 no numpy (#6751) 2024-09-26 16:40:18 +08:00
wozeparrot
c100f3d406 default threefry (#6116) 2024-09-25 17:45:13 +08:00
George Hotz
dd575da7ee real minimum cstyle change (#6709)
* real minimum cstyle change

* make it match

* bring back DEFINE_GLOBAL store marking writable

* bump line count to 9800

* closer

* precompute don't render

* cast/bitcast too

* smem_align

* vectorize

* more pr match

* remove that test

* less PR diff
2024-09-25 12:40:46 +08:00
George Hotz
f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108 add new model CI 2024-09-25 10:23:06 +08:00
George Hotz
b0ffe2452b bump line count to 9800 2024-09-25 09:15:30 +08:00
George Hotz
de259e3f09 hotfix: add compile3 to comma CI 2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf hotfix: reset benchmarks cache for process replay (#6671) 2024-09-23 15:13:02 +08:00
chenyu
26ebb7cab4 don't use div_folding in lt_folding (#6666)
* don't use div_folding in lt_folding

valids 35 -> 13

* fails the same as before
2024-09-23 01:50:18 -04:00
chenyu
da5b741656 removed valid in openpilot conv (#6619)
35 valids left
2024-09-23 00:30:18 -04:00
chenyu
1923932339 canonicalize simplex lt (#6658)
(X := a0*x0 + a1*x1 + ...) > 0 is equivalent to x0 + x1 + ... > 0 if xi >= 0 and ai > 0 for ints
2024-09-22 23:04:47 -04:00
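The equivalence stated in that message can be sanity-checked by brute force. A small standalone sketch (not tinygrad code) enumerating non-negative integer xi and positive coefficients ai: since every term ai*xi is non-negative, the weighted sum is positive exactly when some xi is positive, which is exactly when the plain sum is positive.

```python
import itertools

# check: for ints xi >= 0 and coefficients ai > 0,
#   a0*x0 + a1*x1 + a2*x2 > 0  <=>  x0 + x1 + x2 > 0
for a in itertools.product(range(1, 4), repeat=3):      # ai > 0
    for x in itertools.product(range(0, 4), repeat=3):  # xi >= 0
        weighted = sum(ai * xi for ai, xi in zip(a, x))
        assert (weighted > 0) == (sum(x) > 0)
print("equivalence holds on all sampled cases")
```

The canonicalization drops the coefficients, so structurally different simplex expressions simplify to the same comparison.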
chenyu
5707503048 x//a<b -> x <a*b for positive a (#6622)
openpilot valids 47 -> 37
2024-09-20 04:38:47 -04:00
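The rewrite rule above can likewise be checked exhaustively over a small range. A standalone sketch using Python's floor division to mirror integer //: for integer b and a > 0, floor(x/a) < b holds exactly when x/a < b, i.e. x < a*b, including for negative x.

```python
# check: for a > 0,  x // a < b  <=>  x < a * b  (floor division)
for a in range(1, 6):
    for b in range(-5, 6):
        for x in range(-30, 31):
            assert (x // a < b) == (x < a * b)
print("rewrite verified on all sampled cases")
```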