chenyu
4ee3243c93
JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623)
is it fast?
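For context, JITBEAM sets the BEAM kernel-search level applied only to JIT-captured kernels. A hedged invocation sketch (script path and flags are assumptions pieced together from nearby commits, not verified against this revision):
```
# assumed flags: --size from the "--size 8B" hotfix below, --shard for the 4 GPUs in the title
JITBEAM=2 python3 examples/llama3.py --size 8B --shard 4 --benchmark
```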
2025-01-14 19:52:38 -05:00
George Hotz
bfbe81df71
remove cast before view (#8613)
* remove cast before view
* greener
* indexing
* that passes too
* openpilot too
* ack
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201
raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs
skip 70B
2025-01-14 14:51:48 -05:00
nimlgen
1ff6862a3d
ci: sleep a bit to let the driver unload the prev pid (#8605)
2025-01-14 15:55:23 +03:00
nimlgen
74b83c4c41
am in ci (#8532)
* try am in ci
* no sudo
* temp
* run more am test
* run half on am
* insert amdgpu
* other machine as well
2025-01-13 19:55:17 +03:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564)
2025-01-11 06:03:48 -05:00
chenyu
85a4397f27
fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]
because i didn't know how to use it...
* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447
fix benchmark allreduce and add to ci [pr] (#8521)
2025-01-07 00:37:59 -05:00
ignaciosica
0a00187dce
add real AMX tests to benchmark (#8216)
* add real amx to benchmark
* add debug=2 to check tc are triggered
2024-12-13 14:03:41 -05:00
chenyu
d462f8ace0
use HALF in cifar wino benchmarks (#8153)
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
87c360c4b5
hotfix: add --size 8B to llama3
2024-12-09 07:53:20 -08:00
chenyu
3c8c98253a
BEAM_DEBUG=1 in speed_v_theoretical (#7942)
* DEBUG=3 in speed_v_theoretical
* BEAM_DEBUG=1
2024-11-28 08:30:55 -05:00
chenyu
a6171cbe71
add stable diffusion v2 to mac benchmark (#7917)
this caught #7902
2024-11-26 22:09:43 -05:00
chenyu
ac57d82a13
test_tiny on real NV/CUDA/AMD/HIP (#7886)
simple tests that run on real CUDA and HIP
2024-11-24 16:34:54 -05:00
chenyu
5c5b1b994c
less flaky benchmarks (#7855)
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
d5c9fafff5
default run stable diffusion benchmark with fp16 (#7831)
and keep the non-fp16 one in mac
2024-11-21 15:58:17 -05:00
chenyu
c815d7b56e
run bfloat16 tensor core in metal benchmark (#7808)
* run bfloat16 tensor core in metal benchmark
* separate task
2024-11-20 15:34:07 -05:00
chenyu
e6cfaaa496
metal benchmark JIT=2 -> JIT=1 (#7661)
2024-11-12 22:55:27 -05:00
chenyu
1884f021e3
add conv3x3 to speed_v_theoretical (#7658)
* add conv3x3 to speed_v_theoretical
* show test duration
2024-11-12 16:41:56 -05:00
chenyu
a88a15c7e8
setup perflevel in red CI (#7645)
runs v4.1 bert setup.
```
rocm-smi --setprofile compute
rocm-smi --setmclk 3
rocm-smi --setperflevel high
```
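(`--setprofile compute` selects the compute power profile, `--setmclk 3` pins the memory clock to DPM level 3, and `--setperflevel high` disables automatic downclocking; together these reduce run-to-run variance in the benchmark numbers.)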
2024-11-11 18:44:55 -05:00
chenyu
773d5b60bf
beam benchmark tests (#7638)
* beam benchmark tests
* lower AMD number somehow
* less flaky
2024-11-11 18:11:18 -05:00
chenyu
bfab03288d
fix HALF=1 in test_speed_v_torch (#7642)
* fix HALF=1 in test_speed_v_torch
"operation cache defeats" adds 1 to all arg, which were centered around 0. adding 1 makes big matmul and matvec go inf.
fixed by subtract 1 after and bumpped tolerance for half input
* bigger tol for BIG=2, update CI too
* bigger tol
2024-11-11 14:29:37 -05:00
George Hotz
b4cb6b89f9
hotfix: CI mac uses python 3.11
2024-11-11 23:42:35 +08:00
George Hotz
9648372ee6
hotfix: mac uses python 3.12
2024-11-11 23:23:48 +08:00
George Hotz
6f93e91deb
hotfix: lower mnist threshold for non-determinism
2024-11-03 11:05:12 +08:00
George Hotz
4fed358511
hotfix: timeouts to 20 minutes. better no stats update than a red x
2024-10-25 16:31:52 +08:00
chenyu
d4c94d0d32
disable llama 1 4gpu and 6gpu benchmark (#7276)
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402
RUN_PROCESS_REPLAY=0 on llama 70B and resnet training (#7272)
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training
also added a 15-minute total timeout; this cannot grow indefinitely
* add a few more
* a few more just for NV
2024-10-24 12:09:54 -04:00
George Hotz
9f4ca88218
hotfix: relax target pct for beautiful_mnist
2024-10-17 12:36:07 +08:00
nimlgen
feb0bcb58b
qcom bench bind to perf cluster (#6996)
2024-10-11 12:21:52 +03:00
nimlgen
f9d454aed5
correct kernargs alignment (#6984)
2024-10-11 00:06:28 +03:00
qazal
b82023c97e
process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]
* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
George Hotz
f45d178a55
hotfix: support JIT_BATCH_SIZE=0, make that the default
2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108
add new model CI
2024-09-25 10:23:06 +08:00
George Hotz
de259e3f09
hotfix: add compile3 to comma CI
2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf
hotfix: reset benchmarks cache for process replay (#6671)
2024-09-23 15:13:02 +08:00
nimlgen
d22b46a2ac
qcom in benchmarks (#6337)
2024-09-02 19:59:11 +03:00
chenyu
7d46fb0c83
load balance NV benchmark ci (#6107)
2024-08-16 10:08:08 -04:00
nimlgen
8f787785d9
fix openpilot benchmark (#6049)
2024-08-12 21:12:32 +03:00
qazal
266afad8ed
hotfix: skip schedule capture in benchmarks (#6012)
2024-08-10 17:13:53 +03:00
chenyu
adba5efc64
enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256
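A hedged reproduction sketch (the llama example's flags are assumptions inferred from this log, not verified against this revision):
```
# assumed flags: --gen/--size/--benchmark; MAX_CONTEXT caps the context length to fit memory
MAX_CONTEXT=256 python3 examples/llama.py --gen 2 --size 70B --benchmark
```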
2024-08-04 18:48:46 -04:00
wozeparrot
acadccf344
comma benchmark (#5518)
2024-08-02 14:36:54 -07:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark (#5806)
2024-07-30 12:05:36 -07:00
qazal
3e49d86c01
process replay diffs 3 things now (#5731)
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
George Hotz
db1d093b29
reenable LLaMA-3 8B BEAM on NV (#5746)
2024-07-26 16:56:41 -07:00
wozeparrot
6ccb2390c3
feat: update_benchmark_staging (#5529)
2024-07-17 20:40:57 -07:00
wozeparrot
218e157f00
benchmark on update_benchmark_staging (#5541)
2024-07-17 17:11:52 -07:00
chenyu
b17e4adb3a
add -c advice.detachedHead=false to process replay git checkout (#5419)
remove the noisy `Note: switching to 'origin/master'. You are in 'detached HEAD' state. You can look around, make experimental changes...` in log
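With the flag, the checkout the replay tooling performs looks like this (`-c` scopes the config override to this single git invocation, so nothing is written to the repo config):
```
git -c advice.detachedHead=false checkout origin/master
```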
2024-07-12 15:13:26 -04:00
qazal
31fcc516dc
more process replay tooling (#5407)
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00