George Hotz
4fed358511
hotfix: timeouts to 20 minutes. better no stats update than a red x
2024-10-25 16:31:52 +08:00
chenyu
d4c94d0d32
disable llama 1 4gpu and 6gpu benchmark (#7276)
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402
RUN_PROCESS_REPLAY=0 on llama 70B and resnet training (#7272)
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training
also added a 15 minute total timeout; this cannot grow indefinitely
* add a few more
* a few more just for NV
2024-10-24 12:09:54 -04:00
George Hotz
9f4ca88218
hotfix: relax target pct for beautiful_mnist
2024-10-17 12:36:07 +08:00
nimlgen
feb0bcb58b
qcom bench bind to perf cluster (#6996)
2024-10-11 12:21:52 +03:00
nimlgen
f9d454aed5
correct kernargs alignment (#6984)
2024-10-11 00:06:28 +03:00
qazal
b82023c97e
process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]
* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
George Hotz
f45d178a55
hotfix: support JIT_BATCH_SIZE=0, make that the default
2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108
add new model CI
2024-09-25 10:23:06 +08:00
George Hotz
de259e3f09
hotfix: add compile3 to comma CI
2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf
hotfix: reset benchmarks cache for process replay (#6671)
2024-09-23 15:13:02 +08:00
nimlgen
d22b46a2ac
qcom in benchmarks (#6337)
2024-09-02 19:59:11 +03:00
chenyu
7d46fb0c83
load balance NV benchmark ci (#6107)
2024-08-16 10:08:08 -04:00
nimlgen
8f787785d9
fix openpilot benchmark (#6049)
2024-08-12 21:12:32 +03:00
qazal
266afad8ed
hotfix: skip schedule capture in benchmarks (#6012)
2024-08-10 17:13:53 +03:00
chenyu
adba5efc64
enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256 (see the sketch below)
2024-08-04 18:48:46 -04:00
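For context, MAX_CONTEXT here is an environment knob that caps the generation context so the 70B model fits in CI. A minimal sketch of that pattern; the 4096 default and the trim helper are assumptions, not taken from this commit:

```python
# hedged sketch: cap generation context via the MAX_CONTEXT env var, as the
# commit body describes. The default and helper below are assumptions.
from tinygrad.helpers import getenv

MAX_CONTEXT = getenv("MAX_CONTEXT", 4096)  # CI sets MAX_CONTEXT=256 so 70B fits

def trim_context(tokens: list[int]) -> list[int]:
  # keep only the most recent MAX_CONTEXT tokens before each forward pass
  return tokens[-MAX_CONTEXT:]
```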
wozeparrot
acadccf344
comma benchmark (#5518)
2024-08-02 14:36:54 -07:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark (#5806)
2024-07-30 12:05:36 -07:00
qazal
3e49d86c01
process replay diffs 3 things now (#5731)
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
George Hotz
db1d093b29
reenable LLaMA-3 8B BEAM on NV (#5746)
2024-07-26 16:56:41 -07:00
wozeparrot
6ccb2390c3
feat: update_benchmark_staging (#5529)
2024-07-17 20:40:57 -07:00
wozeparrot
218e157f00
benchmark on update_benchmark_staging (#5541)
2024-07-17 17:11:52 -07:00
chenyu
b17e4adb3a
add -c advice.detachedHead=false to process replay git checkout (#5419)
remove the noisy `Note: switching to 'origin/master'.
You are in 'detached HEAD' state. You can look around, make experimental
changes...` in log
2024-07-12 15:13:26 -04:00
qazal
31fcc516dc
more process replay tooling (#5407)
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively making gpt2 default to JIT (see the sketch below)
* fix test_gpt2
2024-07-09 15:04:43 -04:00
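A minimal before/after sketch of the change described above, assuming helpers exposes a JIT ContextVar as the title says; the defaults shown are assumptions:

```python
# hedged sketch of the described change, not the exact diff from the PR.
from tinygrad.helpers import JIT, getenv

# before: each example rolled its own env lookup, so gpt2 defaulted to no jit
use_jit = bool(getenv("JIT", 0))
# after: the shared helpers.JIT context variable decides, making gpt2
# jitted by default while JIT=0 on the command line still disables it
use_jit = bool(JIT.value)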
chenyu
191463a919
add timing to SDXL (#5273)
2024-07-02 23:29:54 -04:00
chenyu
5808c37302
hotfix: disable flaky llama3 beam benchmark on green (#5249)
2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf
revert stable diffusion validation with threefry (#5248)
* Revert "use threefry in stable diffusion benchmark (#4988 )"
This reverts commit 44dfa37c70 .
* sdxl and validation fix
* relax threshold
2024-07-01 14:43:47 -04:00
chenyu
88763eb9ff
fix stable_diffusion with fp16 (#5239)
2024-06-30 12:59:31 -04:00
nimlgen
6b08cb5e38
ptx runs on nv in benchmarks (#5224)
2024-06-29 11:06:44 +03:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark (#5211)
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
d8dc43ad06
remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark (#5198)
this no longer helps
2024-06-27 15:20:34 -04:00
chenyu
83da8b3558
use NV instead of CUDA in benchmark (#5192)
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b
CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark (#5191)
ignoring the beam cache while keeping the compile cache should be fine, and it saves some benchmark time.
also updated `beam_search` to check the flag value before accessing the diskcache (see the sketch below)
2024-06-27 13:15:18 -04:00
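A minimal sketch of the flag-gating pattern that commit describes: consult IGNORE_BEAM_CACHE before touching the diskcache at all. The wrapper below is hypothetical, not tinygrad's actual `beam_search`:

```python
# hedged sketch: gate every diskcache access behind the IGNORE_BEAM_CACHE flag.
from tinygrad.helpers import getenv, diskcache_get, diskcache_put

def beam_search_cached(key: str, run_beam_search):
  use_cache = not getenv("IGNORE_BEAM_CACHE", 0)    # check the flag first
  if use_cache and (hit := diskcache_get("beam_search", key)) is not None:
    return hit                                      # reuse a cached beam result
  result = run_beam_search()                        # the expensive search itself
  if use_cache: diskcache_put("beam_search", key, result)
  return result
```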
chenyu
c12de4f47d
benchmark use JITBEAM for llama and gpt2 (#5189)
2024-06-27 12:56:02 -04:00
chenyu
e9c6a36894
remove CACHELEVEL=0 in llama3 benchmark (#5025)
2024-06-17 22:43:16 -04:00
George Hotz
bee8fc29ee
add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD
* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
44dfa37c70
use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9
more tinychat fixes (#4971)
2024-06-15 16:29:39 -07:00
qazal
ff8e9eefc3
hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]
* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06
Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title
* idk how this thing works btw
* test if it will work
* try 2
* Revert "idk how this thing works btw"
This reverts commit 580da51b07.
* Revert "try 2"
This reverts commit 7ff1e86d5d.
* test if it works
* meh
* Reapply "idk how this thing works btw"
This reverts commit dd33ad7c14.
* revert
2024-06-15 16:21:00 +03:00
George Hotz
f42183ba28
hotfix: relax cifar to 93.2
2024-06-09 13:09:21 +02:00
nimlgen
6327b50e51
amd in benchmarks (#4861)
* amd in benchmarks
* remove all hsa
2024-06-08 23:24:46 +03:00
qazal
240d6b5bc0
process replay benchmarks (#4668)
2024-06-01 14:36:21 +03:00
chenyu
38bc38cdff
fix llama example quantize (#4699)
* fix llama example quantize
import quantize layers from new example llama3
add to mac benchmark
* fix that
* save the files
2024-05-23 15:35:26 -04:00
chenyu
72560e30fe
add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM
* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
wozeparrot
00432496d7
feat: tinyboxgreen (#4366)
* feat: tinyboxgreen
* feat: tinyboxgreenv2
* fix symlink weights
* fix: remove llama 2 70b for now
* feat: naming
* fix: remove extra cifar steps
* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
chenyu
8a0d1ca7bb
CI test timeout 20 min -> 10 min (#4645)
if it takes more than 10 minutes, setup usually fails anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz
07b350a8f4
new uops is an actual graph (#4560)
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304 )"
This reverts commit d97d5a7689 .
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
chenyu
ca1df20fa9
benchmark name fix - resnet eval is on eval data (#4628)
2024-05-17 12:56:12 -04:00