Commit Graph

148 Commits

Author SHA1 Message Date
George Hotz
f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108 add new model CI 2024-09-25 10:23:06 +08:00
George Hotz
de259e3f09 hotfix: add compile3 to comma CI 2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf hotfix: reset benchmarks cache for process replay (#6671) 2024-09-23 15:13:02 +08:00
nimlgen
d22b46a2ac qcom in benchmarks (#6337) 2024-09-02 19:59:11 +03:00
chenyu
7d46fb0c83 load balance NV benchmark ci (#6107) 2024-08-16 10:08:08 -04:00
nimlgen
8f787785d9 fix openpilot benchmark (#6049) 2024-08-12 21:12:32 +03:00
qazal
266afad8ed hotfix: skip schedule capture in benchmarks (#6012) 2024-08-10 17:13:53 +03:00
chenyu
adba5efc64 enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256
2024-08-04 18:48:46 -04:00
wozeparrot
acadccf344 comma benchmark (#5518) 2024-08-02 14:36:54 -07:00
wozeparrot
eebb1b9922 feat: temperature 0 llama3 benchmark (#5806) 2024-07-30 12:05:36 -07:00
qazal
3e49d86c01 process replay diffs 3 things now (#5731)
* github api infra

* process replay is 3 parts now

* parse benchmarks

* add gh_token

* complete diff

* move process replay tests

* last successful run

* add tempdir

* skip master
2024-07-27 12:52:20 +03:00
George Hotz
db1d093b29 reenable LLaMA-3 8B BEAM on NV (#5746) 2024-07-26 16:56:41 -07:00
wozeparrot
6ccb2390c3 feat: update_benchmark_staging (#5529) 2024-07-17 20:40:57 -07:00
wozeparrot
218e157f00 benchmark on update_benchmark_staging (#5541) 2024-07-17 17:11:52 -07:00
chenyu
b17e4adb3a add -c advice.detachedHead=false to process replay git checkout (#5419)
remove the noisy `Note: switching to 'origin/master'.

You are in 'detached HEAD' state. You can look around, make experimental
changes...` in log
2024-07-12 15:13:26 -04:00
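The `-c advice.detachedHead=false` trick in the commit above is plain git: `-c key=value` sets a configuration entry for a single invocation, so the noisy advice disappears from CI logs without touching any global config. A minimal reproduction (throwaway repo; the user identity and commit message are illustrative only):

```python
import shutil, subprocess, tempfile

def checkout_stderr(extra_flags=()):
    """Create a throwaway repo, commit once, and check out that commit by SHA,
    returning git's stderr (where the detached-HEAD advice is printed)."""
    repo = tempfile.mkdtemp()
    def git(*args):
        return subprocess.run(("git",) + args, cwd=repo,
                              capture_output=True, text=True)
    git("init", "-q")
    git("-c", "user.name=ci", "-c", "user.email=ci@example.com",
        "commit", "-q", "--allow-empty", "-m", "initial")
    sha = git("rev-parse", "HEAD").stdout.strip()
    # Checking out a bare SHA detaches HEAD; with the advice config disabled,
    # stderr only contains the short "HEAD is now at ..." line.
    return git(*extra_flags, "checkout", sha).stderr

if shutil.which("git"):
    quiet = checkout_stderr(("-c", "advice.detachedHead=false"))
    assert "detached HEAD" not in quiet
```

The same one-shot `-c` form works for any `advice.*` key, which is why it is a convenient fit for scripted checkouts like process replay.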
qazal
31fcc516dc more process replay tooling (#5407)
* replays

* what's in there

* can it be up there

* sha is enough

* insert sha as the key

* fix str

* update reset utils

* that nested try/except was terrible

* github_context can go
2024-07-12 13:11:34 +03:00
chenyu
322c37e621 use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples

replaced getenv("JIT"), effectively making gpt2 default to JIT

* fix test_gpt2
2024-07-09 15:04:43 -04:00
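The `helpers.JIT` change above replaces scattered `getenv("JIT")` calls with a single module-level flag. The pattern can be sketched roughly like this (a generic illustration, not tinygrad's actual helpers module; `HYPOTHETICAL_FLAG` is made up):

```python
import functools, os

@functools.lru_cache(maxsize=None)
def getenv(key: str, default=0):
    # Read a feature flag from the environment once, coerced to the type
    # of the default, and cache the result for later lookups.
    return type(default)(os.getenv(key, default))

# Module-level flag: consumers import JIT instead of re-reading the
# environment at each call site, which is what "use helpers.JIT" buys.
JIT = getenv("JIT", 1)
```

Centralizing the flag also means every example agrees on the default (here JIT on), rather than each call site choosing its own.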
chenyu
191463a919 add timing to SDXL (#5273) 2024-07-02 23:29:54 -04:00
chenyu
5808c37302 hotfix disable flaky llama3 beam benchmark on green (#5249) 2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf revert stable diffusion validation with threefry (#5248)
* Revert "use threefry in stable diffusion benchmark (#4988)"

This reverts commit 44dfa37c70.

* sdxl and validation fix

* relax threshold
2024-07-01 14:43:47 -04:00
chenyu
88763eb9ff fix stable_diffusion with fp16 (#5239) 2024-06-30 12:59:31 -04:00
nimlgen
6b08cb5e38 ptx runs on nv in benchmarks (#5224) 2024-06-29 11:06:44 +03:00
chenyu
7090eac8cb validate sdxl output and put it in benchmark (#5211)
* validate sdxl output and put it in benchmark

* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
d8dc43ad06 remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark (#5198)
this no longer helps
2024-06-27 15:20:34 -04:00
chenyu
83da8b3558 use NV instead of CUDA in benchmark (#5192)
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark (#5191)
ignoring the beam cache while still using the compile cache should be fine, and saves some benchmark time.

also updated `beam_search` to check flag value before accessing diskcache
2024-06-27 13:15:18 -04:00
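The `beam_search` change described above (consult the flag before touching the disk cache at all) follows a common guard pattern; a hedged sketch, with hypothetical in-memory stand-ins for tinygrad's diskcache helpers:

```python
import os

_cache: dict = {}  # stand-in for a persistent disk cache

def diskcache_get(key):
    return _cache.get(key)

def diskcache_put(key, value):
    _cache[key] = value
    return value

def beam_search(key, compute):
    # Check the flag *before* any cache access, so IGNORE_BEAM_CACHE=1
    # skips both the read and the write instead of only the read.
    if os.getenv("IGNORE_BEAM_CACHE", "0") == "1":
        return compute()
    cached = diskcache_get(key)
    if cached is not None:
        return cached
    return diskcache_put(key, compute())
```

With the flag set, every call recomputes; with it unset, repeated calls for the same key are served from the cache.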
chenyu
c12de4f47d benchmark use JITBEAM for llama and gpt2 (#5189) 2024-06-27 12:56:02 -04:00
chenyu
e9c6a36894 remove CACHELEVEL=0 in llama3 benchmark (#5025) 2024-06-17 22:43:16 -04:00
George Hotz
bee8fc29ee add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD

* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
44dfa37c70 use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10, making it easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9 more tinychat fixes (#4971) 2024-06-15 16:29:39 -07:00
qazal
ff8e9eefc3 hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]

* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06 Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title

* idk how this thing works btw

* test if it will work

* try 2

* Revert "idk how this thing works btw"

This reverts commit 580da51b07.

* Revert "try 2"

This reverts commit 7ff1e86d5d.

* test if it works

* meh

* Reapply "idk how this thing works btw"

This reverts commit dd33ad7c14.

* revert
2024-06-15 16:21:00 +03:00
George Hotz
f42183ba28 hotfix: relax cifar to 93.2 2024-06-09 13:09:21 +02:00
nimlgen
6327b50e51 amd in benchmarks (#4861)
* amd in benchmarks

* remove all hsa
2024-06-08 23:24:46 +03:00
qazal
240d6b5bc0 process replay benchmarks (#4668) 2024-06-01 14:36:21 +03:00
chenyu
38bc38cdff fix llama example quantize (#4699)
* fix llama example quantize

import quantize layers from new example llama3

add to mac benchmark

* fix that

* save the files
2024-05-23 15:35:26 -04:00
chenyu
72560e30fe add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM

* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
wozeparrot
00432496d7 feat: tinyboxgreen (#4366)
* feat: tinyboxgreen

* feat: tinyboxgreenv2

* fix symlink weights

* fix: remove llama 2 70b for now

* feat: naming

* fix: remove extra cifar steps

* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
chenyu
8a0d1ca7bb CI test timeout 20 min -> 10 min (#4645)
if a run takes more than 10 minutes, setup usually failed anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
chenyu
ca1df20fa9 benchmark name fix - resnet eval is on eval data (#4628) 2024-05-17 12:56:12 -04:00
chenyu
e5d4e6a8aa BEAM=2 in green CI for 100 TFLOPS (#4624) 2024-05-16 23:28:28 -04:00
George Hotz
fd02ab1e8b move disassemblers and openpilot (#4592)
* move disassemblers and openpilot

* delete junk

* put that in pre-commit

* fixup readme
2024-05-14 19:30:02 -07:00
chenyu
5de4a46f10 re-enable gpt2 half/beam mac benchmark (#4496)
* re-enable gpt2 half/beam mac benchmark

from the fuzzer it seems to be flaky due to a numerical issue, not a kernel bug. we used to have half in the split reduce.

run this in M1 Max for 20 loops and it's fine

* that should be jitted
2024-05-09 19:15:32 -04:00
chenyu
c508eb7425 revert the removal of CAST_BEFORE_VIEW (#4471)
this brings most of the memory gain for resnet back.
2024-05-08 00:14:29 -04:00
chenyu
d4062cb6fc NV tensor_cores in kernel.py (#4399) 2024-05-02 22:33:08 -04:00
chenyu
dce7ac0160 NOCLANG=1 for tinybox green ci. (#4378)
CLANG was disabled for tinybox red for speed
2024-05-01 13:31:01 -04:00
wozeparrot
4a26718ca9 feat: tinyboxgreen (#4365) 2024-04-30 19:05:37 -04:00