Commit Graph

1021 Commits

Each entry below lists the author, short SHA, commit message (#PR), and date.
nimlgen
6b08cb5e38 ptx runs on nv in benchmarks (#5224) 2024-06-29 11:06:44 +03:00
nimlgen
b4c49ae3fa remove cudacpu in favour of mockgpu (#5225)
* remove cudacpu in favour of mockgpu

* remove unused import

* also not used
2024-06-29 11:05:16 +03:00
chenyu
7090eac8cb validate sdxl output and put it in benchmark (#5211)
* validate sdxl output and put it in benchmark

* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
d8dc43ad06 remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark (#5198)
this no longer helps
2024-06-27 15:20:34 -04:00
chenyu
83da8b3558 use NV instead of CUDA in benchmark (#5192)
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark (#5191)
ignoring the beam cache while still using the compile cache should be fine, and saves some benchmark time.

also updated `beam_search` to check flag value before accessing diskcache
2024-06-27 13:15:18 -04:00
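
A minimal sketch of the flag-gating pattern this commit describes, assuming tinygrad's `diskcache_get`/`diskcache_put` helpers; the actual `beam_search` does more than this:

```python
import os
from tinygrad.helpers import diskcache_get, diskcache_put  # assumed helpers from tinygrad/helpers.py

IGNORE_BEAM_CACHE = int(os.getenv("IGNORE_BEAM_CACHE", "0"))

def beam_search_cached(key, compute):
  # check the flag before touching the disk cache, on both read and write
  if not IGNORE_BEAM_CACHE and (cached := diskcache_get("beam_search", key)) is not None:
    return cached
  result = compute()
  if not IGNORE_BEAM_CACHE: diskcache_put("beam_search", key, result)
  return result
```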
chenyu
c12de4f47d benchmark use JITBEAM for llama and gpt2 (#5189) 2024-06-27 12:56:02 -04:00
qazal
3af17849bf safely parse quoted titles [run_process_replay] (#5183) 2024-06-27 16:39:48 +03:00
qazal
6ca7b13ed1 limit pickled objects [run_process_replay] (#5154)
* limit pickled objects

* delete uop from the list

* debug metal

* need self.opts for TC

* don't need device

* [run_process_replay]

* minor
2024-06-26 13:51:32 +03:00
qazal
8aa786232d docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe io_uring for copies from disk (#5035)
* exp uring

* fixes and old version

* nv

* cleaner

* cmp vs aio

* fix

* no lib

* fix nv

* linter

* disk_speed_test now runs by default

* fixes

* uring -> io_uring

* linter happy

* get_temp_buf comment added

* tiny nits

* put wait back

* test runs everywhere

* remove consts

* remove mmap consts

* do not require io_uring to run the tests; they are generic
2024-06-21 11:36:51 +03:00
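
Background on the technique: io_uring queues many reads in a submission ring and reaps completions without one syscall per read. Python's standard library has no io_uring binding, so the sketch below shows only the synchronous vectored positional read (`os.preadv`) that such a path replaces; it is a stand-in with a hypothetical file name, not tinygrad's implementation:

```python
import os

CHUNK = 1 << 20
fd = os.open("weights.bin", os.O_RDONLY)  # hypothetical file
try:
  bufs = [bytearray(CHUNK) for _ in range(4)]
  # one vectored positional read into preallocated buffers; io_uring
  # generalizes this by queueing many such reads and completing them async
  n = os.preadv(fd, bufs, 0)
finally:
  os.close(fd)
```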
qazal
97f1347dd9 fix check_process_replay for special characters (#5072)
* 'test' [run_process_replay] [no_assert]

* test with ( ) { } '' " "

* remove the log [run_process_replay] '' () { } '{

* helpful echos [run_process_replay] [no_assert] () ''

* test [run_process_replay] [no_assert]

* test2 [run_process_replay] [no_assert]

* test3 [run_process_replay] [no_assert]

* it's also correct this way [run_process_replay] [no_assert]

* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
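
The class of bug being exercised above: interpolating a PR title straight into a shell command breaks once the title contains quotes or braces. One safe pattern in Python is `shlex.quote` (illustrative only, not the actual workflow fix):

```python
import shlex, subprocess

title = '''test with ( ) { } '' " " [run_process_replay]'''
# unquoted interpolation lets the shell interpret the metacharacters;
# shlex.quote turns the whole title into a single safe token
subprocess.run(f"echo {shlex.quote(title)}", shell=True, check=True)
```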
qazal
a6a5dba637 Revert "UPat for has_valid in load/store (#5052)" (#5056)
* manually insert in the Linearizer

* fix process replay
2024-06-19 20:53:36 +03:00
qazal
ee01e464e3 use process replay as a diff creator (#4903)
* add no_assert option [run_process_replay] [no_assert]

* test [run_process_replay] [no_assert]

* [run_process_replay]

* back to normal [run_process_replay]

* remove the log
2024-06-19 18:17:31 +03:00
chenyu
dc942bf1f6 jit sampling function in test_randomness.test_multinomial (#5034)
* jit sampling function in test_randomness.test_multinomial

`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec

* skip that
2024-06-18 14:21:05 -04:00
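
A minimal sketch of what jitting a sampling function can look like with tinygrad's `TinyJit`; an assumed reconstruction, not necessarily the code in the test. A counter-based RNG (THREEFRY) generates the random bits on-device, which is what makes the sampling jittable at all:

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def sample(probs: Tensor) -> Tensor:
  # realize() materializes the result so the JIT captures the kernels
  return probs.multinomial().realize()

probs = Tensor([0.1, 0.2, 0.7]).realize()
for _ in range(5):  # first calls warm up, later calls replay cached kernels
  print(sample(probs).item())
```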
chenyu
e9c6a36894 remove CACHELEVEL=0 in llama3 benchmark (#5025) 2024-06-17 22:43:16 -04:00
chenyu
acaf9a490d RECIP(-0.0) should be -inf (#5024)
* RECIP(-0.0) should be -inf

added test_dtype_alu for PYTHON backend

* catch that

* fix those two
2024-06-17 22:26:58 -04:00
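
This is plain IEEE 754: negative zero keeps its sign bit, so its reciprocal is negative infinity. A quick check in NumPy (Python's own `1.0 / -0.0` raises `ZeroDivisionError` instead of following IEEE semantics):

```python
import numpy as np

with np.errstate(divide="ignore"):  # silence the divide-by-zero warning
  assert np.reciprocal(np.float32(-0.0)) == -np.inf
  assert np.reciprocal(np.float32(0.0)) == np.inf
```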
George Hotz
bee8fc29ee add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD

* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
44dfa37c70 use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10; it's easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9 more tinychat fixes (#4971) 2024-06-15 16:29:39 -07:00
qazal
ff8e9eefc3 hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]

* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06 Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title

* idk how this thing works btw

* test if it will work

* try 2

* Revert "idk how this thing works btw"

This reverts commit 580da51b07.

* Revert "try 2"

This reverts commit 7ff1e86d5d.

* test if it works

* meh

* Reapply "idk how this thing works btw"

This reverts commit dd33ad7c14.

* revert
2024-06-15 16:21:00 +03:00
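
For context, a job inside GitHub Actions can read the PR title from the webhook payload; a hedged sketch (the real workflow may use `github.event` expressions in YAML instead):

```python
import json, os

# GITHUB_EVENT_PATH points at the webhook payload on Actions runners
with open(os.environ["GITHUB_EVENT_PATH"]) as f:
  event = json.load(f)
title = event.get("pull_request", {}).get("title", "")
if "[run_process_replay]" in title:
  print("process replay requested")
```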
wozeparrot
62dc36d371 autogen _try_dlopen (#4949) 2024-06-14 12:12:18 -07:00
chenyu
f902af4f0b increase metal ci test timeout to 20 minutes (#4920)
make it less annoying for now
2024-06-11 18:45:51 -04:00
qazal
7f3d9e6d94 revert hsa autogen removal (#4914)
* Revert "only install comgr in AMD CI (#4909)"

This reverts commit 7f03420d05.

* rocm-llvm only removal
2024-06-11 12:55:45 -04:00
qazal
7f03420d05 only install comgr in AMD CI (#4909)
* test

* delete hsa autogen
2024-06-11 06:19:33 -04:00
qazal
8b5bcf309a process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
George Hotz
f42183ba28 hotfix: relax cifar to 93.2 2024-06-09 13:09:21 +02:00
nimlgen
654a8b9ef7 retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
nimlgen
6327b50e51 amd in benchmarks (#4861)
* amd in benchmarks

* remove all hsa
2024-06-08 23:24:46 +03:00
qazal
66dfd5e7bf faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
qazal
0db9674dea skip process replay on master (#4808) 2024-06-03 12:29:28 +03:00
qazal
f64fa51a64 process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
qazal
240d6b5bc0 process replay benchmarks (#4668) 2024-06-01 14:36:21 +03:00
nimlgen
bd2e7c8b31 amd registers from file (#4778)
* amd registers from file

* remove comments

* linter

* no off
2024-05-31 18:48:57 +03:00
Szymon Ożóg
a4de81e9a6 Update ocelot version (#4715) 2024-05-24 14:32:53 -04:00
chenyu
38bc38cdff fix llama example quantize (#4699)
* fix llama example quantize

import quantize layers from new example llama3

add to mac benchmark

* fix that

* save the files
2024-05-23 15:35:26 -04:00
chenyu
72560e30fe add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM

* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
Yury Zhuravlev
af56f0e68a fix HSA/KFD load for system-wide installation (#4218)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen
12339f6564 disable cuda test in ci (#4630)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
wozeparrot
00432496d7 feat: tinyboxgreen (#4366)
* feat: tinyboxgreen

* feat: tinyboxgreenv2

* fix symlink weights

* fix: remove llama 2 70b for now

* feat: naming

* fix: remove extra cifar steps

* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
chenyu
8a0d1ca7bb CI test timeout 20 min -> 10 min (#4645)
if it takes more than 10 minutes, setup usually failed anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz
b74cc1d01a uops cleanup (#4634)
* def add cleanup

* minor speedup

* add back ptx speed

* a little faster

* merge that

* only linearize once for ptx

* two graph rewrites for ptx, bug?
2024-05-17 20:02:38 -07:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
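
To make "uops as an actual graph plus pattern rewrites" concrete, here is a toy sketch; the `UOp` class and the rule are illustrative stand-ins, not tinygrad's real classes. Nodes are immutable, children are rewritten first, and rules run on the rebuilt node:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UOp:  # toy stand-in
  op: str
  src: tuple["UOp", ...] = ()
  arg: object = None

def rewrite(u: UOp) -> UOp:
  u = UOp(u.op, tuple(rewrite(s) for s in u.src), u.arg)  # children first
  if u.op == "ADD":  # example folding rule: x + 0 -> x
    a, b = u.src
    if b.op == "CONST" and b.arg == 0: return a
    if a.op == "CONST" and a.arg == 0: return b
  return u

x, zero = UOp("DEFINE_VAR", arg="x"), UOp("CONST", arg=0)
assert rewrite(UOp("ADD", (x, zero))) == x
```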
chenyu
ca1df20fa9 benchmark name fix - resnet eval is on eval data (#4628) 2024-05-17 12:56:12 -04:00
chenyu
e5d4e6a8aa BEAM=2 in green CI for 100 TFLOPS (#4624) 2024-05-16 23:28:28 -04:00
nimlgen
eb9689336e nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugh, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
George Hotz
5ba611787d move image into tensor.py. delete features (#4603)
* move image into tensor.py

* change setup.py

* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00