tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-17 02:48:03 -05:00

Author	SHA1	Message	Date
chenyu	285534ce64	delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744 ) does nothing now	2025-10-16 14:11:33 -04:00
chenyu	53478c741d	relax ASSERT_MIN_STEP_TIME for space lab policy (#12742 )	2025-10-16 11:40:36 -04:00
chenyu	b8cf35fb77	print macOS version in CI (#12705 )	2025-10-15 15:05:33 -04:00
chenyu	89df6f611d	reenable sdxl mac benchmark (#12680 ) also updated faster sd step times	2025-10-14 17:36:17 -04:00
Sieds Lykles	e625c27598	update min step times openpilot (#12600 )	2025-10-10 11:24:27 +02:00
chenyu	be05028419	move ASSERT_MIN_STEP_TIME to compile3 (#12535 ) threshold is current time +20%	2025-10-08 22:16:59 -04:00
chenyu	5986d656a2	tighter ASSERT_MIN_STEP_TIME (#12531 ) set to about 1.2x of actual time now	2025-10-08 21:22:54 -04:00
George Hotz	3b0b3a2e64	fast RANGEIFY (#12504 ) * rtoposort is fast, can replace rangeify with this * fast rangeify * work * fast rangeify works for mnist * should work * progress * pad fix * FAST * tests passing * don't delete those shape ops * put in rangeify map * ending ranges fix * tests * mstack/mselect no hacks * move to indexing.py * touch up tests + add comments * disable failing test * actually make the file readable * failing * error	2025-10-08 19:38:06 +08:00
chenyu	eb3bc277b3	remove ASSERT_MIN_STEP_TIME in external_benchmark_openpilot (#12495 ) should add for compile3 and compile 3 only	2025-10-07 22:13:42 -04:00
chenyu	fe774a4319	more skip WINO on benchmark (#12482 )	2025-10-07 03:43:51 -04:00
chenyu	8ad5f9e74f	skip slow benchmarks (#12481 ) * skip slow benchmarks padded tc is already slow, rest are slow with rangeify (correct if run locally) * relax more	2025-10-07 03:28:56 -04:00
Sieds Lykles	e74be4a140	UOp.factor and add chain sorting (#12413 ) * add ordering * fix some tests * fix more tests * shorten comment * update test * add rule and test * add rule and test * remove check * use fold_divmod_congruence instead of simplify * adjust tests * shorten line * new algo * add test * add function to un-nest the div * add UOp.factor * test UOp.factor * uop_given_valid tries to factor simplex expression * shorten line * symbolic_flat is back * change that back * fix those new tests * new rule for ordering * factor multiple factors * no symbolic_flat * symbolic_flat to there * move that back * fix imports * merge correctly * linter happy * add rule * add a test * cleanup * revert that for now * UOp.factor returns self instead of None * try all_candidates * remove or_else * post index symbolic * add test * maket this closer to the original * increase mac hlb_cifar min step time * add some ordering tests * cleanup * increase pytest timeout time * check dtype	2025-10-04 06:05:38 +02:00
chenyu	494bb12500	skip slow cifar bf16 on red benchmark (#12213 ) very slow to compile the fake bf16	2025-09-16 14:55:01 -04:00
chenyu	419e997187	increase benchmark timeout (#12212 ) account for compile cache, and it's annoying that job died due to timeout also messes the machine	2025-09-16 14:09:02 -04:00
nimlgen	fb96394ff5	auto-select available compilers (#12094 ) * device: auto select compilers * fix * metal+opencl * nv/cuda * test without ptx * ptx * fix tests * fix * fix test * rename * test + cleaner * xx * ops * better test * win? * um? * types * debug * win?? * sep rung * wtf? * debug * skip win * revert this * types	2025-09-10 19:52:01 +03:00
Sieds Lykles	5b73076e48	assert benchmark times (#12042 ) * assert jitted times in openpilot * better error * better error * add ASSERT_MIN_STEP_TIME to more models * t is step_times * update benchmark times * update times	2025-09-09 23:40:02 +02:00
nimlgen	1c6c42715f	unify cpu and llvm (#11982 ) * try unify cpu and llvm * fixes * fix * ops * no llvm * fix * rm * lvmm is ot * oops * override * no llvm * ignore * skip llvm * ooops	2025-09-09 13:54:44 +03:00
George Hotz	433581f8ed	make POSTOPT=2 the default (#12034 ) * make POSTOPT=2 the default * more matching tc * fix winograd * fix that test * add matvec to Scheduler * flip tc sort order * similar speed * fix beam on image * disable slow tests * slow	2025-09-05 14:34:05 -07:00
George Hotz	560df206cc	split tc test (#12003 ) * split tc test * split hand coded opts * remove some skipped tests * skips on emulated	2025-09-04 11:47:56 -07:00
George Hotz	9dee724fc4	make EMULATE a context var (#12002 ) * make EMULATE a context var * fix test amx	2025-09-04 11:15:43 -07:00
nimlgen	897254ad6c	ci: add dev<->cpu copy speeds (#11959 )	2025-09-02 15:22:44 +03:00
George Hotz	8af8808c61	cleanup tests, bump caches (#11746 )	2025-08-19 21:21:07 -07:00
George Hotz	1d307f568c	move device tests to test/device + test cleanups (#11735 ) * move device tests to test/device * test speedups * test device * linalg to unit * upd * so pytest just works * more divide and skip * speed * test devectorize * add pillow	2025-08-19 16:02:20 -07:00
wozeparrot	71260a5ea4	feat: only bench openpilot 0.9.9 models (#11664 )	2025-08-14 19:27:18 -04:00
geohotstan	1e904155e3	Add Onnx Huggingface to test/models/test_onnx.py (#11468 ) * BOOM * cache extra/huggingface/models/ * why max buffer size is not 0 * override MAX_BUFFER_SIZE * less models * remove more models and change cache dir to already cached dir * only metal * less is more? * remove check ops * why is this not setting the ENVVAR * ughhhhh just test in models * only cpu and gpu * only cpu actually * just override it idk * final * move extra dependencies up top * simplification * fix print * make README better * revert ops_disk fix for now * clean up test_onnx * remove testing fashion clip model cuz sloooowwwwww * actually let METAL run this * fix comment mistake * fix download path in run_models * does this work? * cleanup setup and teardown * contextvar like this? * prove model is cached * do I need to increment DOWNLOAD_CACHE_VERSION? * see if cached with incremented DOWNLOAD_CACHE_VERSION * use warnings to see if the model exists * revert DOWNLOAD_CACHE_VERSION stuff and clean up * add retry to download * nit	2025-08-14 11:16:41 -04:00
chenyu	b232c60def	benchmark openpilot 0.9.9 (#11575 ) * benchmark openpilot 0.9.9 not sure what to do with the 0.9.7 ones with IMAGE=2 and validate * name	2025-08-08 01:26:14 -04:00
chenyu	702e38dc19	remove FUSE_ARANGE_UINT (#11567 ) also add IGNORE_OOB=1 to bert runs. lowered BS on tinybox to 90 since 96 oom during eval without reset	2025-08-07 16:49:06 -04:00
chenyu	594cbdc66f	skip AM ResNet50 benchmark (#11565 ) hanging with FUSE_ARANGE?	2025-08-07 14:07:01 -04:00
nimlgen	1afb290027	ci: fix runner in nv (#11527 )	2025-08-06 10:38:04 +03:00
chenyu	3f742a5a7c	comma space lab models benchmark (#11461 )	2025-07-31 19:06:18 -04:00
nimlgen	5fc5bb5237	ci: clear processes (#11434 ) * unified hcq_smi for managment * fix * fix * no reset for amd	2025-07-30 22:15:18 +03:00
nimlgen	4b4ba5454c	ci: move driver start higher (#11431 )	2025-07-30 10:48:38 +03:00
chenyu	204da24cfc	increase driverbenchmark timeout-minutes to 15 (#11428 )	2025-07-29 19:45:05 -04:00
nimlgen	c88e401d0e	ci: fix typos in h machine benchmarks (#11423 )	2025-07-29 22:11:47 +03:00
George Hotz	1f1f99c287	hotfix: add DEBUG=3 to driver CI	2025-07-29 11:03:47 -07:00
nimlgen	d38d285489	ci: add h machines (#11416 ) * ci: add h machines * more * fix names * names not collide * 20 * 10	2025-07-29 19:21:51 +03:00
chenyu	2b48b961be	fix a few broken AMX tests (#11204 )	2025-07-12 21:42:38 -04:00
George Hotz	0597735f28	remove TC=3 not porting this (#11045 )	2025-06-30 15:12:49 -07:00
chenyu	126fcf4129	clean up AMD_LLVM in tests (#11021 )	2025-06-28 22:45:47 -04:00
chenyu	d71bb6a7b2	remove comma 0.9.4 from benchmark (#10867 )	2025-06-18 12:43:59 -04:00
chenyu	4f535641f7	add one huggingface_onnx test to mac benchmark ci (#10700 ) this crashed for me on onnx parser pr but seems fine for the author. see if ci mac is fine	2025-06-08 12:26:12 -04:00
wozeparrot	37e1ef1be3	feat: cleanup old AM processes (#10653 )	2025-06-05 15:41:00 -07:00
wozeparrot	5e3c4a8431	fix: comma testsig (#10568 )	2025-05-29 19:00:07 -07:00
George Hotz	6b8eb5fec2	split mlperf to its own red benchmark run (#10492 ) * Add mmapeak implementation for 7900 XTX * Change identation * Use a template instead of multiple assebly files * Fix output formatting * Reduce register file bank conflicts * More accurate measurement for quick instructions * Add support for gfx1201 * RDNA4 wmma requires less VGRPs * RDNA4 does not have s_cmpk instructions * Add v_wmma_i32_16x16x32_iu4 for gfx1201 * Add sparse wmma instructions * split to tinybox red MLPerf Benchmark --------- Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>	2025-05-23 17:12:41 -07:00
uuuvn	3ca5680920	Test remote in benchmark (#10304 ) hlb cifar is fast so added it, can add bert too if you think it's ok 6 real gpus to test multigraph and transfers + accuracy validation should probably be added to tinystats too, i don't know how though Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-23 12:12:57 -04:00
qazal	90eb3c0e5d	add MobileNetV2 benchmark to comma CI (#10250 ) * add MobileNetV2 to comma CI * symlink imagenet * also the signature * comment that out * need imagenetmock * same train and test set * quantize on CPU=1 * verbose * need __hexagon_divsf3 * 0x858d6c15 * quant cpu + CC=clang-19	2025-05-19 18:22:50 +03:00
George Hotz	b06291077c	no amdgpu kernel driver (#10408 ) * no amdgpu kernel driver * don't test hip * lower req	2025-05-18 20:52:39 -07:00
wozeparrot	1ed04f993b	move benchmark stat tracking to influxdb (#10185 )	2025-05-15 16:14:56 -07:00
Ignacio Sica	47b3055fe2	set fail-fast behavior (#10336 )	2025-05-15 11:24:45 -07:00
George Hotz	7a3d4de59a	hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test	2025-05-14 14:50:37 -07:00

261 Commits