qazal
c5f5755328
correctness test for multireduce nested locals ( #4682 )
...
* nested locals test
* move st
2024-05-22 19:35:35 +03:00
chenyu
bc9be39dec
set timeout in search _try_compile_linearized_w_idx ( #4677 )
2024-05-22 12:30:31 -04:00
qazal
d12d412e8b
revert uops dtype in pattern matcher ( #4681 )
...
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
Elias Wahl
acc0039cfc
Resume fix + scheduler for non weight decay params ( #4679 )
...
* move ckpt dir
* fix resume. Add scheduler group
2024-05-21 19:38:13 -04:00
chenyu
0f21aa0416
example kernel that triggers Memory access fault for resnet on red ( #4678 )
2024-05-21 18:59:36 -04:00
qazal
5f84cbb5df
keep UOps.CAST in PHI-GEP fold for unmatching dtypes ( #4674 )
...
* these should be val.dtype
* cast float4 and float2 to root
* document tests
* 2 args
* fix assert
* match dtype
* no extra lines
* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb
catch compile errors in uops tests ( #4672 )
...
* use helper and compile
* llama beam=2
* ast length
* skip float4, fix hsa
* use empty tensors
2024-05-21 12:20:35 +03:00
wozeparrot
00432496d7
feat: tinyboxgreen ( #4366 )
...
* feat: tinyboxgreen
* feat: tinyboxgreenv2
* fix symlink weights
* fix: remove llama 2 70b for now
* feat: naming
* fix: remove extra cifar steps
* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
Timmy
de733d73cf
Multireduce Linearizer Tests ( #4665 )
...
* updated tests
* make sure the upcasting tests actually causes the problem
* diff cleanup
* use UOpGraph utils
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
chenyu
5e3fbbb33e
llama3 example add manual seed and log seed ( #4667 )
2024-05-20 19:09:57 -04:00
chenyu
8c99cc17f5
remove link to old adding_new_accelerators.md ( #4666 )
...
fix #4657
2024-05-20 19:05:23 -04:00
chenyu
c4089d169f
update BEAM_LOCAL_MAX to 1024 ( #4664 )
...
we used 1024 for the mlperf submission and the resulting step time is 20% faster. the default should not be worse
2024-05-20 18:06:32 -04:00
chenyu
704cb1d8a0
fix conversation.py quantize ( #4663 )
...
it used to be True for int8, now it's a string for int8 or nf4
2024-05-20 17:36:37 -04:00
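The change above (a boolean flag becoming a string choice) is a common argparse migration. A minimal hypothetical sketch, assuming an argparse-based CLI with a `--quantize` option (the option name and choices here mirror the commit message, but the actual conversation.py interface is not shown in this log):

```python
import argparse

# hypothetical sketch: `--quantize` moves from a store_true boolean flag
# to a string option restricted to the two quantization modes mentioned
parser = argparse.ArgumentParser()
parser.add_argument("--quantize", type=str, default=None, choices=("int8", "nf4"))

args = parser.parse_args(["--quantize", "nf4"])
print(args.quantize)  # nf4
```

Code that previously tested the flag for truthiness (`if args.quantize:`) keeps working, but comparisons like `args.quantize == True` would silently break, which is the kind of breakage such a fix addresses.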
chenyu
ae861325ce
update llama sample for mac 32 input buffer limit ( #4662 )
...
set the default sampling param for function call to 0, and top k in llama3 to 25.
2024-05-20 17:23:39 -04:00
Elias Wahl
993091adfa
loss scaler + nan fixes ( #4661 )
2024-05-20 17:08:35 -04:00
qazal
b33c827aed
UOps.RANGE toposort spec ( #4660 )
...
* use iterator
* nested loops and outer loads
* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83
consolidate uops tests ( #4659 )
...
* merge uoptimize
* move tests
* fix skip message
2024-05-20 21:42:31 +03:00
Szymon Ożóg
1e7b7b2c3c
Fix flop counting for mulacc ( #4640 )
...
* Fix flop counting for mulacc
* add test_simple_mulacc
* Update test_uops_stats.py
* Update test_uops_stats.py
* revert test_mulacc
* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
wozeparrot
b144d4b460
new llama3 example ( #4576 )
2024-05-19 22:42:23 -07:00
nimlgen
c9f7f2da70
nv hcq bind api ( #4629 )
...
* hcq bind api for nv
* linter
* linter
* add test
* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a
correctly insert UOps.END* in fuzz result ( #4653 )
2024-05-19 21:10:28 +03:00
chenyu
456aa0b656
update test_search kernel count ( #4652 )
...
integration test that beaming 1 kernel increments kernel count by 1, and moved existing test_kernel_count to TestTimeLinearizer
2024-05-19 13:54:52 -04:00
qazal
954718e6bf
reorder DEFINE_GLOBAL in fuzz_uops ( #4651 )
...
* globals base
* test: opt out of DEFINE_GLOBAL
* do it like ExecItem
2024-05-19 20:51:31 +03:00
Léo
967e35f8b8
fix(beam): GlobalCounters kernel count increasing when clearing l2 ( #4598 )
...
* fix(beam): GlobalCounters kernel count increasing when clearing l2
* fix: removed the NOSTATS var by adding do_update_stats to Tensor.realize()
* test(search): regression test for _time_program, should not increment kernel_count
* fix(test_search): unused var and now properly checking when l2 is cleared
* fix(test_search): added assert message
* fix(test_search): now testing public beam api for kcount
* ruff fixes
---------
Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>
2024-05-19 10:03:47 -07:00
George Hotz
4753283221
LOOP -> RANGE ( #4650 )
2024-05-19 06:40:20 -07:00
chenyu
286b4dbdf2
compile raise CompileError and skip only RuntimeError in multiprocess… ( #4646 )
...
* compile raise CompileError and skip only RuntimeError in multiprocess beam
a renderer error with multiprocess should not be skipped by beam
* use `==` for dtype to dtype comparison
* that needs to be is
* typo
2024-05-19 00:25:25 -04:00
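The `==` vs `is` back-and-forth in the bullets above hinges on a Python distinction worth spelling out: `==` calls `__eq__` (value equality), while `is` compares object identity. A minimal hypothetical sketch (this is not tinygrad's actual DType class, just an illustration of the distinction):

```python
# hypothetical DType stand-in to illustrate value equality vs identity
class DType:
    def __init__(self, name): self.name = name
    # `==` goes through __eq__, so two distinct objects can compare equal
    def __eq__(self, other): return isinstance(other, DType) and self.name == other.name
    def __hash__(self): return hash(self.name)

a, b = DType("float32"), DType("float32")
print(a == b)   # True  -- same value via __eq__
print(a is b)   # False -- two distinct objects, identity differs
```

When dtypes are interned singletons, `is` works and is faster; when they can be constructed independently, only `==` is correct, which is why the choice had to be revisited in the commit.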
chenyu
8a0d1ca7bb
CI test timeout 20 min -> 10 min ( #4645 )
...
if it takes more than 10 minutes, setup usually fails anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
qazal
b0cb02f719
uops fuzzing infra ( #4641 )
...
* base with bfs
* find paths
* get last
* try blocks
* Revert "try blocks"
This reverts commit 25f8e3fe85.
* this should be simpler
* full exec
* support debug
* fix lint
* add todo
* copy in_degree
2024-05-18 20:19:57 +03:00
qazal
bf8f855838
assert kernel counts in unsupported fusions ( #4643 )
...
* replace with comments
* not relevant
* update comment
* custom exception maybe
* fix LoadOps.VIEW
2024-05-18 20:14:37 +03:00
qazal
a5204fe89d
refactor UOps.CONST ( #4639 )
...
* delete more
* nit: dont need assign
* can this be simpler
* use scalars
* always cast
* clang needs cast
* format
2024-05-18 10:07:36 +03:00
qazal
d0a2d40df3
root cause fix for UOps.CONST bad args ( #4638 )
...
* delete that
* real fix
2024-05-18 09:15:25 +03:00
George Hotz
9b464e34ea
increase speed of uops ( #4637 )
...
* increase speed of uops
* not equal
* minor speedup
2024-05-17 21:04:39 -07:00
George Hotz
b74cc1d01a
uops cleanup ( #4634 )
...
* def add cleanup
* minor speedup
* add back ptx speed
* a little faster
* merge that
* only linearize once for ptx
* two graph rewrites for ptx, bug?
2024-05-17 20:02:38 -07:00
George Hotz
07b350a8f4
new uops is an actual graph ( #4560 )
...
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304 )"
This reverts commit d97d5a7689.
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
nimlgen
daf57af3eb
move tc to renderers ( #4631 )
...
* move tc to renderers
* missed import
* fix typo
* fix
* fix imports
* remove from tests
* fix 4607
* nv emulate timestamp
* time is int
* correct time
2024-05-18 00:36:29 +03:00
chenyu
d70988dddf
add blob and raw=true for image in docs showcase ( #4632 )
...
this should render the image correctly
2024-05-17 16:57:15 -04:00
nimlgen
10cf8e459b
hcq update queue in place ( #4626 )
...
* do not self wait in hcq
* faster enqueue
* comments
* tests
* linter
* fix typo
2024-05-17 22:18:20 +03:00
chenyu
ca1df20fa9
benchmark name fix - resnet eval is on eval data ( #4628 )
2024-05-17 12:56:12 -04:00
chenyu
c86adabe15
time with real global buffers in search ( #4621 )
...
* filter fake buffers in search
* test that
* update test
2024-05-17 12:36:23 -04:00
chenyu
e5d4e6a8aa
BEAM=2 in green CI for 100 TFLOPS ( #4624 )
2024-05-16 23:28:28 -04:00
chenyu
b3dd885ffb
cleanup double import from tinygrad.device in tensor.py ( #4620 )
2024-05-16 14:21:22 -04:00
uuuvn
639ea5b0f2
Metal linearizer failure 22 is flaky not just on CI ( #4617 )
...
* METAL doesn't fail anymore, not just on CI
* oops
2024-05-16 11:31:23 -04:00
qazal
f3f2b96583
pick schedule tests from external_test_opt ( #4615 )
...
* conv tests
* misc
* that shouldnt const fold
2024-05-16 15:43:41 +03:00
qazal
13200c6894
check simple_pads in all views ( #4614 )
2024-05-16 14:34:39 +03:00
qazal
0b464df605
base change scheduling spec ( #4613 )
...
* spec and kernel cnt
* dont use half
* skip half
2024-05-16 13:30:49 +03:00
nimlgen
65f7e3b3ab
nv setup constbuf4 ( #4511 )
...
* nv correct constbuf 4
* compare results to cuda
* test fixed
* failed kernel
* repro
* revert this change
2024-05-16 10:42:35 +03:00
chenyu
04f2327ca3
fix abs of diff of uint ( #4411 )
2024-05-15 18:39:11 -04:00
chenyu
2119e0456d
redo simpler abs and sign ( #4611 )
...
moved Sign logic to function.py, and backward always returns 0 to match torch.
rewrite abs as `self * self.sign()`, so its backward also matches torch.
2024-05-15 18:19:46 -04:00
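The abs rewrite described above can be sketched in plain Python (a simplified illustration, not tinygrad's actual Function classes): with sign(0) defined as 0, rewriting abs as `x * sign(x)` gives d/dx |x| = sign(x) under the product rule (the sign factor is treated as a constant in backward), so the gradient at x = 0 is 0, matching torch:

```python
def sign(x: float) -> float:
    # sign with sign(0) = 0; per the commit, backward of sign always returns 0
    return float((x > 0) - (x < 0))

def abs_via_sign(x: float) -> float:
    # abs rewritten as x * sign(x); its derivative is then sign(x),
    # which is 0 at x = 0 (torch's convention for abs'(0))
    return x * sign(x)

print(abs_via_sign(-3.0), abs_via_sign(0.0), abs_via_sign(2.5))  # 3.0 0.0 2.5
```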
nimlgen
eb9689336e
nv mockgpu ( #4600 )
...
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* not run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugn, not supported
* ci
* remove ticket from description
* better descs
2024-05-15 23:46:08 +03:00
chenyu
3c11ca452e
skip CLANG test casts between double and half for now ( #4609 )
...
started breaking after the GitHub CI image update
2024-05-15 16:17:06 -04:00