Commit Graph

10417 Commits

Author SHA1 Message Date
chenyu
0fa57b8ce9 raise error if setitem tensors have requires_grad (#4575)
* raise error if setitem tensors have requires_grad

working on supporting this; for now it properly raises an error

* NotImplementedError
2024-05-13 18:56:47 -04:00
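The guard described in the commit above can be illustrated with a minimal stand-in class (hypothetical, not tinygrad's actual `Tensor`): in-place assignment raises `NotImplementedError` whenever the target or the assigned value is tracking gradients, rather than silently producing wrong gradients.

```python
class Tensor:
    """Minimal stand-in to illustrate the guard; not tinygrad's real class."""
    def __init__(self, data, requires_grad=False):
        self.data = list(data)
        self.requires_grad = requires_grad

    def __setitem__(self, idx, val):
        # setitem on gradient-tracked tensors isn't supported yet, so raise
        # instead of corrupting the autograd graph
        if self.requires_grad or (isinstance(val, Tensor) and val.requires_grad):
            raise NotImplementedError("setitem with requires_grad is not supported")
        self.data[idx] = val.data if isinstance(val, Tensor) else val

# plain tensors can be mutated in place
t = Tensor([1, 2, 3])
t[0] = 5

# gradient-tracked tensors refuse setitem
g = Tensor([1.0], requires_grad=True)
try:
    g[0] = 2.0
except NotImplementedError:
    pass
```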
Filip Brzek
f7d08bd454 feat: add acc_dtype to einsum (#4571) 2024-05-13 14:02:07 -04:00
Szymon Ożóg
d97d5a7689 Optimize PTX gated loads index calculation (#4304)
* WIP but working

* Cleanup

* Remove float4 pred and alt

* Cleanup

* this is somehow slowing it down

* Simplify

* add define var to ignore when optimizing gates

* Update assembly.py

* Test for optimizing gated loads

* Cleanup

* Fix NEG needed before if

* Remove unused parameters

* Update assembly.py

* Fix for cachable gone

---------

Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal
c67b70ca67 small scheduler refactor (#4569)
* outputs

* consistent

* more style

* doesn't need tuple
2024-05-13 10:47:39 +03:00
qazal
77aa8659f5 use assign_targets in LazyOp creation (#4568)
* start

* correct error

* this is possible

* document it
2024-05-13 10:24:35 +03:00
qazal
b0fa97e176 assert error detail in test_assign (#4567)
* use regex assert

* that shouldn't raise
2024-05-13 09:56:05 +03:00
chenyu
25ec40ca93 cleanup dtype of tensor creation from list (#4566) 2024-05-13 02:47:41 -04:00
qazal
4e1135a0bc assign buffer read/write tests (#4565)
* simple tests

* more tests
2024-05-13 09:43:36 +03:00
George Hotz
b660f60125 all uops are now cachable (#4564)
* all uops are now cachable

* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz
02327b8adf simple stuff from new_uops branch (#4563) 2024-05-12 22:18:05 -07:00
ziereis
f53a23d21e Test for optim assertion (#4558)
* add test for assertion

* whitespace

* restore state

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot
d7670f8141 quantized llama multilazybuffer fix (#4557) 2024-05-12 14:19:21 -07:00
ziereis
bcee4743ce fix error message (#4556)
* fix error message

* typo

* add suggestion to fix error

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 12:35:51 -07:00
chenyu
01a0c1a948 slightly faster nf4 llama (#4542) 2024-05-12 14:24:42 -04:00
qazal
4c232dc0ae refactor LoadOps scheduling (#4553)
* refactor

* op -> lop
2024-05-12 12:59:24 +03:00
qazal
3da152f0fe scheduler docs 2 (#4551)
* docs

* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot
e07c7668b3 nf4 llama (#4540) 2024-05-11 22:22:34 -07:00
George Hotz
7a26bdac65 move scheduleitem to schedule.py (#4541)
* move scheduleitem to schedule.py

* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666 add cpu objdump to LLVM/CLANG (#4537) 2024-05-11 14:28:44 -07:00
chenyu
bed70b130c mlperf bert getenv-able EVAL_STEP_FREQ (#4534) 2024-05-11 14:36:56 -04:00
George Hotz
328b083e66 lil profiling script 2024-05-11 11:02:44 -07:00
chenyu
da10cf0be1 extra/threefry.py for mem usage (#4533)
for now it needs 8N memory to generate N random numbers
2024-05-11 13:46:44 -04:00
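For context on the commit above: threefry is a counter-based RNG, so each output block is a pure function of a key and a counter, and any element of the stream can be generated independently of the rest. The following is a minimal pure-Python sketch of the threefry2x32 block function and a stream helper (an illustrative stand-in, not tinygrad's `extra/threefry.py`):

```python
MASK = 0xFFFFFFFF
# rotation constants for threefry2x32 (first four alternate with last four)
ROTATIONS = [13, 15, 26, 6, 17, 29, 16, 24]

def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK

def threefry2x32(key, counter, rounds=20):
    """One threefry2x32 block: maps (key, counter) -> two 32-bit words."""
    ks = [key[0], key[1], key[0] ^ key[1] ^ 0x1BD11BDA]
    x = [(counter[0] + ks[0]) & MASK, (counter[1] + ks[1]) & MASK]
    for i in range(rounds):
        x[0] = (x[0] + x[1]) & MASK
        x[1] = rotl32(x[1], ROTATIONS[i % 8]) ^ x[0]
        if i % 4 == 3:                      # inject key after every 4 rounds
            j = i // 4 + 1
            x[0] = (x[0] + ks[j % 3]) & MASK
            x[1] = (x[1] + ks[(j + 1) % 3] + j) & MASK
    return x[0], x[1]

def rand_u32(key, n):
    """Generate n 32-bit words; block i depends only on (key, i), so there is
    no sequential state and elements can be produced in any order."""
    out = []
    for i in range(0, n, 2):
        out.extend(threefry2x32(key, (i, 0)))
    return out[:n]
```

Because the counter (not mutable state) drives generation, intermediate buffers proportional to the output are the main memory cost, which is where a constant-factor overhead like the 8N noted in the commit can come from.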
chenyu
8a0fb3d765 delete old extra/autopad.py (#4532) 2024-05-11 13:06:10 -04:00
chenyu
04a4980a51 touchup bert script (#4531)
small adjustments: remove the duplicated training setting and stop the script once the target is hit
2024-05-11 13:02:02 -04:00
qazal
4871476a1e move copy kernel to out of schedule ordering (#4530)
* delete from sorting

* move the logic
2024-05-11 14:44:44 +03:00
qazal
2fb564c125 multi reduce linearizer tests start (#4529)
* test_end_local

* test_early_end_local

* todos

* mean+std

* skip no locals
2024-05-11 14:06:40 +03:00
qazal
3cba22920f test_linearizer_correctness (#4458)
* test helper

* uops asserts

* cleanup args

* nits
2024-05-11 13:02:08 +03:00
qazal
b3d9fd48d0 infra for testing linearizer correctness (#4528)
* refactor outbufs

* delete helper
2024-05-11 12:10:33 +03:00
George Hotz
2f970a4fc2 all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
wozeparrot
d2c347fc74 faster gather for bert (#4526) 2024-05-10 22:28:48 -07:00
George Hotz
922e6e056a hotfix: fix docs 2024-05-10 21:51:35 -07:00
George Hotz
347a3acb37 add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0 fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525)
also hard-coded the bert model config instead of looking it up from a file
2024-05-11 00:18:36 -04:00
chenyu
7fab8c9e17 add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit (#4523)
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit

2d symbolic mean in jit does not quite work; the order of the variable inputs is not deterministic?

* skip
2024-05-10 23:19:55 -04:00
George Hotz
827058f030 update tests get_runner (#4522) 2024-05-10 20:09:22 -07:00
George Hotz
a0448ff595 use copy kernel in schedule (#4520)
* use copy kernel in schedule

* imports
2024-05-10 15:30:33 -07:00
chenyu
b15e2309bd verbose error message in getitem (#4519)
* verbose error message in getitem

still hard to understand, but at least it prints what it's trying to expand

* sure

* :
2024-05-10 17:25:41 -04:00
George Hotz
d438d5698d bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
qazal
a2b707a3eb scheduler comments 1 (#4515) 2024-05-10 20:44:28 +03:00
George Hotz
4eef1ee9bf move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
7c630a9a53 hotfix: fix llama spacing + fix hcq 2024-05-10 15:10:13 +00:00
George Hotz
58e7256ce9 restore hcq graph (#4513)
* Reapply "hcq graph (#4380)" (#4512)

This reverts commit 06c1e7498e.

* bring back hcq graph
2024-05-10 07:45:05 -07:00
George Hotz
06c1e7498e Revert "hcq graph (#4380)" (#4512)
This reverts commit 84a2e2b8c1.
2024-05-10 07:18:09 -07:00
nimlgen
84a2e2b8c1 hcq graph (#4380)
* start hcq graph

* hack-fix sync on amd

* nv

* fix nv

* multigraph

* fixes

* temp fix for graph

* this is not needed

* fix

* cleaner

* linter

* fix none

* faster cuda copy

* faster amd copy

* temp nv fixes

* alloc on gpu

* exp: faster amd

* Revert "exp: faster amd"

This reverts commit 2e4cfd1f7d8a33634c50fb5655cff1b40269d28c.

* revert, unrelated

* not in this pr

* linter
2024-05-10 07:15:12 -07:00
qazal
2b7ab60584 dfs fusion (#4491)
* use continue

* simplify

* flip

* track r

* derive forced_realize

* scheduler needs comments
2024-05-10 17:00:48 +03:00
qazal
bd8bb82555 move fusion out of child iteration (#4509) 2024-05-10 12:03:32 +03:00
qazal
ff216a383a refactor fused children (#4508)
* realized_children -> group

* use a set
2024-05-10 11:49:23 +03:00
chenyu
b399d98e41 fix resnet eval (#4507) 2024-05-10 00:49:00 -04:00
wozeparrot
a602dc67d3 feat: more mlperf fixes (#4505) 2024-05-09 20:50:20 -07:00
chenyu
0e8aa0e288 use fake data in beam searching resnet (#4504) 2024-05-09 23:43:50 -04:00