tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 23:48:01 -05:00

Author	SHA1	Message	Date
qazal	3da152f0fe	scheduler docs 2 (#4551 ) * docs * delete cleanups	2024-05-12 12:15:39 +03:00
wozeparrot	e07c7668b3	nf4 llama (#4540 )	2024-05-11 22:22:34 -07:00
George Hotz	7a26bdac65	move scheduleitem to schedule.py (#4541 ) * move scheduleitem to schedule.py * don't need that type checking anymore	2024-05-11 21:13:04 -07:00
George Hotz	508e8a6666	add cpu objdump to LLVM/CLANG (#4537 )	2024-05-11 14:28:44 -07:00
chenyu	bed70b130c	mlperf bert getenv-able EVAL_STEP_FREQ (#4534 )	2024-05-11 14:36:56 -04:00
George Hotz	328b083e66	lil profiling script	2024-05-11 11:02:44 -07:00
chenyu	da10cf0be1	extra/threefry.py for mem usage (#4533 ) for now it needs 8N mem to generate size N rand	2024-05-11 13:46:44 -04:00
chenyu	8a0fb3d765	delete old extra/autopad.py (#4532 )	2024-05-11 13:06:10 -04:00
chenyu	04a4980a51	touchup bert script (#4531 ) small adjustments, remove duplicated training setting and stop the script once target is hit	2024-05-11 13:02:02 -04:00
qazal	4871476a1e	move copy kernel to out of schedule ordering (#4530 ) * delete from sorting * move the logic	2024-05-11 14:44:44 +03:00
qazal	2fb564c125	multi reduce linearizer tests start (#4529 ) * test_end_local * test_early_end_local * todos * mean+std * skip no locals	2024-05-11 14:06:40 +03:00
qazal	3cba22920f	test_linearizer_correctness (#4458 ) * test helper * uops asserts * cleanup args * nits	2024-05-11 13:02:08 +03:00
qazal	b3d9fd48d0	infra for testing linearizer correctness (#4528 ) * refactor outbufs * delete helper	2024-05-11 12:10:33 +03:00
George Hotz	2f970a4fc2	all realize 2 (#4527 ) * all realize 2 * tests fixup * fix more tests * fix openpilot * fix tests * unneeded	2024-05-10 22:43:09 -07:00
wozeparrot	d2c347fc74	faster gather for bert (#4526 )	2024-05-10 22:28:48 -07:00
George Hotz	922e6e056a	hotfix: fix docs	2024-05-10 21:51:35 -07:00
George Hotz	347a3acb37	add renderer class (#4524 ) * add renderer class * tests pass * fix pylint * fix tensor cores	2024-05-10 21:40:02 -07:00
chenyu	b00b6b16f0	fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525 ) also hard coded bert model config instead of looking up a file	2024-05-11 00:18:36 -04:00
chenyu	7fab8c9e17	add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit (#4523 ) * add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit 2d symbolic mean in jit does not quite work, order of the variable inputs are not deterministic? * skip	2024-05-10 23:19:55 -04:00
George Hotz	827058f030	update tests get_runner (#4522 )	2024-05-10 20:09:22 -07:00
George Hotz	a0448ff595	use copy kernel in schedule (#4520 ) * use copy kernel in schedule * imports	2024-05-10 15:30:33 -07:00
chenyu	b15e2309bd	verbose error message in getitem (#4519 ) * verbose error message in getitem still hard to undetstand, at least it prints what it's trying to expand * sure * :	2024-05-10 17:25:41 -04:00
George Hotz	d438d5698d	bring buffer back to device (#4517 )	2024-05-10 11:22:31 -07:00
qazal	a2b707a3eb	scheduler comments 1 (#4515 )	2024-05-10 20:44:28 +03:00
George Hotz	4eef1ee9bf	move renderer into options (#4514 ) * move renderer into options * fix tests * renders are functions	2024-05-10 10:01:51 -07:00
George Hotz	7c630a9a53	hotfix: fix llama spacing + fix hcq	2024-05-10 15:10:13 +00:00
George Hotz	58e7256ce9	restore hcq graph (#4513 ) * Reapply "hcq graph (#4380)" (#4512) This reverts commit `06c1e7498e`. * bring back hcq graph	2024-05-10 07:45:05 -07:00
George Hotz	06c1e7498e	Revert "hcq graph (#4380 )" (#4512 ) This reverts commit `84a2e2b8c1`.	2024-05-10 07:18:09 -07:00
nimlgen	84a2e2b8c1	hcq graph (#4380 ) * start hcq graph * hack-fix sync on amd * nv * fix nv * multigrah * fixes * temp fix for graph * this is not needed * fix * cleaner * linetr * fix none * faster cuda copy * faster amd copy * temp nv fixes * alloc on gpu * exp: faster amd * Revert "exp: faster amd" This reverts commit 2e4cfd1f7d8a33634c50fb5655cff1b40269d28c. * revert, unrelated * not in this pr * linter	2024-05-10 07:15:12 -07:00
qazal	2b7ab60584	dfs fusion (#4491 ) * use continue * simplify * flip * track r * derive forced_realize * scheduler needs comments	2024-05-10 17:00:48 +03:00
qazal	bd8bb82555	move fusion out of child iteration (#4509 )	2024-05-10 12:03:32 +03:00
qazal	ff216a383a	refactor fused children (#4508 ) * realized_children -> group * use a set	2024-05-10 11:49:23 +03:00
chenyu	b399d98e41	fix resnet eval (#4507 )	2024-05-10 00:49:00 -04:00
wozeparrot	a602dc67d3	feat: more mlperf fixes (#4505 )	2024-05-09 20:50:20 -07:00
chenyu	0e8aa0e288	use fake data in beam searching resnet (#4504 )	2024-05-09 23:43:50 -04:00
George Hotz	5bfc33948a	hotfix: only run optimize_local_size once	2024-05-09 20:01:53 -07:00
wozeparrot	29daea4e60	fix: core count and os (#4503 )	2024-05-09 19:55:07 -07:00
George Hotz	89e119bc58	move Allocator to buffer.py (#4502 ) * move Allocator to buffer.py * move those to realize * memory file * cleanup	2024-05-09 19:45:56 -07:00
George Hotz	1e843d495e	cleaning up search with Program (#4500 ) * cleaning up search * fix tests * test fix * minor compiler cleanup	2024-05-09 19:01:53 -07:00
chenyu	d3dc332c2e	Tensor.logsumexp (#4442 ) the subtract max part should share with safe softmax cleaner	2024-05-09 20:49:06 -04:00
chenyu	78b298aa2a	move 0d tensor reduce axis check to _reduce (#4499 )	2024-05-09 20:29:55 -04:00
George Hotz	c9e84ed0da	refactor to Program class (#4476 ) * refactor to Program class * switch to Program * fix tests * smaller diff * self.p * more tests * fix metal test * tests * fix openpilot * move that to linearizer * p.launchdims	2024-05-09 17:29:07 -07:00
chenyu	5de4a46f10	re-enable gpt2 half/beam mac benchmark (#4496 ) * re-enable gpt2 half/beam mac benchmark from fuzzer it seems to be flaky due to numerical issue, not kernel bug. we used to have half in splitted reduce. run this in M1 Max for 20 loops and it's fine * that should be jitted	2024-05-09 19:15:32 -04:00
nimlgen	a2e2ba380c	nv tune shmem size (#4495 ) * nv tune shmem size * compare them * linter * linter2	2024-05-10 00:35:01 +03:00
chenyu	ef93e41a15	resnet mlperf systems add tinygrad commit and python / runtime versions (#4494 )	2024-05-09 16:04:15 -04:00
chenyu	b5afdfbc5b	first draft resnet mlperf readme (#4493 ) * start readme * something	2024-05-09 15:51:44 -04:00
chenyu	047c7f3e5b	polish resnet mlperf logging (#4490 ) don't include save final check point time in run time, and some cosmetic order changes	2024-05-09 13:04:24 -04:00
chenyu	d78e159aa3	resnet logging move RUN_START to start of the script (#4488 )	2024-05-09 12:32:32 -04:00
chenyu	1bcb58479d	resnet setup power cap red box gpu to 350W (#4484 ) 1%-2% faster	2024-05-08 23:32:41 -04:00
chenyu	0ed755bcf5	resnet use EVAL_BS=192 (#4482 ) * resnet use EVAL_BS=192 also lower green run BEAM_MIN_PROGRESS from 10 to 5 * BEAM_MIN_PROGRESS 5 is too close to setup limit	2024-05-08 22:29:27 -04:00

4402 Commits