George Hotz
7c630a9a53
hotfix: fix llama spacing + fix hcq
2024-05-10 15:10:13 +00:00
George Hotz
58e7256ce9
restore hcq graph ( #4513 )
...
* Reapply "hcq graph (#4380 )" (#4512 )
This reverts commit 06c1e7498e .
* bring back hcq graph
2024-05-10 07:45:05 -07:00
George Hotz
06c1e7498e
Revert "hcq graph ( #4380 )" ( #4512 )
...
This reverts commit 84a2e2b8c1.
2024-05-10 07:18:09 -07:00
nimlgen
84a2e2b8c1
hcq graph ( #4380 )
...
* start hcq graph
* hack-fix sync on amd
* nv
* fix nv
* multigraph
* fixes
* temp fix for graph
* this is not needed
* fix
* cleaner
* linter
* fix none
* faster cuda copy
* faster amd copy
* temp nv fixes
* alloc on gpu
* exp: faster amd
* Revert "exp: faster amd"
This reverts commit 2e4cfd1f7d8a33634c50fb5655cff1b40269d28c.
* revert, unrelated
* not in this pr
* linter
2024-05-10 07:15:12 -07:00
qazal
2b7ab60584
dfs fusion ( #4491 )
...
* use continue
* simplify
* flip
* track r
* derive forced_realize
* scheduler needs comments
2024-05-10 17:00:48 +03:00
qazal
bd8bb82555
move fusion out of child iteration ( #4509 )
2024-05-10 12:03:32 +03:00
qazal
ff216a383a
refactor fused children ( #4508 )
...
* realized_children -> group
* use a set
2024-05-10 11:49:23 +03:00
chenyu
b399d98e41
fix resnet eval ( #4507 )
2024-05-10 00:49:00 -04:00
wozeparrot
a602dc67d3
feat: more mlperf fixes ( #4505 )
2024-05-09 20:50:20 -07:00
chenyu
0e8aa0e288
use fake data in beam searching resnet ( #4504 )
2024-05-09 23:43:50 -04:00
George Hotz
5bfc33948a
hotfix: only run optimize_local_size once
2024-05-09 20:01:53 -07:00
wozeparrot
29daea4e60
fix: core count and os ( #4503 )
2024-05-09 19:55:07 -07:00
George Hotz
89e119bc58
move Allocator to buffer.py ( #4502 )
...
* move Allocator to buffer.py
* move those to realize
* memory file
* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
1e843d495e
cleaning up search with Program ( #4500 )
...
* cleaning up search
* fix tests
* test fix
* minor compiler cleanup
2024-05-09 19:01:53 -07:00
chenyu
d3dc332c2e
Tensor.logsumexp ( #4442 )
...
the subtract-max part should be shared with safe softmax (see the sketch after this entry)
cleaner
2024-05-09 20:49:06 -04:00
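The subtract-max trick the logsumexp commit above mentions is the standard numerically stable formulation; here is a minimal plain-Python sketch (the function name and the use of math instead of tinygrad Tensors are just for illustration):

```python
import math

def logsumexp(xs):
    # subtracting the max keeps exp() in range; the same trick is what makes
    # a "safe" softmax numerically stable
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

print(logsumexp([1000.0, 1000.0]))  # ~1000.693; a naive exp(1000) would overflow
```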
chenyu
78b298aa2a
move 0d tensor reduce axis check to _reduce ( #4499 )
2024-05-09 20:29:55 -04:00
George Hotz
c9e84ed0da
refactor to Program class ( #4476 )
...
* refactor to Program class
* switch to Program
* fix tests
* smaller diff
* self.p
* more tests
* fix metal test
* tests
* fix openpilot
* move that to linearizer
* p.launchdims
2024-05-09 17:29:07 -07:00
chenyu
5de4a46f10
re-enable gpt2 half/beam mac benchmark ( #4496 )
...
* re-enable gpt2 half/beam mac benchmark
from the fuzzer it seems to be flaky due to a numerical issue, not a kernel bug. we used to have half in the split reduce.
ran this on an M1 Max for 20 loops and it's fine
* that should be jitted
2024-05-09 19:15:32 -04:00
nimlgen
a2e2ba380c
nv tune shmem size ( #4495 )
...
* nv tune shmem size
* compare them
* linter
* linter2
2024-05-10 00:35:01 +03:00
chenyu
ef93e41a15
resnet mlperf systems add tinygrad commit and python / runtime versions ( #4494 )
2024-05-09 16:04:15 -04:00
chenyu
b5afdfbc5b
first draft resnet mlperf readme ( #4493 )
...
* start readme
* something
2024-05-09 15:51:44 -04:00
chenyu
047c7f3e5b
polish resnet mlperf logging ( #4490 )
...
don't include the time to save the final checkpoint in the run time, plus some cosmetic ordering changes
2024-05-09 13:04:24 -04:00
chenyu
d78e159aa3
resnet logging move RUN_START to start of the script ( #4488 )
2024-05-09 12:32:32 -04:00
chenyu
1bcb58479d
resnet setup power cap red box gpu to 350W ( #4484 )
...
1%-2% faster
2024-05-08 23:32:41 -04:00
chenyu
0ed755bcf5
resnet use EVAL_BS=192 ( #4482 )
...
* resnet use EVAL_BS=192
also lower the green run BEAM_MIN_PROGRESS from 10 to 5
* BEAM_MIN_PROGRESS 5 is too close to the setup limit
2024-05-08 22:29:27 -04:00
chenyu
1f6bf9d2f7
real diskcache_clear in model_train resnet ( #4445 )
...
clear the cache if INITMLPERF is set or when running run_and_time; dev_beam and dev_run do not clear the cache
2024-05-08 19:06:09 -04:00
chenyu
1b4645bea6
hotfix resnet move init_start to start of the script ( #4481 )
2024-05-08 19:03:52 -04:00
wozeparrot
a347ae94d6
feat: remove wandb ( #4480 )
2024-05-08 15:31:16 -07:00
qazal
00c309dfe2
trigger tc in remu ( #4479 )
2024-05-08 23:23:46 +03:00
nimlgen
e14d5b6fd7
nv fix oob qmd ptr ( #4478 )
...
* nv fix oob qmd ptr
* test kernargs no oob
2024-05-08 23:11:04 +03:00
chenyu
db7e15c46f
hotfix resnet only log epoch start with RUNMLPERF ( #4477 )
2024-05-08 15:14:41 -04:00
chenyu
062c6dd65d
mlperf logging, truncate dir in logs and log seed ( #4475 )
2024-05-08 12:54:02 -04:00
chenyu
b62a65b617
redo faster sparse_categorical_crossentropy ( #4461 )
...
also update the default LR and DECAY in resnet, which help convergence
2024-05-08 11:21:43 -04:00
Elias Wahl
e87460c7e2
bump version ( #4474 )
2024-05-08 07:48:42 -07:00
Szymon Ożóg
4eb6aef73c
Speed up graph rewrite ( #4473 )
...
* Speed up graph rewrite
* Bring back old name
2024-05-08 07:15:15 -07:00
Nicklas Boman
cc33947fa5
Update links in new docs ( #4363 )
...
point the tensor and nn links to tensor.md and nn.md
2024-05-08 06:13:00 -07:00
chenyu
36a1f38049
lazy folding: mul -1 is neg, and neg neg is noop ( #4472 )
2024-05-08 01:52:22 -04:00
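The two folding rules named in the commit above can be sketched as a toy rewriter; this is only a plain-Python illustration (Neg, fold_mul, and fold_neg are hypothetical names), not tinygrad's lazy-graph code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Neg:
    src: object

def fold_neg(x):
    # neg of neg is a no-op: return the original source
    return x.src if isinstance(x, Neg) else Neg(x)

def fold_mul(x, const):
    # mul by -1 folds into a neg instead of a real multiply
    return fold_neg(x) if const == -1 else ("mul", x, const)

print(fold_mul("a", -1))            # Neg(src='a')
print(fold_neg(fold_mul("a", -1)))  # 'a' -- the double negation folds away
```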
chenyu
c508eb7425
revert the removal of CAST_BEFORE_VIEW ( #4471 )
...
this brings back most of the memory gain for resnet.
2024-05-08 00:14:29 -04:00
George Hotz
5dbab7fae6
bring thneed back ( #4467 )
...
* bring thneed back
* simple thneed
* bug fixes in new thneed
* needs_load
* context
* move that there
* fix thneed size
* fix CI
* one memory planner
* assert on buffer size
2024-05-07 20:55:03 -07:00
chenyu
7eb035e7c5
stronger test case for half mean overflow ( #4470 )
2024-05-07 22:40:09 -04:00
chenyu
ca7300c783
fix half mean and its backward ( #4469 )
...
* fix half mean and its backward
cast to sum_acc_type, sum, div, then cast back (see the sketch after this entry)
* mean dtype tests
2024-05-07 21:46:41 -04:00
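The cast, sum, div, cast-back pattern described in the half-mean commit above can be sketched with numpy; sum_acc_type is tinygrad's notion, and float32 is only assumed here as the wider accumulator:

```python
import numpy as np

def half_mean(x: np.ndarray) -> np.ndarray:
    # accumulate in a wider dtype so the sum cannot overflow float16's 65504
    # max, divide, then cast the result back to the input dtype
    return (x.astype(np.float32).sum() / x.size).astype(x.dtype)

x = np.full(100_000, 1.0, dtype=np.float16)
print(x.sum())       # inf: a float16 accumulator overflows past 65504
print(half_mean(x))  # 1.0
```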
Francis Lam
7da1b41f38
fuzz_linearizer: add FUZZ_REQUIRE_TC option to require TC in opts ( #4468 )
...
useful for checking late opts that come after TC, such as GROUP.
2024-05-07 17:14:21 -04:00
chenyu
46a793111b
test for LazyBuffer._view when mask out and degrade into const ( #4465 )
...
changed the condition from all masked dims being 0 to any masked dim being 0. it's a no-op because the shapetracker rewrites the whole mask to 0 if any dim has a 0 as part of canonicalization (see the sketch after this entry)
2024-05-07 12:56:23 -04:00
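A toy sketch of the canonicalization the commit above relies on; this is not tinygrad's ShapeTracker, and mask_is_const is a hypothetical helper used only to illustrate the "any dim has 0" condition:

```python
def mask_is_const(mask):
    # mask holds a (begin, end) valid range per dim; an empty range in any
    # dim means no element is ever valid, so the view degrades into a const
    return any(end - begin == 0 for begin, end in mask)

print(mask_is_const(((0, 4), (2, 2))))  # True: dim 1 has an empty valid range
print(mask_is_const(((0, 4), (1, 3))))  # False: every dim keeps valid elements
```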
nimlgen
a1d350a810
nv timeline semaphores ( #4464 )
...
* nv timeline semaphores
* nv hcq fixes
2024-05-07 17:31:19 +03:00
nimlgen
e3bb85fd0e
amd timeline semaphores ( #4416 )
...
* amd timeline semaphores
* v2
* fixes
* reset signals
* fix
* rollover test
* small fixes
* linter
* copyin
2024-05-07 11:17:32 +03:00
George Hotz
17faae091b
optimizer shouldn't be run without training ( #4460 )
...
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00
qazal
35dfbc6354
rand_for_dtype helper ( #4459 )
2024-05-07 00:03:42 +03:00
nimlgen
a3140c9767
nv boost subdevice ( #4456 )
2024-05-06 23:05:20 +03:00
Francis Lam
47750e65fd
kernel: un-reverse the order of the local indices ( #4454 )
...
no change to performance or behavior. new LOCALS are added to the
left side of the LOCALS block (to the left of the first_reduce).
2024-05-06 15:21:27 -04:00
chenyu
5e036cd0b3
test unary and more reduces in test_flopcounter ( #4455 )
...
cannot really catch a spec change error without testing the new spec explicitly, but we don't intend to change the lazy spec lightly
another possible way to catch a reduce flopcounter shape change would be type checking InterpretedFlopCounter and throwing an error if `in` results in `Never`
2024-05-06 15:15:16 -04:00