tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-03 19:25:06 -05:00

Author	SHA1	Message	Date
nimlgen	e3bb85fd0e	amd timeline semaphores (#4416 ) * amd timeline semaphores * v2 * fixes * reset signals * fix * rollover test * small fixes * linter * copyin	2024-05-07 11:17:32 +03:00
George Hotz	17faae091b	optimizer shouldn't be run without training (#4460 ) * optimizer shouldn't be run without training * set training in relevant tests * fix multitensor * that too	2024-05-06 15:34:12 -07:00
qazal	35dfbc6354	rand_for_dtype helper (#4459 )	2024-05-07 00:03:42 +03:00
nimlgen	a3140c9767	nv boost subdevice (#4456 )	2024-05-06 23:05:20 +03:00
Francis Lam	47750e65fd	kernel: un-reverse the order of the local indices (#4454 ) no change to performance or behavior. new LOCALS are added to the left side of the LOCALS block (to the left of the first_reduce).	2024-05-06 15:21:27 -04:00
chenyu	5e036cd0b3	test unary and more reduces in test_flopcounter (#4455 ) cannot really catch a spec change error without testing the new spec explicitly, but we don't intend to change the lazy spec lightly. another possible way to catch reduce flopcounter shape would be type checking InterpretedFlopCounter and throwing an error if `in` results in `Never`	2024-05-06 15:15:16 -04:00
nimlgen	d0b8862dea	fix out of resource kernels on nv (#4450 ) * fix out of resource kernels on nv * better comment * noqa * noqa 2 * linter	2024-05-06 19:24:20 +03:00
George Hotz	f4e49a7c1a	resnet 50 opt: correct loop + LARS (#4449 ) * correct loop + LARS * ops	2024-05-06 08:01:26 -07:00
chenyu	292ce64ad7	move acc_dt out of lazy (#4382 ) move the logic to tensor.py for forward, and function.py for two places in backward (expand and max)	2024-05-06 07:41:25 -07:00
nimlgen	113c2f00b9	amd doorbell size is 64bits (#4448 ) * amd doorbell size ids 64bits * add test * test to pass 32bit boundary is more correct * no need to round there	2024-05-06 16:59:59 +03:00
George Hotz	fc995d4446	add backward to handcode_resnet50_opt	2024-05-06 06:42:26 -07:00
qazal	6dbe5585b0	batchnorm + conv backward in test_schedule (#4420 ) * test both optims * batchnorm_backward	2024-05-06 16:40:17 +03:00
Timmy	3f3c973022	Multiple Reduce Kernels - kernel properly orders reduceops (#4418 ) * enable kernel with multiple reduceops * copy self.reduceops * assert only one reduceop per kernel * kernel.py dfs order * linters --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-05-06 13:54:44 +03:00
wozeparrot	603d3a351b	feat: allow keeping multiple cookies (#4440 )	2024-05-05 19:26:48 -07:00
chenyu	afe020710d	disable PADTO on upcasted axis (#4444 ) fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.	2024-05-05 21:52:03 -04:00
Francis Lam	709410071c	mlperf/resnet: updated BEAM params to increase performance (#4443 )	2024-05-05 21:49:46 -04:00
Francis Lam	c8595a9655	update sops.gz, fix tests and add new linearizer test (#4437 ) * update sops.gz, fix tests and add new linearizer test * remove METAL CI skip for test_failure_22 * re-add skip to METAL CI to test_failure_22	2024-05-05 17:31:25 -04:00
wozeparrot	9ad3d0520a	hotfix: npy is also ok (#4439 )	2024-05-05 13:48:54 -07:00
chenyu	d0eb1540d5	helpers.diskcache_clear (#4436 ) drop all tables in diskcache. added a unit test but disabled it by default because it will drop all cache...	2024-05-05 14:19:01 -04:00
George Hotz	595a6e3069	test_fold_conv_relu_backward test	2024-05-05 11:13:43 -07:00
George Hotz	cc16f644d0	hotfix: remove FAKE buffer from graph	2024-05-05 10:52:41 -07:00
qazal	760776c59d	merge EfficientNet to C with clang job (#4426 ) * merge ImageNet to C with linters * add to clang * delete from linter	2024-05-05 20:33:12 +03:00
chenyu	3b30756cbb	update mlperf submission system (#4435 ) more required fields.	2024-05-05 13:19:07 -04:00
George Hotz	f95658bc3e	hotfix: pickle jit works if you delete the function	2024-05-05 10:14:03 -07:00
George Hotz	12be536c06	Clang graph (#4424 ) * clang graph runner * render_dtype * name it ClangGraph * JIT=2 * JIT=2 goes there * JIT as context var	2024-05-05 09:54:12 -07:00
David Hou	544431c388	refactor: pass reduceop into global_load (#4417 ) * pass reduceop directly to global_load * typing * make mypy happy :/ * cede a line to mypy :( * fold in acc_const * add todo	2024-05-05 19:43:48 +03:00
geohotstan	874dfc556c	update setitem tests to test for currently supported cases (#4334 ) * tests, tests, tests * one more test * tests tests tests tests * t e s t * a few more	2024-05-05 11:59:13 -04:00
chenyu	fc9e58e482	Revert "refactor sparse_categorical_crossentropy (#4406 )" (#4429 ) This reverts commit `c7368515d2`.	2024-05-05 02:30:37 -04:00
David Hou	c0a048c044	batchnorm d(var)/d(mean) = 0 (#4430 ) * d(var)/d(mean) = 0 * drop the number in test_schedule!	2024-05-05 00:25:45 -04:00
George Hotz	e2eab9c2b3	hotfix: disk is okay in child process	2024-05-04 18:18:31 +00:00
George Hotz	cf33afa778	don't open devices from children (#4425 ) * don't open devices from children * correct way to do this * fix Device.DEFAULT and add back JITBEAM	2024-05-04 10:35:40 -07:00
qazal	fa17dcaf07	Fix llm.c/export.py (#4423 ) * fix headers * add CI * add stdio * merge clang tests * revert llm.c * revert ci * Revert "revert llm.c" This reverts commit `5fd17e3c8b`.	2024-05-04 19:37:10 +03:00
George Hotz	cb7289f9c9	remove clang program header (#4422 ) * remove clang program header * proper max * bools are numbers * fix compile enet	2024-05-04 08:38:01 -07:00
qazal	267bbb57f9	Revert "Add `insert_before` to Linearizer Functions (#4320 )" (#4421 ) This reverts commit `664b563c91`.	2024-05-04 17:50:21 +03:00
qazal	5f3bae378f	search children in fusion (#4322 ) * scheduler diff * tests diff * new changes * realizes * chores * assign * kind of r3 * forced_realize wont do it * with forced_realize * start with children * test search * r3 with parents * diff cleanup * add children * crossing assign * late fuse descendants * update kernel counts * assign diff doesnt belong here	2024-05-04 17:22:15 +03:00
qazal	249cadd106	fusing crossing diamond assign (#4403 ) * refactor scheduler parents search * assign target * unit test * can't chase this	2024-05-04 15:19:48 +03:00
George Hotz	9fc4465557	subbuffer support (#4397 ) * subbuffer support * diskbuffer offset * cuda subbuffer works * use subbuffer * more subbuffer tests * consecutive * cast * consec * offset * view is a better name * offset is in nbytes * fix view + memory planner * delete unused DiskRunner * reverse order * no subbuffers on unrealized consts * only enabled for disk * don't reverse memory * view supported devices * pickle buffer view * ring jit * support extra view inputs in jit * fix JIT=2 issue * test copy jit * p2p isn't an option anymore * fix dep tracking issue * fix mypy * fix pickle * from_nv is contents now	2024-05-03 18:05:57 -07:00
chenyu	c7368515d2	refactor sparse_categorical_crossentropy (#4406 ) factor out the -1 * and / loss_mask.sum() for both smoothing and non-smoothing terms	2024-05-03 14:28:36 -04:00
qazal	3401734e54	infra for scheduler process replay (#4405 ) * use getenv * capture ast * fix graph * replay schedules * exec	2024-05-03 20:29:13 +03:00
chenyu	473ecb978a	remove SPLIT_REDUCEOP=1 from resnet scripts (#4404 ) SPLIT_REDUCEOP=1 is default	2024-05-03 12:36:23 -04:00
David Hou	b767d59684	resnet trainer: keep old cookie around until next step has been queued (#4401 ) * keep old cookie around until next step has been queued (-10ms 6gpu) * also for eval * drop cookie before data_get? * Revert "drop cookie before data_get?" This reverts commit `b01e6aa2b2`. * Revert "Revert "drop cookie before data_get?"" This reverts commit `23464e73d4`.	2024-05-03 12:15:21 -04:00
qazal	cf3ccb809f	refactor scheduler parents search (#4402 )	2024-05-03 17:16:34 +03:00
George-the-1st	0627e26140	Added missing unittest execution code (#4400 ) same code as on every other test file, just missing from this one for some reason.	2024-05-02 22:34:30 -04:00
chenyu	d4062cb6fc	NV tensor_cores in kernel.py (#4399 )	2024-05-02 22:33:08 -04:00
qazal	0deaaf2bc8	partial fusion spec (#4398 )	2024-05-03 04:14:23 +03:00
chenyu	2c3b7f8e70	pad resnet training data with training data mean (#4369 ) update model_train resnet to pad training	2024-05-02 20:26:15 -04:00
Francis Lam	3cf8291f2f	mlperf/resnet: update beam params to increase time and quality (#4396 ) * mlperf/resnet: update beam params to increase time and quality * revert upcast 8 in search space and add rocm setup function * refactor to independent setup.sh script	2024-05-02 20:14:46 -04:00
nimlgen	ca6c8ae739	factor out resource access logic in multigraph base class (#4385 ) * factor out resource access logic in multigraph base class * hsa fixes * clean * linter * linter 2 * not need this	2024-05-03 00:38:22 +03:00
chenyu	ab01a9433d	resnet eval 4n+3 if epoch < 33 (#4391 ) the rule is as thoroughly as 4n+k and we can stop the clock as soon as eval hits target. this can save 24 evals or 12 minutes	2024-05-02 16:52:07 -04:00
Francis Lam	7c8401fc65	search: skip timing the unoptimized kernel (#4395 ) * search: skip timing the unoptimized kernel also ensure the return the unoptimized kernel if no opts are valid and refactor debugging to a single BEAM_DEBUG variable * stop early on fast kernels that can't improve enough	2024-05-02 16:48:49 -04:00

10633 Commits