tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 07:18:40 -05:00

Author	SHA1	Message	Date
George Hotz	9fc4465557	subbuffer support (#4397 ) * subbuffer support * diskbuffer offset * cuda subbuffer works * use subbuffer * more subbuffer tests * consecutive * cast * consec * offset * view is a better name * offset is in nbytes * fix view + memory planner * delete unused DiskRunner * reverse order * no subbuffers on unrealized consts * only enabled for disk * don't reverse memory * view supported devices * pickle buffer view * ring jit * support extra view inputs in jit * fix JIT=2 issue * test copy jit * p2p isn't an option anymore * fix dep tracking issue * fix mypy * fix pickle * from_nv is contents now	2024-05-03 18:05:57 -07:00
chenyu	c7368515d2	refactor sparse_categorical_crossentropy (#4406 ) factor out the -1 * and / loss_mask.sum() for both smoothing and non-smoothing terms	2024-05-03 14:28:36 -04:00
qazal	3401734e54	infra for scheduler process replay (#4405 ) * use getenv * capture ast * fix graph * replay schedules * exec	2024-05-03 20:29:13 +03:00
chenyu	473ecb978a	remove SPLIT_REDUCEOP=1 from resnet scripts (#4404 ) SPLIT_REDUCEOP=1 is default	2024-05-03 12:36:23 -04:00
David Hou	b767d59684	resnet trainer: keep old cookie around until next step has been queued (#4401 ) * keep old cookie around until next step has been queued (-10ms 6gpu) * also for eval * drop cookie before data_get? * Revert "drop cookie before data_get?" This reverts commit `b01e6aa2b2`. * Revert "Revert "drop cookie before data_get?"" This reverts commit `23464e73d4`.	2024-05-03 12:15:21 -04:00
qazal	cf3ccb809f	refactor scheduler parents search (#4402 )	2024-05-03 17:16:34 +03:00
George-the-1st	0627e26140	Added missing unittest execution code (#4400 ) same code as on every other test file, just missing from this one for some reason.	2024-05-02 22:34:30 -04:00
chenyu	d4062cb6fc	NV tensor_cores in kernel.py (#4399 )	2024-05-02 22:33:08 -04:00
qazal	0deaaf2bc8	partial fusion spec (#4398 )	2024-05-03 04:14:23 +03:00
chenyu	2c3b7f8e70	pad resnet training data with training data mean (#4369 ) update model_train resnet to pad training	2024-05-02 20:26:15 -04:00
Francis Lam	3cf8291f2f	mlperf/resnet: update beam params to increase time and quality (#4396 ) * mlperf/resnet: update beam params to increase time and quality * revert upcast 8 in search space and add rocm setup function * refactor to independent setup.sh script	2024-05-02 20:14:46 -04:00
nimlgen	ca6c8ae739	factor out resource access logic in multigraph base class (#4385 ) * factor out resource access logic in multigraph base class * hsa fixes * clean * linter * linter 2 * not need this	2024-05-03 00:38:22 +03:00
chenyu	ab01a9433d	resnet eval 4n+3 if epoch < 33 (#4391 ) the rule is as thoroughly as 4n+k and we can stop the clock as soon as eval hits target. this can save 24 evals or 12 minutes	2024-05-02 16:52:07 -04:00
Francis Lam	7c8401fc65	search: skip timing the unoptimized kernel (#4395 ) * search: skip timing the unoptimized kernel also ensure the return the unoptimized kernel if no opts are valid and refactor debugging to a single BEAM_DEBUG variable * stop early on fast kernels that can't improve enough	2024-05-02 16:48:49 -04:00
Francis Lam	5c5b40880f	search: fix edge cases on screening potential ops (#4394 ) * search: fix edge cases on screening potential ops won't change correctness, but will save a little python time by properly deduplicating potential actions * check for de-duplication instead of exact valid actions * refactor long line	2024-05-02 14:53:05 -04:00
George Hotz	89030b238a	add consecutive property to shapetracker	2024-05-02 10:41:28 -07:00
George Hotz	2786dff26d	new disk tensor tests (#4393 )	2024-05-02 08:54:44 -07:00
chenyu	7492e5d3e7	resnet correct log name for red (#4390 )	2024-05-02 10:58:55 -04:00
chenyu	bf31837e6d	resnet correct steps_in_val_epoch in logging (#4389 ) also added random seed from system in scripts	2024-05-02 10:51:36 -04:00
George Hotz	c8a2047377	testing for all reduce (#4387 )	2024-05-02 06:34:10 -07:00
ym555	3113785604	Llama 3 Models (#4339 ) * Full Impl * fix test * Fix inference loop --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-05-02 06:06:07 -07:00
qazal	0b47818e0f	simpler reduceop children chasing (#4350 ) * simplest case * midreduce case * all tests * pending things * unify tests	2024-05-02 15:15:30 +03:00
chenyu	22376e53b7	resnet mlperf logging (#4361 ) * resnet mlperf logging * cropping too much?	2024-05-02 00:00:04 -04:00
George Hotz	f635c4d273	fix define global (#4383 ) * fix define global * remove name from DEFINE_GLOBAL * fix fuzzing * fix ptx * fix python	2024-05-01 22:32:56 -04:00
chenyu	ad116dc5c6	fill in mlperf system description (#4381 ) it did not ask too many details. will put software versions later with tinygrad commit. ``` python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_red.json training 4.0.0 INFO - System description checker passed for tinybox red ``` ``` python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_green.json training 4.0.0 INFO - System description checker passed for tinybox green ```	2024-05-01 16:47:45 -04:00
chenyu	9358b62073	rename resnet script to dev_beam.sh and dev_run.sh (#4379 ) final run_and_time needs to be one script for both. rename the old scripts	2024-05-01 14:41:35 -04:00
chenyu	6628e13a5f	pad resnet eval data in model_train (#4374 ) asserted if eval sample count is different from total eval file count.	2024-05-01 14:33:42 -04:00
George Hotz	105fbd7925	add 3080 support to NV	2024-05-01 11:17:01 -07:00
chenyu	826cccd54d	fix mean underflow for half tensor (#4377 ) * fix mean underflow for half tensor divide only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failed test cast for symbolic shape var * skip for python backend	2024-05-01 13:38:57 -04:00
chenyu	dce7ac0160	NOCLANG=1 for tinybox green ci. (#4378 ) CLANG was disabled for tinybox red for speed	2024-05-01 13:31:01 -04:00
George Hotz	272bea5100	GraphRunner (#4375 ) * GraphRunner * new metal graph * update hsa for graph runner * put var_vals back * move that clear after the capture	2024-05-01 10:27:13 -07:00
chenyu	077ea6926c	remove downcast_half in sum (#4376 ) breaks boolean mean and other stuff	2024-05-01 11:46:44 -04:00
George Hotz	bd49d2854a	hotfix: skip fetch tests always	2024-05-01 08:43:26 -07:00
George Hotz	b683d0f496	hotfix: 100% accuracy is wrong	2024-05-01 08:07:18 -07:00
George Hotz	8bcf533a84	gitignore open-images-v6TEST	2024-05-01 13:55:38 +00:00
qazal	ea06f657df	fusion tests from test_opt (#4357 ) * opt tests * more sgd * batchnorm * models stay in external	2024-05-01 16:44:12 +03:00
George Hotz	995d264666	hotfix: add CNAME to put docs at docs.tinygrad.org	2024-04-30 23:17:35 -07:00
chenyu	683b7c605a	pad first batch of imagenet dataloader and update eval (#4368 ) * pad first batch of imagenet dataloader and update eval * pad zero instead of empty for training	2024-05-01 00:21:52 -04:00
wozeparrot	4a26718ca9	feat: tinyboxgreen (#4365 )	2024-04-30 19:05:37 -04:00
Francis Lam	16838eae08	mlperf/resnet: update tinybox_red parameters to new best values (#4364 ) about 27 minutes to setup and 345ms/110TF steps	2024-04-30 18:08:12 -04:00
George Hotz	27ee49bf30	tensor variable (#4362 ) * tensor variable support * consttype without variable? * __setitem__ * symbolic mean works * arange test * more tests * a few more tests	2024-04-30 14:08:57 -07:00
nimlgen	d2f89615b2	remove aql remnants in amd (#4346 )	2024-04-30 23:36:02 +03:00
Francis Lam	0d33c54d99	kernel: change PADTO check to allow up to 4x padding (#4354 ) * kernel: change PADTO check to allow up to 4x padding also optionally remove PADTO from the search action space with BEAM_PADTO=0. * fix test_linearizer test_tensor_cores_padded tests * update resnet runs to use SPLIT_REDUCEOP=1 * fix up search TC axis and amt checking * fix up the dimensions of the TC tests	2024-04-30 15:29:34 -04:00
Elias Wahl	babe87a8ae	BERT: Checkpoint loading tests (#4359 ) * Move checkpoint init to helpers. Add test * linters * Move the steps outside of the main train loop * Move data_get * data_get belongs to helpers	2024-04-30 14:43:41 -04:00
Francis Lam	c12bcabb07	search: fix actions space checks to ignore TC axis and amt (#4360 ) * search: fix actions space checks to ignore TC axis and amt * add test for number of actions in get_linearizer_actions	2024-04-30 14:02:22 -04:00
chenyu	fdc8fabae5	disable flaky mac gpt2 beam benchmark and add back cifar mac with JIT=2 (#4358 ) * debug flaky mac gpt2 beam run * disable for now	2024-04-30 10:41:37 -04:00
George Hotz	d325be2540	update docs (#4356 ) * update docs * nn.md * mnist cleanups * rhip test is very slow	2024-04-30 16:51:42 +09:00
Sohaib	a2d81514fd	just get dtype from kwargs (#4355 )	2024-04-30 16:26:14 +09:00
Francis Lam	a9a1fa6bbf	wmma: add reduce axis choice to TC action space (#4328 ) * wmma: add reduce axis choice to TC action space * add test for TC multi-reduce axis choice	2024-04-29 19:15:39 -04:00
chenyu	93abcd3113	fix function.py sum backward without downcast_half (#4353 ) without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py	2024-04-29 17:53:02 -04:00

4297 Commits