Commit Graph

5911 Commits

qazal
4ffb722d4e var_vals prereq for deleting LBScheduleItem [run_process_replay] (#6511) 2024-09-14 17:00:30 +08:00
George Hotz
9188245677 Viz (#6502)
* start viz tool

* start work

* more readme

* graceful shutdown that reloader

* add VIZ=1

* aesthetics

* typings

* more work

* work left

* more work on rewrites saving

* maybe try zoom

* add some metadata

* generic extra, show code and ast

* more tooling

* add rewritten graphs

* show graph_rewrites

* small details

* more diff cleanups

* differ as the cherry on top

* no useless styles

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-14 16:15:29 +08:00
nimlgen
052bf43ed4 dsp check buffers count (#6509) 2024-09-14 10:16:58 +03:00
qazal
ee5902d347 hotfix: remove rewrite.py from ops [run_process_replay] (#6508) 2024-09-14 10:02:47 +08:00
nimlgen
81a4a9623c add qcom dsp runtime (#6112)
* calling qualcomm dsp from python

* include so files

* add include file

* adsprpc.py

* running with adsprpc

* work

* 32-bit support in elf

* compilation works

* ion

* msm_ion

* working DSP backend

* getting 500 MFLOPS on matmul

* beam works with timing

* move to autogen

* disasm

* progress

* simple tests pass

* qcom_dsp

* more dsp autogen

* progress

* some progress

* works w/o lib

* checkpoint

* no lib

* ugh, better

* cleaner, but with lib. test good, but with the hack

* remove autogens

* small

* push

* simpler

* revert this

* run_3

* simpler

* android

* handle

* run it

* why?

* run2

* to gen

* cc

* cleaner

* elf

* part of autogen

* comment

* no lib

* autogen

* linter

* bug reproducer

* cleaner

* this repro is almost empty and doesn't work!!!!

* with this test_ops passes, no crashes anymore

* cleaner

* linter

* renames

* shorter

* remove contextlib

* ugh

* mypy

* cleaner

* cleaner

* remove import

* conn

* import

* revert this

* remove heavy .so

* shorter alloc

* not true anymore

---------

Co-authored-by: Comma Device <device@comma.ai>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <george@comma.ai>
2024-09-13 21:01:33 +03:00
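The runtime above drives Qualcomm's Hexagon DSP from Python over FastRPC (the adsprpc path). A minimal ctypes sketch of that binding style, assuming a Qualcomm Android device; the hand-declared symbol stands in for the repo's autogenerated bindings:

```python
import ctypes

# FastRPC userspace library shipped on Qualcomm Android devices.
adsprpc = ctypes.CDLL("libadsprpc.so")

# remote_handle_open(const char *name, remote_handle *ph) from remote.h,
# declared by hand here purely for illustration; the real runtime
# autogenerates these declarations (the "move to autogen" step above).
adsprpc.remote_handle_open.restype = ctypes.c_int
adsprpc.remote_handle_open.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_uint32)]
```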
nimlgen
ca63207d23 clang compiler args (#6505) 2024-09-13 19:22:27 +03:00
George Hotz
774bf39f85 saving rewrites [run_process_replay] (#6501)
* save rewrites with TRACK_MATCH_STATS=2 [run_process_replay]

* cleaner
2024-09-13 15:02:27 +08:00
Tim Becker
7c078191ce Misc rewrite perf improvements (#6500)
* Make UOp a normal class and use __slots__

* Use __slots__ in UPat

* Cache dtypes.{min,max}

* Use faster iterables in ops.py

* extend is a lot faster than nested listcomp

Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>

---------

Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>
2024-09-13 11:31:50 +08:00
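Two of the bullets above are general CPython wins: __slots__ removes the per-instance __dict__, and list.extend flattens faster than a nested list comprehension. A minimal sketch under those assumptions (class and field names are illustrative, not the repo's):

```python
class WithSlots:
    # a fixed attribute set: no per-instance __dict__, so instances are
    # smaller and attribute access is faster
    __slots__ = ("op", "dtype", "src", "arg")
    def __init__(self, op, dtype, src, arg):
        self.op, self.dtype, self.src, self.arg = op, dtype, src, arg

def flatten_extend(rows):
    # list.extend in a plain loop, the faster alternative to
    # [x for r in rows for x in r] that the last bullet describes
    out = []
    for r in rows:
        out.extend(r)
    return out

assert flatten_extend([[1, 2], [3]]) == [1, 2, 3]
```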
Tim Becker
8c4cab8d6e Even faster enums (#6483)
* Even faster enums

* simpler _generate_next_value impl

* FastEnum in ops only

* Better uniqueness for FastEnum
2024-09-12 20:08:02 +08:00
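The bullets name the stdlib hook _generate_next_value_, which auto() calls to pick each member's value. A hedged sketch of that mechanism; the real FastEnum may do more (the "better uniqueness" bullet suggests it does):

```python
from enum import Enum, auto

class FastEnum(Enum):
    # auto() calls this hook; returning count + 1 directly skips the
    # default scan over last_values and yields dense, unique values
    @staticmethod
    def _generate_next_value_(name, start, count, last_values):
        return count + 1

class Ops(FastEnum):
    ADD = auto()
    MUL = auto()

assert Ops.ADD.value == 1 and Ops.MUL.value == 2
```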
George Hotz
9543e4c92e more expand prereqs [run_process_replay] (#6499) 2024-09-12 17:46:12 +08:00
George Hotz
327eb12600 folding for vectorized consts [run_process_replay] (#6498)
* folding for vectorized consts [run_process_replay]

* remove that if statement

* inf loop
2024-09-12 17:29:37 +08:00
George Hotz
a532d59bbd gep tuple [run_process_replay] (#6495)
* gep tuple [run_process_replay]

* no inf loop, that goes in expander

* fix ops python

* unbreak gep 0

* fix tests

* fix tests

* VECTORIZE/GEP

* oops, broken
2024-09-12 16:37:31 +08:00
George Hotz
6dfa63cb21 more vconst stuff + gep tuple [run_process_replay] (#6494)
* more vconst stuff [run_process_replay]

* revert that

* fix inf loop
2024-09-12 14:58:14 +08:00
qazal
4507ab8016 more upat styling changes [run_process_replay] (#6492)
* more upat styling

* single to double quotes

* wrap line

* comments
2024-09-12 14:40:16 +08:00
qazal
63ea446339 s/None/dtypes.void in docs [run_process_replay] (#6493)
* s/None/dtypes.void in docs [run_process_replay]

* not arg

* now the asts in docs

* more fixup
2024-09-12 14:27:37 +08:00
George Hotz
119b0ea4af add UOps.VCONST [run_process_replay] (#6487)
* add UOps.VCONST [run_process_replay]

* VCONST folding

* simpler devectorize

* alu

* revert that type
2024-09-12 14:03:39 +08:00
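A VCONST carries one constant per vector lane, so an ALU op between two of them can fold lane-wise at rewrite time. A toy sketch of that folding, with tuples standing in for the real uop args (an assumption, not the repo's code):

```python
import operator

def fold_vconst(op, a: tuple, b: tuple) -> tuple:
    # lane-wise constant folding: an ALU of two vector consts is itself
    # a vector const
    return tuple(op(x, y) for x, y in zip(a, b))

assert fold_vconst(operator.add, (1, 2, 3, 4), (10, 20, 30, 40)) == (11, 22, 33, 44)
```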
qazal
4dc9436d63 use more UPat.var and UPat.cvar [run_process_replay] (#6491) 2024-09-12 13:52:41 +08:00
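UPat.var binds any matching uop to a name and UPat.cvar binds a constant, which keeps a rewrite rule down to a lambda over the bound names. A toy stand-in showing the shape of such a rule (simplified far beyond tinygrad's real PatternMatcher):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    # stand-in for a CONST uop, the thing UPat.cvar("c") would bind
    arg: int

def add_zero_rule(x, c: Const):
    # the body of a rule like (UPat.var("x") + UPat.cvar("c"), ...):
    # return the rewritten value, or None when the rule does not apply
    return x if c.arg == 0 else None

assert add_zero_rule("x", Const(0)) == "x"
assert add_zero_rule("x", Const(5)) is None
```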
qazal
e5e14fc4ef all UOp methods need dtype [run_process_replay] (#6490)
* all UOp methods need dtype [run_process_replay]

* delete all type: ignores yay
2024-09-12 13:38:14 +08:00
George Hotz
76487a3533 remove nop, use upat [run_process_replay] (#6489)
* remove nop, use upat [run_process_replay]

* mypy passes

* no wonder nothing worked

* fixes
2024-09-12 12:16:19 +08:00
George Hotz
f12f0857d8 add UOps.VCONST (just the uop) [run_process_replay] (#6488)
* empty branch process replay

* add VCONST
2024-09-12 11:16:20 +08:00
qazal
00d4bf16d8 new utils for scheduler graph rewrite [run_process_replay] (#6485) 2024-09-12 10:01:24 +08:00
qazal
a17ea53340 delete USE_COPY_KERNEL from the scheduler [run_process_replay] (#6482) 2024-09-12 07:45:31 +08:00
nimlgen
eac046ea55 hcq check queue size before submit (#6481) 2024-09-11 23:13:13 +03:00
qazal
dda5c63f4a things we can delete after dtypes.void [run_process_replay] (#6480) 2024-09-11 19:21:41 +08:00
qazal
bce73c9a54 more scheduler graph_rewrite cleanups [run_process_replay] (#6479) 2024-09-11 18:26:35 +08:00
George Hotz
bdd0c06f29 add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
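With dtypes.void, UOp.dtype is always a real DType: value-less ops (stores, barriers) carry void instead of None, so consumers drop their Optional guards and the type: ignores go with them (see #6490 above). A minimal sketch of the idea with a stand-in DType:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DType:
    # stand-in for tinygrad's DType, just enough for the sketch
    name: str

void = DType("void")  # illustrative stand-in for dtypes.void

def render(dtype: DType) -> str:
    # dtype is never None now, so no Optional[DType] and no guards
    return dtype.name

assert render(void) == "void"
```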
George Hotz
1b4d1823b7 add pyint to DTYPES_DICT [run_process_replay] (#6477)
* add pyint to DTYPES_DICT [run_process_replay]

* also fix uop alu bug

* exclude pyint there too

* ne ne

* force explicit dtype
2024-09-11 17:31:59 +08:00
qazal
5cc142c8b8 add uop.swizzle(st) (#6476) 2024-09-11 16:52:42 +08:00
qazal
78148e16d8 init changes from the dtypes_void branch [run_process_replay] (#6475) 2024-09-11 16:34:50 +08:00
qazal
d6d9234985 cleanup some scheduler rewrites [run_process_replay] (#6474) 2024-09-11 16:10:59 +08:00
George Hotz
1cadddee26 Revert "fold lt (#6472)" (#6473)
This reverts commit 81bda4d304.
2024-09-11 15:59:25 +08:00
George Hotz
81bda4d304 fold lt (#6472) 2024-09-11 15:56:57 +08:00
qazal
e645a0e766 allow selecting UPat files in TRACK_MATCH_STATS [run_process_replay] (#6470) 2024-09-11 14:32:46 +08:00
qazal
3cde1503ce enable graph rewrite in the scheduler (#6249)
* test: enable

* skip those

* skip pads tests
2024-09-11 14:30:04 +08:00
chenyu
d9d1ae7248 more lt folding using gcd (#6469) 2024-09-11 02:09:35 -04:00
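The gcd trick: for integer e and g > 0, g*e < c is equivalent to e < ceil(c/g), so dividing a comparison through by the gcd of its coefficients tightens it without changing its truth value. A brute-force check of the identity (illustrative, not the repo's rewrite code):

```python
from math import gcd

g, c = 4, 6
assert gcd(g, c) > 1  # the fold only helps when a common factor exists
for e in range(-20, 20):
    # -(-c // g) is ceil(c / g) in integer arithmetic
    assert (g * e < c) == (e < -(-c // g))
```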
madt2709
dfe1db1cff Fix typo in docs (#6468)
Co-authored-by: theordias <theo.dias@cresta.ai>
2024-09-11 01:47:26 -04:00
qazal
262569a3eb green conv bw AST_REWRITE=1 (#6466)
* green conv bw AST_REWRITE=1

* new strides and dtype fix
2024-09-11 10:51:24 +08:00
chenyu
15c4d4f406 fold unrolled arange div pattern (#6465) 2024-09-10 22:35:52 -04:00
qazal
4259311006 merge views in conv swizzle (#6464) 2024-09-11 10:11:01 +08:00
George Hotz
6d195fb653 small changes from new style expand [run_process_replay] (#6462) 2024-09-11 09:10:56 +08:00
qazal
803b8b9313 conv bw schedule and correctness tests to iterate on (#6461)
first to fix AST_REWRITE=1, then to implement the same fusion for dtypes.half.
2024-09-11 08:47:07 +08:00
chenyu
b574caadc9 fix UOp const_factor for ADD [run_process_replay] (#6459)
currently not used, fixed for completeness
2024-09-10 20:04:26 -04:00
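const_factor of a sum can only be what divides both addends, i.e. the gcd of the operands' factors. A one-line sketch of the fixed rule (an assumption about the exact code; per the commit note this path was unused at the time):

```python
from math import gcd

def const_factor_add(l: int, r: int) -> int:
    # the largest integer dividing every value the ADD can take,
    # given each operand's own const factor
    return gcd(l, r)

assert const_factor_add(4, 6) == 2
```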
chenyu
2105832b87 _min_max of MUL of 2 non-positive inputs (#6454) 2024-09-10 07:13:01 -04:00
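When both factors are known non-positive, the product is non-negative: the minimum comes from the endpoints closest to zero and the maximum from the farthest. A small check of that interval rule (illustrative):

```python
def mul_min_max(x_min, x_max, y_min, y_max):
    # both ranges entirely <= 0: the product is >= 0, minimized at the
    # maxes (closest to zero), maximized at the mins (farthest from zero)
    assert x_max <= 0 and y_max <= 0
    return x_max * y_max, x_min * y_min

assert mul_min_max(-3, -1, -4, -2) == (2, 12)
```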
Francis Lata
b7ce9a1530 UNet3D MLPerf (#3470)
* add training set transforms

* add DICE cross entropy loss

* convert pred and label to Tensor when calculating DICE score

* cleanups and allow train dataset batching

* fix DICE CE loss calculation

* jitted training step

* clean up DICE CE loss calculation

* initial support for sharding

* Revert "initial support for sharding"

This reverts commit e3670813b8.

* minor updates

* cleanup imports

* add support for sharding

* apply temp patch to try to avoid OOM

* revert cstyle changes

* add gradient acc

* hotfix

* add FP16 support

* add ability to train on smaller image sizes

* add support for saving and loading checkpoints + cleanup some various modes

* fix issue with using smaller patch size + update W&B logging

* disable LR_WARMUP_EPOCHS

* updates

* minor cleanups

* cleanup

* update order of transformations

* more cleanups

* realize loss

* cleanup

* more cleanup

* some cleanups

* add RAM usage

* minor cleanups

* add support for gradient accumulation

* cleanup imports

* minor updates to not use GA_STEPS

* remove FP16 option since it's available now globally

* update multi-GPU setup

* add timing logs for training loop

* go back to using existing dataloader and add ability to preprocess data to save time

* clean up optimization and re-enable JIT and multi-GPU support for training and evaluation

* free train and eval steps memory

* cleanups and scale batch size based on the number of GPUs

* fix GlobalCounters import

* fix seed

* fix W&B setup

* update batch size default size

* add back metric divergence check

* put back JIT on UNet3d eval

* move dataset preprocessing inside training code

* add test for dice_loss

* add config logging support to W&B and other cleanups

* change how default float is getting retrieved

* remove TinyJit import duplicate

* update config logging to W&B and remove JIT on eval_step

* no need for caching preprocessed data anymore

* fix how evaluation is ran and how often

* add support for LR scaling

* fix issue with gaussian being moved to scipy.signal.windows

* remove DICE loss unit test

* fix issue where loss isn't compatible with multiGPU

* add individual BEAM control for train and eval steps

* fix ndimage scipy import

* add BENCHMARK

* cleanups on BENCHMARK + fix on rand_flip augmentation during training

* cleanup train and eval BEAM envs

* add checkpointing support after every eval

* cleanup model_eval

* disable grad during eval

* use new preprocessing dataset mechanism

* remove unused import

* use training and inference_mode contexts

* start eval after benchmarking

* add data fetching time

* cleanup decorators

* more cleanups on training script

* add message during benchmarking mode

* realize when reassigning LR on scheduler and update default number of epochs

* add JIT on eval step

* remove JIT on eval_step

* add train dataloader for unet3d

* move checkpointing to be done after every epoch

* revert removal of JIT on unet3d inference

* save checkpoint if metric is not successful

* Revert "add train dataloader for unet3d"

This reverts commit c166d129df.

* Revert "Revert "add train dataloader for unet3d""

This reverts commit 36366c65d2.

* hotfix: seed was defaulting to a value of 0

* fix SEED value

* remove the usage of context managers for setting BEAM and going from training to inference

* support new stack API for calculating eval loss and metric

* Revert "remove the usage of context managers for setting BEAM and going from training to inference"

This reverts commit 2c0ba8d322.

* check training and test preprocessed folders separately

* clean up imports and log FUSE_CONV_BW

* use train and val preprocessing constants

* add kits19 dataset setup script

* update to use the new test decorator for disabling grad

* update kits19 dataset setup script

* add docs on how to train the model

* set default value for BASEDIR

* add detailed instruction about BASEDIR usage

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-10 04:37:28 -04:00
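The loss in the first bullets pairs a soft Dice term with cross entropy. A hedged NumPy sketch of that combination; the run's exact reduction, class weighting, and smoothing constant are assumptions:

```python
import numpy as np

def dice_ce_loss(pred: np.ndarray, onehot: np.ndarray, eps: float = 1e-6) -> float:
    # pred: softmax probabilities, onehot: labels, both shaped (N, C, D, H, W)
    inter = (pred * onehot).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + onehot.sum() + eps)
    ce = float(-(onehot * np.log(pred + eps)).mean())
    return dice + ce
```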
qazal
f4f705a07c can push SWIZZLE through reduce both ways (#6453) 2024-09-10 16:00:50 +08:00
qazal
1347e49e82 second iteration on UOps.SWIZZLE (#6451)
* new swizzle

* fix the failing tests

* test a double swizzle

* ci
2024-09-10 14:43:21 +08:00
chenyu
e0d35e3657 update test_padto_sum_not_ok (#6450)
updated the setup as `exp() < -1` could be folded to False
2024-09-09 22:46:42 -04:00
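The fold is sound because exp has range (0, inf), so exp(x) < c is constantly False for any c <= 0 and a bounds pass may replace the whole comparison. A quick sanity check (not the repo's code):

```python
import math

assert all(not (math.exp(x) < -1) for x in (-100.0, 0.0, 100.0))
```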
qazal
95c9fe841e UOp.st infra for the new SWIZZLE (#6449) 2024-09-10 09:39:45 +08:00
qazal
abfbd9fd2f fix Variable init from the DEFINE_VAR refactor (#6448)
prereq for UOps.VALID.
2024-09-10 09:14:29 +08:00
chenyu
fcc69adfc5 simplify c0*x<c1 for negative int c0,c1 (#6431)
* simplify c0*x<c1 for negative int c0,c1

* fine if rhs is zero
2024-09-09 21:05:53 -04:00
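Dividing by a negative c0 flips the inequality, and for integers the bound lands at floor(c1/c0) + 1. A brute-force check of the equivalence (illustrative):

```python
# c0*x < c1 with c0 < 0  <=>  x > c1/c0  <=>  x >= c1//c0 + 1  (integer x)
c0, c1 = -3, -7
assert all((c0 * x < c1) == (x >= c1 // c0 + 1) for x in range(-50, 50))
```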