Commit Graph

10490 Commits

qazal
a17ea53340 delete USE_COPY_KERNEL from the scheduler [run_process_replay] (#6482) 2024-09-12 07:45:31 +08:00
nimlgen
eac046ea55 hcq check queue size before submit (#6481) 2024-09-11 23:13:13 +03:00
qazal
dda5c63f4a things we can delete after dtypes.void [run_process_replay] (#6480) 2024-09-11 19:21:41 +08:00
qazal
bce73c9a54 more scheduler graph_rewrite cleanups [run_process_replay] (#6479) 2024-09-11 18:26:35 +08:00
George Hotz
bdd0c06f29 add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
George Hotz
1b4d1823b7 add pyint to DTYPES_DICT [run_process_replay] (#6477)
* add pyint to DTYPES_DICT [run_process_replay]

* also fix uop alu bug

* exclude pyint there too

* ne ne

* force explicit dtype
2024-09-11 17:31:59 +08:00
qazal
5cc142c8b8 add uop.swizzle(st) (#6476) 2024-09-11 16:52:42 +08:00
qazal
78148e16d8 init changes from the dtypes_void branch [run_process_replay] (#6475) 2024-09-11 16:34:50 +08:00
qazal
d6d9234985 cleanup some scheduler rewrites [run_process_replay] (#6474) 2024-09-11 16:10:59 +08:00
George Hotz
1cadddee26 Revert "fold lt (#6472)" (#6473)
This reverts commit 81bda4d304.
2024-09-11 15:59:25 +08:00
George Hotz
81bda4d304 fold lt (#6472) 2024-09-11 15:56:57 +08:00
qazal
e645a0e766 allow selecting UPat files in TRACK_MATCH_STATS [run_process_replay] (#6470) 2024-09-11 14:32:46 +08:00
qazal
3cde1503ce enable graph rewrite in the scheduler (#6249)
* test: enable

* skip those

* skip pads tests
2024-09-11 14:30:04 +08:00
chenyu
d9d1ae7248 more lt folding using gcd (#6469) 2024-09-11 02:09:35 -04:00
madt2709
dfe1db1cff Fix typo in docs (#6468)
Co-authored-by: theordias <theo.dias@cresta.ai>
2024-09-11 01:47:26 -04:00
qazal
262569a3eb green conv bw AST_REWRITE=1 (#6466)
* green conv bw AST_REWRITE=1

* new strides and dtype fix
2024-09-11 10:51:24 +08:00
chenyu
15c4d4f406 fold unrolled arange div pattern (#6465) 2024-09-10 22:35:52 -04:00
qazal
4259311006 merge views in conv swizzle (#6464) 2024-09-11 10:11:01 +08:00
George Hotz
6d195fb653 small changes from new style expand [run_process_replay] (#6462) 2024-09-11 09:10:56 +08:00
qazal
803b8b9313 conv bw schedule and correctness tests to iterate on (#6461)
first to fix AST_REWRITE=1, then to implement the same fusion for dtypes.half.
2024-09-11 08:47:07 +08:00
chenyu
b574caadc9 fix UOp const_factor for ADD [run_process_replay] (#6459)
currently not used, fixed for completeness
2024-09-10 20:04:26 -04:00
chenyu
2105832b87 _min_max of MUL of 2 non-positive inputs (#6454) 2024-09-10 07:13:01 -04:00
Francis Lata
b7ce9a1530 UNet3D MLPerf (#3470)
* add training set transforms

* add DICE cross entropy loss

* convert pred and label to Tensor when calculating DICE score

* cleanups and allow train dataset batching

* fix DICE CE loss calculation

* jitted training step

* clean up DICE CE loss calculation

* initial support for sharding

* Revert "initial support for sharding"

This reverts commit e3670813b8.

* minor updates

* cleanup imports

* add support for sharding

* apply temp patch to try to avoid OOM

* revert cstyle changes

* add gradient acc

* hotfix

* add FP16 support

* add ability to train on smaller image sizes

* add support for saving and loading checkpoints + cleanup some various modes

* fix issue with using smaller patch size + update W&B logging

* disable LR_WARMUP_EPOCHS

* updates

* minor cleanups

* cleanup

* update order of transformations

* more cleanups

* realize loss

* cleanup

* more cleanup

* some cleanups

* add RAM usage

* minor cleanups

* add support for gradient accumulation

* cleanup imports

* minor updates to not use GA_STEPS

* remove FP16 option since it's available now globally

* update multi-GPU setup

* add timing logs for training loop

* go back to using existing dataloader and add ability to preprocess data to save time

* clean up optimization and re-enable JIT and multi-GPU support for training and evaluation

* free train and eval steps memory

* cleanups and scale batch size based on the number of GPUs

* fix GlobalCounters import

* fix seed

* fix W&B setup

* update batch size default size

* add back metric divergence check

* put back JIT on UNet3d eval

* move dataset preprocessing inside training code

* add test for dice_loss

* add config logging support to W&B and other cleanups

* change how default float is getting retrieved

* remove TinyJit import duplicate

* update config logging to W&B and remove JIT on eval_step

* no need for caching preprocessed data anymore

* fix how evaluation is ran and how often

* add support for LR scaling

* fix issue with gaussian being moved to scipy.signal.windows

* remove DICE loss unit test

* fix issue where loss isn't compatible with multiGPU

* add individual BEAM control for train and eval steps

* fix ndimage scipy import

* add BENCHMARK

* cleanups on BENCHMARK + fix on rand_flip augmentation during training

* cleanup train and eval BEAM envs

* add checkpointing support after every eval

* cleanup model_eval

* disable grad during eval

* use new preprocessing dataset mechanism

* remove unused import

* use training and inference_mode contexts

* start eval after benchmarking

* add data fetching time

* cleanup decorators

* more cleanups on training script

* add message during benchmarking mode

* realize when reassigning LR on scheduler and update default number of epochs

* add JIT on eval step

* remove JIT on eval_step

* add train dataloader for unet3d

* move checkpointing to be done after every epoch

* revert removal of JIT on unet3d inference

* save checkpoint if metric is not successful

* Revert "add train dataloader for unet3d"

This reverts commit c166d129df.

* Revert "Revert "add train dataloader for unet3d""

This reverts commit 36366c65d2.

* hotfix: seed was defaulting to a value of 0

* fix SEED value

* remove the usage of context managers for setting BEAM and going from training to inference

* support new stack API for calculating eval loss and metric

* Revert "remove the usage of context managers for setting BEAM and going from training to inference"

This reverts commit 2c0ba8d322.

* check training and test preprocessed folders separately

* clean up imports and log FUSE_CONV_BW

* use train and val preprocessing constants

* add kits19 dataset setup script

* update to use the new test decorator for disabling grad

* update kits19 dataset setup script

* add docs on how to train the model

* set default value for BASEDIR

* add detailed instruction about BASEDIR usage

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-10 04:37:28 -04:00
qazal
f4f705a07c can push SWIZZLE through reduce both ways (#6453) 2024-09-10 16:00:50 +08:00
qazal
1347e49e82 second iteration on UOps.SWIZZLE (#6451)
* new swizzle

* fix the failing tests

* test a double swizzle

* ci
2024-09-10 14:43:21 +08:00
chenyu
e0d35e3657 update test_padto_sum_not_ok (#6450)
updated the setup as `exp() < -1` could be folded to False
2024-09-09 22:46:42 -04:00
qazal
95c9fe841e UOp.st infra for the new SWIZZLE (#6449) 2024-09-10 09:39:45 +08:00
qazal
abfbd9fd2f fix Variable init from the DEFINE_VAR refactor (#6448)
prereq for UOps.VALID.
2024-09-10 09:14:29 +08:00
chenyu
fcc69adfc5 simplify c0*x<c1 for negative int c0,c1 (#6431)
* simplify c0*x<c1 for negative int c0,c1

* fine if rhs is zero
2024-09-09 21:05:53 -04:00
kormann
f6f4f3222f whisper long batch (#6335)
* reset

* test

* only part refactor
2024-09-09 21:03:59 -04:00
qazal
29e63097a0 st is a cached_property on UOp [run_process_replay] (#6433) 2024-09-10 08:30:35 +08:00
qazal
cf64f8bb40 start with the UOps.VALID spec [run_process_replay] (#6435)
* document UOps.VALID [run_process_replay]

* now the assert
2024-09-10 08:00:19 +08:00
Tim Becker
58a1b4f427 Faster UOp hashing (#6447)
* Faster hashing of Enums and UOp

* NOp should not define __eq__

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-10 07:16:04 +08:00
George Hotz
92e4126793 Revert "Revert "RewriteContext [run_process_replay] (#6428)" (#6438)" (#6443)
This reverts commit e7dd08448f.
2024-09-10 07:00:26 +08:00
George Hotz
904f6a63fa Revert "Revert "cleanup process_replay/* namings [run_process_replay] (#6429)…" (#6442)
This reverts commit eda177da84.
2024-09-10 07:00:16 +08:00
nimlgen
8d3450ceab qcom remove unused commands (#6445)
* qcom remove unused commands

* linetr
2024-09-09 20:26:07 +03:00
nimlgen
f63a9fd649 hcq _cur_cmd_idx for readability (#6444)
* hcq _cur_cmd_idx for readability

* linter
2024-09-09 20:04:45 +03:00
George Hotz
dbd4536167 Revert "add UOps.VALID (#6387)" (#6441)
This reverts commit 8186e4e7d6.
2024-09-09 21:33:00 +08:00
George Hotz
e7dd08448f Revert "RewriteContext [run_process_replay] (#6428)" (#6438)
This reverts commit e1d61b048b.
2024-09-09 18:53:18 +08:00
George Hotz
eda177da84 Revert "cleanup process_replay/* namings [run_process_replay] (#6429)" (#6437)
This reverts commit f4e83b30b4.
2024-09-09 18:52:36 +08:00
George Hotz
d5bd38c278 add min max rule for expand [run_process_replay] (#6434) 2024-09-09 18:30:20 +08:00
George Hotz
42e5c8335e remove args from min/max [run_process_replay] (#6430)
* remove args from min/max [run_process_replay]

* it's a ConstType

* sconst_like unused

* any const is fine
2024-09-09 18:18:20 +08:00
qazal
f4e83b30b4 cleanup process_replay/* namings [run_process_replay] (#6429) 2024-09-09 16:59:04 +08:00
George Hotz
8186e4e7d6 add UOps.VALID (#6387)
* uops valid

* broke full_shape

* fixup that st (hardcoded asts still red)

* fixup DEFINE_VAR

debug

more debug

* start moving stuff to ast_const

* move test_linearizer

* move test_linearizer_failures to ast_const

* fixup test_schedule

* small diff change

* regenerate dataset

* fixup test_multitensor

* regen dataset try 2

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-09 16:58:43 +08:00
George Hotz
e1d61b048b RewriteContext [run_process_replay] (#6428) 2024-09-09 16:49:02 +08:00
qazal
935b6b658f delete seen from the scheduler api [run_process_replay] (#6427)
docs
2024-09-09 16:26:34 +08:00
George Hotz
6c7abd18df non-optional bounds (faster) [run_process_replay] (#6425)
* non-optional bounds (faster) [run_process_replay]

* pre-fetch min/max

* Revert "pre-fetch min/max"

This reverts commit cdd71840c5.
2024-09-09 16:00:16 +08:00
qazal
c5bae55ec8 new generate_dataset.sh (#6423)
* new generate_dataset.sh

* keep those there

* test: rm expected failures

* rename to extract
2024-09-09 15:13:07 +08:00
chenyu
1941e66cc9 real strides with uops (#6365)
* real strides with uops [run_process_replay]

* compare with old

* Revert "compare with old"

This reverts commit f53a8d4276.

* make those @unittest.expectedFailure
2024-09-09 03:06:27 -04:00
chenyu
ac98f5056e move lt-folding to a function [run_process_replay] (#6422)
and added more tests (some failed to match symbolic)
2024-09-09 02:04:52 -04:00