Commit Graph

5911 Commits

qazal
4ffb722d4e var_vals prereq for deleting LBScheduleItem [run_process_replay] (#6511) 2024-09-14 17:00:30 +08:00
George Hotz
9188245677 Viz (#6502)
* start viz tool

* start work

* more readme

* graceful shutdown that reloader

* add VIZ=1

* aesthetics

* typings

* more work

* work left

* more work on rewrites saving

* maybe try zoom

* add some metadata

* generic extra, show code and ast

* more tooling

* add rewritten graphs

* show graph_rewrites

* small details

* more diff cleanups

* differ as the cherry on top

* no useless styles

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-14 16:15:29 +08:00
nimlgen
052bf43ed4 dsp check buffers count (#6509) 2024-09-14 10:16:58 +03:00
qazal
ee5902d347 hotfix: remove rewrite.py from ops [run_process_replay] (#6508) 2024-09-14 10:02:47 +08:00
nimlgen
81a4a9623c add qcom dsp runtime (#6112)
* calling qualcomm dsp from python

* include so files

* add include file

* adsprpc.py

* running with adsprpc

* work

* 32-bit support in elf

* compilation works

* ion

* msm_ion

* working DSP backend

* getting 500 MFLOPS on matmul

* beam works with timing

* move to autogen

* disasm

* progress

* simple tests pass

* qcom_dsp

* more dsp autogen

* progress

* some progress

* works w/o lib

* checkpoint

* no lib

* ugh, better

* cleaner, but with lib. test good, but with the hack

* remove autogens

* small

* push

* simpler

* revert this

* run_3

* simpler

* android

* handle

* run it

* why?

* run2

* to gen

* cc

* cleaner

* elf

* part of autogen

* comment

* no lib

* autogen

* linter

* bug reproducer

* cleaner

* this repro is almost empty and doesn't work!!!!

* with this test_ops passes, no crashes anymore

* cleaner

* linter

* renames

* shorter

* remove contextlib

* ugh

* mypy

* cleaner

* cleaner

* remove import

* conn

* import

* revert this

* remove heavy .so

* shorter alloc

* not true anymore

---------

Co-authored-by: Comma Device <device@comma.ai>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <george@comma.ai>
2024-09-13 21:01:33 +03:00
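The runtime above drives Qualcomm's Hexagon DSP from Python over FastRPC (the adsprpc path). A minimal ctypes sketch of that binding style, assuming a Qualcomm Android device; the hand-declared symbol stands in for the repo's autogenerated bindings:

```python
import ctypes

# FastRPC userspace library shipped on Qualcomm Android devices.
adsprpc = ctypes.CDLL("libadsprpc.so")

# remote_handle_open(const char *name, remote_handle *ph) from remote.h,
# declared by hand here purely for illustration; the real runtime
# autogenerates these declarations (the "move to autogen" step above).
adsprpc.remote_handle_open.restype = ctypes.c_int
adsprpc.remote_handle_open.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_uint32)]
```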
nimlgen
ca63207d23 clang compiler args (#6505) 2024-09-13 19:22:27 +03:00
George Hotz
774bf39f85 saving rewrites [run_process_replay] (#6501)
* save rewrites with TRACK_MATCH_STATS=2 [run_process_replay]

* cleaner
2024-09-13 15:02:27 +08:00
Tim Becker
7c078191ce Misc rewrite perf improvements (#6500)
* Make UOp a normal class and use __slots__

* Use __slots__ in UPat

* Cache dtypes.{min,max}

* Use faster iterables in ops.py

* extend is a lot faster than nested listcomp

Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>

---------

Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>
2024-09-13 11:31:50 +08:00
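Two of the bullets above are general CPython wins: __slots__ removes the per-instance __dict__, and list.extend flattens faster than a nested list comprehension. A minimal sketch under those assumptions (class and field names are illustrative, not the repo's):

```python
class WithSlots:
    # a fixed attribute set: no per-instance __dict__, so instances are
    # smaller and attribute access is faster
    __slots__ = ("op", "dtype", "src", "arg")
    def __init__(self, op, dtype, src, arg):
        self.op, self.dtype, self.src, self.arg = op, dtype, src, arg

def flatten_extend(rows):
    # list.extend in a plain loop, the faster alternative to
    # [x for r in rows for x in r] that the last bullet describes
    out = []
    for r in rows:
        out.extend(r)
    return out

assert flatten_extend([[1, 2], [3]]) == [1, 2, 3]
```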
Tim Becker
8c4cab8d6e Even faster enums (#6483)
* Even faster enums

* simpler _generate_next_value impl

* FastEnum in ops only

* Better uniqueness for FastEnum
2024-09-12 20:08:02 +08:00
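The bullets name the stdlib hook _generate_next_value_, which auto() calls to pick each member's value. A hedged sketch of that mechanism; the real FastEnum may do more (the "better uniqueness" bullet suggests it does):

```python
from enum import Enum, auto

class FastEnum(Enum):
    # auto() calls this hook; returning count + 1 directly skips the
    # default scan over last_values and yields dense, unique values
    @staticmethod
    def _generate_next_value_(name, start, count, last_values):
        return count + 1

class Ops(FastEnum):
    ADD = auto()
    MUL = auto()

assert Ops.ADD.value == 1 and Ops.MUL.value == 2
```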
George Hotz
9543e4c92e more expand prereqs [run_process_replay] (#6499) 2024-09-12 17:46:12 +08:00
George Hotz
327eb12600 folding for vectorized consts [run_process_replay] (#6498)
* folding for vectorized consts [run_process_replay]

* remove that if statement

* inf loop
2024-09-12 17:29:37 +08:00
George Hotz
a532d59bbd gep tuple [run_process_replay] (#6495)
* gep tuple [run_process_replay]

* no inf loop, that goes in expander

* fix ops python

* unbreak gep 0

* fix tests

* fix tests

* VECTORIZE/GEP

* oops, broken
2024-09-12 16:37:31 +08:00
George Hotz
6dfa63cb21 more vconst stuff + gep tuple [run_process_replay] (#6494)
* more vconst stuff [run_process_replay]

* revert that

* fix inf loop
2024-09-12 14:58:14 +08:00
qazal
4507ab8016 more upat styling changes [run_process_replay] (#6492)
* more upat styling

* single to double quotes

* wrap line

* comments
2024-09-12 14:40:16 +08:00
qazal
63ea446339 s/None/dtypes.void in docs [run_process_replay] (#6493)
* s/None/dtypes.void in docs [run_process_replay]

* not arg

* now the asts in docs

* more fixup
2024-09-12 14:27:37 +08:00
George Hotz
119b0ea4af add UOps.VCONST [run_process_replay] (#6487)
* add UOps.VCONST [run_process_replay]

* VCONST folding

* simpler devectorize

* alu

* revert that type
2024-09-12 14:03:39 +08:00
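A VCONST carries one constant per vector lane, so an ALU op between two of them can fold lane-wise at rewrite time. A toy sketch of that folding, with tuples standing in for the real uop args (an assumption, not the repo's code):

```python
import operator

def fold_vconst(op, a: tuple, b: tuple) -> tuple:
    # lane-wise constant folding: an ALU of two vector consts is itself
    # a vector const
    return tuple(op(x, y) for x, y in zip(a, b))

assert fold_vconst(operator.add, (1, 2, 3, 4), (10, 20, 30, 40)) == (11, 22, 33, 44)
```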
qazal
4dc9436d63 use more UPat.var and UPat.cvar [run_process_replay] (#6491) 2024-09-12 13:52:41 +08:00
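UPat.var binds any matching uop to a name and UPat.cvar binds a constant, which keeps a rewrite rule down to a lambda over the bound names. A toy stand-in showing the shape of such a rule (simplified far beyond tinygrad's real PatternMatcher):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    # stand-in for a CONST uop, the thing UPat.cvar("c") would bind
    arg: int

def add_zero_rule(x, c: Const):
    # the body of a rule like (UPat.var("x") + UPat.cvar("c"), ...):
    # return the rewritten value, or None when the rule does not apply
    return x if c.arg == 0 else None

assert add_zero_rule("x", Const(0)) == "x"
assert add_zero_rule("x", Const(5)) is None
```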
qazal
e5e14fc4ef all UOp methods need dtype [run_process_replay] (#6490)
* all UOp methods need dtype [run_process_replay]

* delete all type: ignores yay
2024-09-12 13:38:14 +08:00
George Hotz
76487a3533 remove nop, use upat [run_process_replay] (#6489)
* remove nop, use upat [run_process_replay]

* mypy passes

* no wonder nothing worked

* fixes
2024-09-12 12:16:19 +08:00
George Hotz
f12f0857d8 add UOps.VCONST (just the uop) [run_process_replay] (#6488)
* empty branch process replay

* add VCONST
2024-09-12 11:16:20 +08:00
qazal
00d4bf16d8 new utils for scheduler graph rewrite [run_process_replay] (#6485) 2024-09-12 10:01:24 +08:00
qazal
a17ea53340 delete USE_COPY_KERNEL from the scheduler [run_process_replay] (#6482) 2024-09-12 07:45:31 +08:00
nimlgen
eac046ea55 hcq check queue size before submit (#6481) 2024-09-11 23:13:13 +03:00
qazal
dda5c63f4a things we can delete after dtypes.void [run_process_replay] (#6480) 2024-09-11 19:21:41 +08:00
qazal
bce73c9a54 more scheduler graph_rewrite cleanups [run_process_replay] (#6479) 2024-09-11 18:26:35 +08:00
George Hotz
bdd0c06f29 add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
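With dtypes.void, UOp.dtype is always a real DType: value-less ops (stores, barriers) carry void instead of None, so consumers drop their Optional guards and the type: ignores go with them (see #6490 above). A minimal sketch of the idea with a stand-in DType:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DType:
    # stand-in for tinygrad's DType, just enough for the sketch
    name: str

void = DType("void")  # illustrative stand-in for dtypes.void

def render(dtype: DType) -> str:
    # dtype is never None now, so no Optional[DType] and no guards
    return dtype.name

assert render(void) == "void"
```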
George Hotz
1b4d1823b7 add pyint to DTYPES_DICT [run_process_replay] (#6477)
* add pyint to DTYPES_DICT [run_process_replay]

* also fix uop alu bug

* exclude pyint there too

* ne ne

* force explicit dtype
2024-09-11 17:31:59 +08:00
qazal
5cc142c8b8 add uop.swizzle(st) (#6476) 2024-09-11 16:52:42 +08:00
qazal
78148e16d8 init changes from the dtypes_void branch [run_process_replay] (#6475) 2024-09-11 16:34:50 +08:00
qazal
d6d9234985 cleanup some scheduler rewrites [run_process_replay] (#6474) 2024-09-11 16:10:59 +08:00
George Hotz
1cadddee26 Revert "fold lt (#6472)" (#6473)
This reverts commit 81bda4d304.
2024-09-11 15:59:25 +08:00
George Hotz
81bda4d304 fold lt (#6472) 2024-09-11 15:56:57 +08:00
qazal
e645a0e766 allow selecting UPat files in TRACK_MATCH_STATS [run_process_replay] (#6470) 2024-09-11 14:32:46 +08:00
qazal
3cde1503ce enable graph rewrite in the scheduler (#6249)
* test: enable

* skip those

* skip pads tests
2024-09-11 14:30:04 +08:00
chenyu
d9d1ae7248 more lt folding using gcd (#6469) 2024-09-11 02:09:35 -04:00
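The gcd trick: for integer e and g > 0, g*e < c is equivalent to e < ceil(c/g), so dividing a comparison through by the gcd of its coefficients tightens it without changing its truth value. A brute-force check of the identity (illustrative, not the repo's rewrite code):

```python
from math import gcd

g, c = 4, 6
assert gcd(g, c) > 1  # the fold only helps when a common factor exists
for e in range(-20, 20):
    # -(-c // g) is ceil(c / g) in integer arithmetic
    assert (g * e < c) == (e < -(-c // g))
```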
madt2709
dfe1db1cff Fix typo in docs (#6468)
Co-authored-by: theordias <theo.dias@cresta.ai>
2024-09-11 01:47:26 -04:00
qazal
262569a3eb green conv bw AST_REWRITE=1 (#6466)
* green conv bw AST_REWRITE=1

* new strides and dtype fix
2024-09-11 10:51:24 +08:00
chenyu
15c4d4f406 fold unrolled arange div pattern (#6465) 2024-09-10 22:35:52 -04:00
qazal
4259311006 merge views in conv swizzle (#6464) 2024-09-11 10:11:01 +08:00
George Hotz
6d195fb653 small changes from new style expand [run_process_replay] (#6462) 2024-09-11 09:10:56 +08:00
qazal
803b8b9313 conv bw schedule and correctness tests to iterate on (#6461)
first to fix AST_REWRITE=1, then to implement the same fusion for dtypes.half.
2024-09-11 08:47:07 +08:00
chenyu
b574caadc9 fix UOp const_factor for ADD [run_process_replay] (#6459)
currently not used, fixed for completeness
2024-09-10 20:04:26 -04:00
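const_factor of a sum can only be what divides both addends, i.e. the gcd of the operands' factors. A one-line sketch of the fixed rule (an assumption about the exact code; per the commit note this path was unused at the time):

```python
from math import gcd

def const_factor_add(l: int, r: int) -> int:
    # the largest integer dividing every value the ADD can take,
    # given each operand's own const factor
    return gcd(l, r)

assert const_factor_add(4, 6) == 2
```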
chenyu
2105832b87 _min_max of MUL of 2 non-positive inputs (#6454) 2024-09-10 07:13:01 -04:00
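When both factors are known non-positive, the product is non-negative: the minimum comes from the endpoints closest to zero and the maximum from the farthest. A small check of that interval rule (illustrative):

```python
def mul_min_max(x_min, x_max, y_min, y_max):
    # both ranges entirely <= 0: the product is >= 0, minimized at the
    # maxes (closest to zero), maximized at the mins (farthest from zero)
    assert x_max <= 0 and y_max <= 0
    return x_max * y_max, x_min * y_min

assert mul_min_max(-3, -1, -4, -2) == (2, 12)
```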
Francis Lata
b7ce9a1530 UNet3D MLPerf (#3470)
* add training set transforms

* add DICE cross entropy loss

* convert pred and label to Tensor when calculating DICE score

* cleanups and allow train dataset batching

* fix DICE CE loss calculation

* jitted training step

* clean up DICE CE loss calculation

* initial support for sharding

* Revert "initial support for sharding"

This reverts commit e3670813b8.

* minor updates

* cleanup imports

* add support for sharding

* apply temp patch to try to avoid OOM

* revert cstyle changes

* add gradient acc

* hotfix

* add FP16 support

* add ability to train on smaller image sizes

* add support for saving and loading checkpoints + cleanup some various modes

* fix issue with using smaller patch size + update W&B logging

* disable LR_WARMUP_EPOCHS

* updates

* minor cleanups

* cleanup

* update order of transformations

* more cleanups

* realize loss

* cleanup

* more cleanup

* some cleanups

* add RAM usage

* minor cleanups

* add support for gradient accumulation

* cleanup imports

* minor updates to not use GA_STEPS

* remove FP16 option since it's available now globally

* update multi-GPU setup

* add timing logs for training loop

* go back to using existing dataloader and add ability to preprocess data to save time

* clean up optimization and re-enable JIT and multi-GPU support for training and evaluation

* free train and eval steps memory

* cleanups and scale batch size based on the number of GPUs

* fix GlobalCounters import

* fix seed

* fix W&B setup

* update batch size default size

* add back metric divergence check

* put back JIT on UNet3d eval

* move dataset preprocessing inside training code

* add test for dice_loss

* add config logging support to W&B and other cleanups

* change how default float is getting retrieved

* remove TinyJit import duplicate

* update config logging to W&B and remove JIT on eval_step

* no need for caching preprocessed data anymore

* fix how evaluation is ran and how often

* add support for LR scaling

* fix issue with gaussian being moved to scipy.signal.windows

* remove DICE loss unit test

* fix issue where loss isn't compatible with multiGPU

* add individual BEAM control for train and eval steps

* fix ndimage scipy import

* add BENCHMARK

* cleanups on BENCHMARK + fix on rand_flip augmentation during training

* cleanup train and eval BEAM envs

* add checkpointing support after every eval

* cleanup model_eval

* disable grad during eval

* use new preprocessing dataset mechanism

* remove unused import

* use training and inference_mode contexts

* start eval after benchmarking

* add data fetching time

* cleanup decorators

* more cleanups on training script

* add message during benchmarking mode

* realize when reassigning LR on scheduler and update default number of epochs

* add JIT on eval step

* remove JIT on eval_step

* add train dataloader for unet3d

* move checkpointing to be done after every epoch

* revert removal of JIT on unet3d inference

* save checkpoint if metric is not successful

* Revert "add train dataloader for unet3d"

This reverts commit c166d129df.

* Revert "Revert "add train dataloader for unet3d""

This reverts commit 36366c65d2.

* hotfix: seed was defaulting to a value of 0

* fix SEED value

* remove the usage of context managers for setting BEAM and going from training to inference

* support new stack API for calculating eval loss and metric

* Revert "remove the usage of context managers for setting BEAM and going from training to inference"

This reverts commit 2c0ba8d322.

* check training and test preprocessed folders separately

* clean up imports and log FUSE_CONV_BW

* use train and val preprocessing constants

* add kits19 dataset setup script

* update to use the new test decorator for disabling grad

* update kits19 dataset setup script

* add docs on how to train the model

* set default value for BASEDIR

* add detailed instruction about BASEDIR usage

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-10 04:37:28 -04:00
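The loss in the first bullets pairs a soft Dice term with cross entropy. A hedged NumPy sketch of that combination; the run's exact reduction, class weighting, and smoothing constant are assumptions:

```python
import numpy as np

def dice_ce_loss(pred: np.ndarray, onehot: np.ndarray, eps: float = 1e-6) -> float:
    # pred: softmax probabilities, onehot: labels, both shaped (N, C, D, H, W)
    inter = (pred * onehot).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + onehot.sum() + eps)
    ce = float(-(onehot * np.log(pred + eps)).mean())
    return dice + ce
```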
qazal
f4f705a07c can push SWIZZLE through reduce both ways (#6453) 2024-09-10 16:00:50 +08:00
qazal
1347e49e82 second iteration on UOps.SWIZZLE (#6451)
* new swizzle

* fix the failing tests

* test a double swizzle

* ci
2024-09-10 14:43:21 +08:00
chenyu
e0d35e3657 update test_padto_sum_not_ok (#6450)
updated the setup as `exp() < -1` could be folded to False
2024-09-09 22:46:42 -04:00
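The fold is sound because exp has range (0, inf), so exp(x) < c is constantly False for any c <= 0 and a bounds pass may replace the whole comparison. A quick sanity check (not the repo's code):

```python
import math

assert all(not (math.exp(x) < -1) for x in (-100.0, 0.0, 100.0))
```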
qazal
95c9fe841e UOp.st infra for the new SWIZZLE (#6449) 2024-09-10 09:39:45 +08:00
qazal
abfbd9fd2f fix Variable init from the DEFINE_VAR refactor (#6448)
prereq for UOps.VALID.
2024-09-10 09:14:29 +08:00
chenyu
fcc69adfc5 simplify c0*x<c1 for negative int c0,c1 (#6431)
* simplify c0*x<c1 for negative int c0,c1

* fine if rhs is zero
2024-09-09 21:05:53 -04:00
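Dividing by a negative c0 flips the inequality, and for integers the bound lands at floor(c1/c0) + 1. A brute-force check of the equivalence (illustrative):

```python
# c0*x < c1 with c0 < 0  <=>  x > c1/c0  <=>  x >= c1//c0 + 1  (integer x)
c0, c1 = -3, -7
assert all((c0 * x < c1) == (x >= c1 // c0 + 1) for x in range(-50, 50))
```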