* fp16 resnet
* cast running mean and var back to the default float dtype
* extra cast
* check that symbolic expressions don't overflow (see the bounds-check sketch below)
* add linearizer failure test case
* apply loss scaler after making grads contiguous (see the loss-scaling sketch below)
* oops
* I think this works
* don't loss scale fp32
* remove overflow test case
* remove symbolic bounds check
* loss scaler should be float
* temporarily disable padto because of a bug
* make batchnorm running stats float32? (see the running-stats sketch below)
* calculate LARS statistics in fp32? (see the LARS sketch below)
* oops
* remove most changes
* move loss scaler out of optimizer
* no more fp16 var
* oops
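
The running-stats commits ("cast running mean and var back to the default float dtype", "make batchnorm running stats float32?") keep batchnorm's running statistics in float32 even when activations are fp16, since repeated small momentum updates round away in fp16. A minimal numpy sketch of the idea; the function name and momentum default are assumptions, not tinygrad's actual BatchNorm code:

```python
import numpy as np

def update_running_stats(running_mean, running_var, x_fp16, momentum=0.1):
    # batch stats come from fp16 activations but are reduced in float32,
    # and the running buffers themselves stay float32: repeated small
    # momentum updates would otherwise round to nothing in fp16
    batch_mean = x_fp16.mean(axis=0, dtype=np.float32)
    batch_var = x_fp16.var(axis=0, dtype=np.float32)
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var

x = np.random.randn(32, 8).astype(np.float16)
rm, rv = np.zeros(8, np.float32), np.ones(8, np.float32)
rm, rv = update_running_stats(rm, rv, x)
```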
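The "check that symbolic expressions don't overflow" / "remove symbolic bounds check" pair added and then reverted a guard that symbolic index expressions fit the target integer dtype. A standalone sketch of what such a guard looks like; it assumes the expression's min/max bounds are already known and is not the actual tinygrad symbolic code:

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def check_no_overflow(expr_min: int, expr_max: int) -> None:
    # raise if the symbolic expression's value range can leave int32
    if expr_min < INT32_MIN or expr_max > INT32_MAX:
        raise OverflowError(
            f"symbolic bounds [{expr_min}, {expr_max}] do not fit in int32")

check_no_overflow(0, 2**20)    # fine
# check_no_overflow(0, 2**40)  # would raise OverflowError
```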
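The loss-scaler commits ("apply loss scaler after making grads contiguous", "don't loss scale fp32", "loss scaler should be float", "move loss scaler out of optimizer") describe standard static loss scaling: multiply the loss before backward so fp16 gradients don't underflow, then unscale before the optimizer step, with the scaler held as a plain Python float outside the optimizer and skipped entirely for fp32. A toy numpy version with a hand-written backward pass; the 2**11 scale and the linear model are illustrative assumptions:

```python
import numpy as np

def train_step(w, x, y, lr=1e-2):
    # plain float scaler, kept outside any optimizer state;
    # fp32 training uses 1.0, i.e. no loss scaling at all
    loss_scaler = 2.0**11 if w.dtype == np.float16 else 1.0

    pred = x @ w
    loss = ((pred - y) ** 2).mean()

    # gradient of the *scaled* loss: scaler * dloss/dw, so tiny fp16
    # gradients are shifted up out of the underflow range
    grad = loss_scaler * (2.0 / len(y)) * (x.T @ (pred - y))

    # unscale in float32 just before the update
    grad = grad.astype(np.float32) / loss_scaler
    return (w.astype(np.float32) - lr * grad).astype(w.dtype), float(loss)

w = np.zeros(4, np.float16)
x = np.random.randn(16, 4).astype(np.float16)
y = np.random.randn(16).astype(np.float16)
w, loss = train_step(w, x, y)
```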
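"calculate LARS statistics in fp32?" points at the LARS optimizer: weight/gradient norms and the trust ratio overflow or lose precision in fp16 (fp16 tops out at 65504, which a squared norm exceeds quickly), so they are computed in float32. A hedged sketch of a single LARS update; the hyperparameter defaults are assumptions:

```python
import numpy as np

def lars_step(w_fp16, g_fp16, lr=1e-3, wd=5e-5, eps=1e-8):
    # cast up front: norms of large fp16 tensors overflow in fp16
    w = w_fp16.astype(np.float32)
    g = g_fp16.astype(np.float32)
    w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
    # LARS trust ratio, computed entirely in float32
    trust = w_norm / (g_norm + wd * w_norm + eps) if w_norm > 0 and g_norm > 0 else 1.0
    w = w - lr * trust * (g + wd * w)
    return w.astype(w_fp16.dtype)

w = np.random.randn(1024).astype(np.float16)
g = (0.01 * np.random.randn(1024)).astype(np.float16)
w = lars_step(w, g)
```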
---------
Co-authored-by: chenyu <chenyu@fastmail.com>