* change reduceop heuristics
* add model ema and jit hack
* add ema eval
* have to create a duplicate eval function for jit
* remove manual seed
* 94% achievable with normal eval
* ema is outputting the same results as normal
* fix ema bug
* ema achieves 94% with fixed seed
* multigpu tested
* constant fold decay, fix jit, adjust message for multigpu
* pull SpeedyResNet out of train_cifar()
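
For context on the "model ema" entries above, here is a minimal sketch of an exponential moving average over model weights. It is not the code from this change: the `ModelEMA` class, the plain-list weights, and the `decay` value are all hypothetical, chosen only to show the update rule `shadow = decay * shadow + (1 - decay) * weight`.

```python
# Hypothetical sketch of weight EMA; names and values are illustrative
# assumptions, not taken from the actual training script.

class ModelEMA:
    def __init__(self, weights, decay=0.9):
        self.decay = decay
        # Copy the weights so the shadow does not alias the live model.
        self.shadow = list(weights)

    def update(self, weights):
        # shadow <- decay * shadow + (1 - decay) * current weight
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * w
                       for s, w in zip(self.shadow, weights)]
        return self.shadow


# Usage: one update step pulls the shadow halfway toward the new weights.
ema = ModelEMA([0.0, 1.0], decay=0.5)
ema.update([1.0, 1.0])
```

Evaluation would then read from `ema.shadow` rather than the live weights. If the shadow ever shares storage with the model, both evals report identical numbers, so the sketch takes a copy up front.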