CVPR Daily - Tuesday

Ekin Dogus Cubuk is a research scientist at Google Brain. Before this, he earned his PhD in physics, where he was already applying machine learning to physical systems. He then did the residency at Google Brain, a program for people who have an interest in deep learning but are not yet experts. At Google Brain, he has been working on AutoML applications, mainly for data augmentation. He is a first-timer at CVPR and speaks to us ahead of his first oral presentation.

Even as he was doing his PhD, Dogus says it was clear that machine learning had many interesting directions. He says if you look at the research divisions now, all the physics PhDs are using their physics skills in machine learning. They are also interested in using machine learning to study physics, which is still an active research area at Google Brain and Google Accelerated Science.

Dogus finds that data augmentation is an underutilized tool in deep learning. Although there are many papers that come up with new data augmentation operations, like mixup, Cutout and geometric operations, it’s not clear how you would combine them to get the optimum result. It’s interesting to ask how far you can push the impact of data augmentation on these models.

In this paper, they tried to combine the already existing operations and get as good a result as possible on the test set. During the process, they found out some more fundamental and important things that should be done for data augmentation. One of them is variability: diversity in your data augmentation policy, for example. Most of the time when people apply an operation, they either apply it to every image in every mini-batch, or they don’t apply it at all. Cutout is very helpful on some data sets, but the usual way to use it is to apply it to every single image in every single mini-batch. They found that instead of having one strategy, having hundreds of strategies and choosing one of them randomly for each image in each mini-batch actually gives you a huge improvement.
If you were to just do that and not use any of AutoAugment’s search capability, you would already get a big improvement.

AutoAugment: Learning Augmentation Strategies from Data

“A lot of the work is focused on architectural improvements, rather than processing data!”
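To make the idea of choosing one strategy at random for each image in each mini-batch more concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy images (nested lists of pixel values), the three simplified operations, the `STRATEGIES` pool and the `augment_batch` function are all hypothetical stand-ins chosen for illustration, and the pool is picked by hand rather than found by AutoAugment's search.

```python
import random

# Toy "images": 2D lists of pixel intensities in [0, 255].
# The operations below are simplified stand-ins, not the paper's operation set.

def flip_horizontal(img, rng):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def invert(img, rng):
    """Invert pixel intensities."""
    return [[255 - p for p in row] for row in img]

def cutout(img, rng, size=2):
    """Zero out a size x size square at a random location (Cutout-style)."""
    h, w = len(img), len(img[0])
    top = rng.randrange(max(1, h - size + 1))
    left = rng.randrange(max(1, w - size + 1))
    out = [row[:] for row in img]  # copy so the input is not modified
    for r in range(top, min(h, top + size)):
        for c in range(left, min(w, left + size)):
            out[r][c] = 0
    return out

# Hypothetical, hand-picked pool of strategies; each is a sequence of operations.
STRATEGIES = [
    (flip_horizontal,),
    (invert,),
    (cutout,),
    (flip_horizontal, cutout),
    (invert, flip_horizontal),
]

def augment_batch(batch, strategies=STRATEGIES, rng=None):
    """Apply one independently, randomly chosen strategy to each image."""
    if rng is None:
        rng = random.Random()
    augmented = []
    for img in batch:
        strategy = rng.choice(strategies)  # a fresh draw for every image
        for op in strategy:
            img = op(img, rng)
        augmented.append(img)
    return augmented

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    batch = [[[10 * (r + c) for c in range(4)] for r in range(4)] for _ in range(3)]
    for img in augment_batch(batch, rng=rng):
        print(img)
```

Because the draw happens inside the loop over images, two images in the same mini-batch can receive different strategies, which is the diversity described above. Applying a single operation to every image would correspond to a pool with only one entry.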