Computer Vision News

often hidden in a lot of papers, which makes it difficult to compare models and reproduce results. People often report really good numbers and attribute that to the model, but even though the model is good, it is the training recipe that is required to make those numbers great.”

Figure 2: Fast scaling applied to state-of-the-art CNNs at larger scales. Left: error vs. flops for various CNNs; right: runtime vs. flops. Fast scaling is effective for a number of different CNNs and yields models that are as accurate as, but faster than, those produced by the previous best scaling strategies.

In this work, the team wanted to make sure the training recipe was easy to reproduce and not overly complex, but still gave good results. This was challenging because, while there are very simple recipes that are easy to reproduce, their absolute numbers are not very good. The team spent a lot of time perfecting a training recipe that was reproducible, worked across a variety of settings and networks, and generalized to many models. Reproducibility is a challenge for the community in general, so they wanted to make sure they got it right. This was orthogonal to the main goal but is a valuable secondary contribution of the work.

A general challenge for the computer vision community has been how to get to a very large scale in terms of data and models. Which bigger models should it use? CNNs or ViTs (vision transformers)? Self-supervised learning or supervised datasets? Piotr’s work aims to address one aspect of the puzzle, which is how to scale CNNs to a very large scale, but many challenges and open questions remain.

“I think our community is still in some sense lagging the NLP community,” Piotr tells us. “In NLP, self-supervised learning, things like BERT and so on, have been very successful, and their datasets and models are much larger as well. Our community is still trying to figure out what are the right giant models.
Our work is on CNNs and now we’re also looking at ViTs, which are a very promising model

Presentation - Best of CVPR 2021

Computer Vision News - July 2021