Computer Vision News - July 2021

Figure 1: An analysis of four model scaling strategies: width scaling (w), in which only the width of a base model is scaled; compound scaling (dwr), in which the width, depth, and resolution are all scaled in roughly equal proportions; depth-and-width scaling (dw); and the proposed fast compound scaling (dWr), which emphasizes scaling primarily, but not only, the model width. The left plot shows the theoretical behavior of activations under the different scaling strategies; the right shows the actual runtime of scaling a small network (EfficientNet-B0 in this case) with the same strategies. Observe that: (1) activations scale at asymptotically different rates under different scaling strategies; (2) activations are highly predictive of runtime (the shapes of the left and right plots closely match); and (3) fast compound scaling (dWr) is very fast, nearly as fast as width-only scaling, which is the fastest strategy. Furthermore, the accuracy of dWr scaling matches that of dwr scaling and easily outperforms w scaling (accuracy not shown). Hence fast compound scaling is a win-win: it is as accurate as compound scaling and nearly as fast as width-only scaling.

The field has a proliferation of model architectures that are effective or fast, but rather than simply trying to create another effective network, Piotr's team has been trying to understand the principles that make a network effective: not just finding a fast or accurate model, but understanding what makes models effective and how to scale them. Similarly, at CVPR 2020, the team presented a paper called Designing Network Design Spaces, which explored principles for designing effective convolutional neural networks rather than just looking for effective individual model instances.

"In our community, people have been really pushing absolute numbers, and one key aspect of getting good numbers is the training recipe," Piotr explains. "When you train a neural network, you can add all kinds of augmentations, regularizations, and so on; you tune the learning rate, the weight decay, and other hyperparameters. All of this is essential to getting high numbers. And that is..."
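To make the arithmetic behind Figure 1 concrete, here is a minimal sketch of how a flop budget can be split among depth, width, and resolution. It assumes flops scale as d * w^2 * r^2 and activations as d * w * r^2 (as in the paper's analysis of simple convolutional networks); the function names, the single alpha parameter, and the alpha = 0.8 default for dWr are illustrative assumptions, not the authors' actual code.

    def fast_scaling(d, w, r, s, alpha=0.8):
        """Scale (depth d, width w, resolution r) so flops grow by ~s.

        The flop budget (flops ~ d * w^2 * r^2) is split so that a
        fraction alpha goes to width and the rest is divided equally
        between depth and resolution:
          alpha = 1.0  -> width-only scaling (w)
          alpha = 1/3  -> uniform compound scaling (dwr)
          alpha = 0.8  -> fast compound scaling (dWr), an assumed default
        """
        e_d = (1 - alpha) / 2   # depth enters flops linearly
        e_w = alpha / 2         # width enters flops quadratically
        e_r = (1 - alpha) / 4   # resolution enters flops quadratically
        # e_d + 2*e_w + 2*e_r == 1, so flops scale by exactly s
        return d * s**e_d, w * s**e_w, r * s**e_r

    def activation_growth(s, alpha=0.8):
        """Growth of activations (~ d * w * r^2) under the same scaling."""
        e_d, e_w, e_r = (1 - alpha) / 2, alpha / 2, (1 - alpha) / 4
        return s ** (e_d + e_w + 2 * e_r)

    # Scaling flops by 8x: activations grow ~2.8x under width-only
    # scaling (8^0.5) and ~5.7x under dwr (8^(5/6)), but only ~3.5x
    # under dWr (8^0.6).
    for name, a in [("w", 1.0), ("dWr", 0.8), ("dwr", 1 / 3)]:
        print(f"{name}: {activation_growth(8, a):.2f}x")

Since activations are highly predictive of runtime (the caption's second observation), keeping activation growth close to the square-root rate of width-only scaling is what makes dWr nearly as fast as w while retaining the accuracy of dwr.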
