Computer Vision News - October 2016

28 Computer Vision News Research Research Deformable Part Models are CNNs Every month, Computer Vision News reviews a research from our field. This month we have chosen to review Deformable Part Models are Convolutional Neural Networks , a research paper showing a synthesis of these two widely used tools for visual recognition. In fact the paper, presented at CVPR 2015 , shows that a deformable part model (which is a graphical model) can be formulated as a convolutional neural networks . We are indebted to the authors ( Ross Girshick , Forrest Iandola , Trevor Darrell and Jitendra Malik ) for allowing us to use their images to illustrate this review. The full paper is here and the source code is here . Background: Deformable Part Models (DPMs) and Convolutional Neural Networks (CNNs) are two common tools for computer visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are “black-box” nonlinear classifiers. DPMs typically operate on a scale-space pyramid of gradient orientation feature maps (HOG). Nowadays, the state of the art object detection is done with deep convolutional networks. The authors propose a new method DeepPyramid- DPM in which the HOG features can be replaced with features learned by a fully-convolutional network. Challenge: The challenge is to replace the HOG feature with its CNN counterpart and finding the optimal way mapping each step in the DPM inference algorithm to an equivalent CNN layer. Novelty: The implementation details of CNN are time-consuming and challenging to setup correctly. As a result, HOG-based detectors are still used in a wide range of systems, especially where region-based methods (i.e. poselets) are involved. The DeepPyramid-DPM should therefore be of broad practical interest to the visual recognition community. In addition, DPMs are usually thought of as flat models, but DeepPyramid-DPM suggest that DPMs actually have a second, implicit convolutional layer. Method: A DeepPyramid DPM is a convolutional network that takes as input a color image pyramid and produces as output a pyramid of object detection scores. We will start by defining input and output for both the training and the testing as well as the dataset used.