Computer Vision News - October 2016

Distance transform (DT) pooling is a generalization of the familiar max-pooling operation, and the object geometry layer encodes the relative offsets of DPM parts. More details can be found in the paper. Lastly, a multi-component DPM-CNN is composed of one DPM-CNN per component and a maxout layer that takes a max over component outputs at each location.

The model is trained in two stages: first, the front-end CNN is fitted; second, the DPM-CNN is trained using latent SVM while keeping the front-end CNN fixed.

Results:
The authors start by comparing HOG with the conv5 features on different pyramid levels. Next, they evaluate different DPM-CNN settings and compare them to R-CNN.

HOG vs conv5
Both pyramids are computed from the same face image (left). The first row shows a HOG feature pyramid, while the second shows the “face channel” of a conv5 pyramid. At any HOG pyramid level, an appropriately sized template could detect the face. In contrast, the conv5 face channel is scale selective: it is almost entirely zero (black) in the first pyramid level and it peaks in level 6.

Evaluated DPM-CNN settings

“Can any DPM be formulated as an equivalent CNN?”
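To make the DT pooling and maxout layers described above more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the deformation cost is assumed to take the usual DPM form with features (dx, dy, dx², dy²), the search is a brute-force loop over a bounded window rather than an efficient distance transform, and the names `dt_pool`, `maxout`, `deform_w` and `radius` are hypothetical.

```python
import numpy as np


def dt_pool(scores, deform_w, radius):
    """Brute-force distance transform pooling on a 2-D score map.

    out[y, x] = max over |dy|, |dx| <= radius of
        scores[y + dy, x + dx]
        - (w0 * dx + w1 * dy + w2 * dx**2 + w3 * dy**2)

    Assumption: the deformation cost uses the features (dx, dy, dx^2, dy^2).
    Displacements that fall outside the map are ignored.
    """
    scores = np.asarray(scores, dtype=float)
    height, width = scores.shape
    w0, w1, w2, w3 = deform_w

    # Pad with -inf so out-of-bounds displacements never win the max.
    padded = np.pad(scores, radius, mode="constant", constant_values=-np.inf)
    out = np.full((height, width), -np.inf)

    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            penalty = w0 * dx + w1 * dy + w2 * dx * dx + w3 * dy * dy
            shifted = padded[
                radius + dy : radius + dy + height,
                radius + dx : radius + dx + width,
            ]
            out = np.maximum(out, shifted - penalty)
    return out


def maxout(component_maps):
    """Take the max over component score maps at each location."""
    return np.max(np.stack(component_maps, axis=0), axis=0)


if __name__ == "__main__":
    part_scores = np.array([[0.0, 1.0, 0.0],
                            [0.0, 0.0, 0.0],
                            [0.0, 0.0, 5.0]])

    # Zero deformation weights: plain 3x3 max pooling with stride 1.
    print(dt_pool(part_scores, (0, 0, 0, 0), radius=1))

    # Quadratic penalty: distant high scores are discounted.
    print(dt_pool(part_scores, (0, 0, 1, 1), radius=1))

    # Maxout over two hypothetical component outputs.
    print(maxout([part_scores, part_scores.T]))
```

With all deformation weights set to zero, the sketch reduces to ordinary max pooling over a (2·radius + 1)² window, which is the sense in which DT pooling generalizes max pooling. Non-zero weights make a displaced part placement pay a cost that grows with the displacement.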