
for the first image. Lastly, the pyramid and the depth maps are fed into a BA-layer that outputs dense structure and camera motion parameters by minimizing the feature-metric error with the Levenberg-Marquardt (LM) algorithm. We next review each of the steps separately.

Feature pyramid: instead of using raw pixel intensity values, it is more robust to use a learned feature that represents each pixel. The paper exploits the natural multi-scale structure of deep convolutional networks to construct a feature pyramid, using the intermediate layers of the DRN-54 backbone. Taking the residual blocks from conv1 up to conv4, the feature map from each layer is upsampled to create the feature map for the next level; at each step a 3x3 convolution concatenates the features and reduces the dimension of each level to 128 channels. The final feature pyramid is F_i = [F_i^1, F_i^2, F_i^3], and it is fundamentally a function of the image. This process is best illustrated by the following figure.

Basis depth maps: in parallel with the feature extraction, the basis depth maps are generated. Since directly parameterizing a dense depth map is infeasible, the authors use an encoder-decoder scheme in the spirit of monocular depth learning. The DRN-54 network already computed in the feature pyramid stage serves as the encoder. For the decoder, they modify the last
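To make the feature pyramid step more concrete, here is a minimal PyTorch sketch of that construction. The backbone is a stand-in for DRN-54: any network exposing feature maps from its first four residual stages will do. The channel widths, module names, and the choice of bilinear upsampling are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeaturePyramid(nn.Module):
    """FPN-style pyramid built from four backbone stages (illustrative)."""

    def __init__(self, in_channels=(64, 128, 256, 512), out_channels=128):
        super().__init__()
        # Project the coarsest stage to the common 128-channel width.
        self.top = nn.Conv2d(in_channels[3], out_channels, kernel_size=1)
        # 3x3 convolutions fuse each finer backbone stage with the
        # upsampled coarser level and reduce it back to 128 channels.
        self.fuse = nn.ModuleList(
            nn.Conv2d(c + out_channels, out_channels, kernel_size=3, padding=1)
            for c in in_channels[:3]
        )

    def forward(self, c1, c2, c3, c4):
        p4 = self.top(c4)
        p3 = self.fuse[2](torch.cat([c3, self._up(p4, c3)], dim=1))
        p2 = self.fuse[1](torch.cat([c2, self._up(p3, c2)], dim=1))
        p1 = self.fuse[0](torch.cat([c1, self._up(p2, c1)], dim=1))
        # Three pyramid levels per image, analogous to F_i = [F_i^1, F_i^2, F_i^3].
        return [p1, p2, p3]

    @staticmethod
    def _up(x, ref):
        # Upsample the coarser map to the spatial size of the finer one.
        return F.interpolate(x, size=ref.shape[-2:], mode="bilinear",
                             align_corners=False)
```

The structure is essentially that of a feature pyramid network: coarse features are propagated top-down and merged with finer backbone features, so every level carries both semantic and spatial detail.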
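The basis-depth-map idea can be sketched in the same PyTorch setting. `BasisDepthDecoder` is an illustrative name; the number of basis maps, the decoder blocks, and the weighted-sum `combine` helper are assumptions meant only to convey why a small set of basis maps is a tractable stand-in for a per-pixel depth parameterization, not the paper's exact decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasisDepthDecoder(nn.Module):
    """Decodes encoder features into a stack of basis depth maps."""

    def __init__(self, in_channels=512, num_basis=32):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True))
        self.block2 = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True))
        # Last layer outputs one channel per basis depth map.
        self.head = nn.Conv2d(128, num_basis, 3, padding=1)

    def forward(self, encoder_feat, out_size):
        x = F.interpolate(self.block1(encoder_feat), scale_factor=2,
                          mode="bilinear", align_corners=False)
        x = F.interpolate(self.block2(x), scale_factor=2,
                          mode="bilinear", align_corners=False)
        basis = self.head(x)  # (B, num_basis, h, w)
        return F.interpolate(basis, size=out_size, mode="bilinear",
                             align_corners=False)


def combine(basis, weights):
    # Dense depth as a weighted combination of the basis maps; the small
    # weight vector, rather than the full depth map, becomes the quantity
    # the optimizer adjusts (an assumption about how the basis is used).
    # basis: (B, K, H, W), weights: (B, K)
    return torch.relu(torch.einsum("bkhw,bk->bhw", basis, weights))
```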
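Finally, the BA-layer's optimization can be illustrated with one Levenberg-Marquardt step, the update used to reduce the feature-metric error over camera-motion and depth parameters. The residual function, parameter layout, and the simple damping schedule below are illustrative assumptions; the paper's layer is a differentiable variant of this idea, not the plain NumPy solver shown here.

```python
import numpy as np


def lm_step(residual_fn, jacobian_fn, x, lam):
    """One damped Gauss-Newton update: x <- x - (J^T J + lam*I)^-1 J^T r."""
    r = residual_fn(x)                  # stacked feature-metric residuals
    J = jacobian_fn(x)                  # Jacobian of residuals w.r.t. parameters
    H = J.T @ J + lam * np.eye(x.size)  # damped approximate Hessian
    delta = np.linalg.solve(H, J.T @ r)
    x_new = x - delta
    # Standard LM damping schedule: accept the step and relax the damping
    # if the error decreased, otherwise reject it and increase the damping.
    if np.sum(residual_fn(x_new) ** 2) < np.sum(r ** 2):
        return x_new, lam * 0.5
    return x, lam * 2.0
```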
