CVPR Daily - Thursday

The authors’ approach is built on two components: 1) a CNN-based visual odometry module, referred to as VO, that, given two consecutive observations (o_t, o_{t-1}), predicts the change in pose between t − 1 and t and then updates the goal with respect to the current pose; and 2) an RNN-based RL navigation policy module, which is given the estimated pose and the current observation o_t, and predicts the next action. An example episode of the combined VO+Navigation approach on the validation dataset reaches SPL = 0.63, Success = 1, SoftSPL = 0.62.

The navigation policy consists of a two-layer Long Short-Term Memory (LSTM) network and a half-width ResNet50 encoder. To evaluate the two components separately and understand the impact of localization on navigation, the policy was trained assuming perfect odometry (that is, given the ground-truth location); only later was the VO module used to estimate the localization, as a drop-in replacement without fine-tuning. With ground-truth localization, the agent achieves 99.8% Success and 80% SPL on the Gibson-val PointNav-v2 dataset. This shows that visual odometry is a limiting factor for a map-less approach to realistic point goal navigation, while noisy observations and actuations can be overcome easily.

The VO module is made of a ResNet encoder followed by a compression block and two fully connected (FC) layers, where BatchNorm is replaced with GroupNorm, and the compression block consists of 3 × 3 Conv2d+GroupNorm+ReLU. It is trained on a static dataset D = {(o_{t-1}, o_t, a_{t-1}, Δpose)}, that is, pairs of consecutive observations with the action taken between them and the resulting change in pose, and it is decoupled from the navigation policy.

Ablation experiments on this module added several components to the basic network and analyzed:
➢ The effect of action embedding, which incorporates knowledge of the action taken between two consecutive observations as an additional input. This is shown to improve performance, because the network receives more context and can learn more accurate egomotion for each action type.
➢ The effect of training with a larger dataset, which substantially improves performance.

Ruslan Partsey et al.
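
The goal update performed after each VO prediction can be illustrated with a small geometric sketch. This is not the authors' code: the function name `update_goal`, the 2D frame convention (x forward, y to the left, counter-clockwise yaw) and the step values in the usage note are assumptions made for illustration. It only shows the standard rigid-body step of re-expressing a point goal in the agent's new frame once the egomotion between t − 1 and t is known or estimated.

```python
import math


def update_goal(goal_xy, delta_xy, delta_yaw):
    """Re-express a point goal in the agent's frame after one step.

    goal_xy   -- goal position in the agent's frame at time t-1
    delta_xy  -- agent translation between t-1 and t, in the t-1 frame
    delta_yaw -- agent rotation between t-1 and t, radians, counter-clockwise
    (All conventions here are assumptions made for this sketch.)
    """
    gx, gy = goal_xy
    dx, dy = delta_xy
    # Shift the origin to the agent's new position.
    rx, ry = gx - dx, gy - dy
    # Rotate by -delta_yaw so the axes line up with the agent's new heading.
    c, s = math.cos(delta_yaw), math.sin(delta_yaw)
    return (c * rx + s * ry, -s * rx + c * ry)
```

For example, with a goal 2 m straight ahead, a hypothetical forward step of 0.25 m leaves the goal 1.75 m ahead, and a pure 90-degree left turn moves a goal that was 1 m ahead to 1 m on the agent's right. Because each update builds on the previous one, any error in the estimated egomotion accumulates over the episode, which is consistent with the article's point that odometry quality limits navigation performance.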
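
The Success, SPL and SoftSPL figures quoted above are per-episode quantities that are averaged over a dataset. The sketch below uses the commonly used definitions (SPL as success weighted by the ratio of shortest-path length to the length actually travelled, and SoftSPL with the binary success replaced by progress toward the goal). The function names and the episode numbers in the usage note are hypothetical and are not taken from the article; distances are assumed to be positive and in the same unit.

```python
def spl(success, shortest_path, agent_path):
    """Success weighted by Path Length for a single episode."""
    # A failed episode scores 0; a successful one is discounted by any detour.
    return float(success) * shortest_path / max(agent_path, shortest_path)


def soft_spl(start_dist, final_dist, agent_path):
    """SoftSPL for a single episode: progress toward the goal replaces success."""
    # Fraction of the initial distance to the goal that was covered (floored at 0).
    progress = max(0.0, 1.0 - final_dist / start_dist)
    return progress * start_dist / max(agent_path, start_dist)


def mean(values):
    """Average per-episode scores to get a dataset-level number."""
    values = list(values)
    return sum(values) / len(values)
```

As a made-up example, a successful episode with a 5 m shortest path that the agent completes in 8 m gives an SPL of 5 / 8; if the same agent stops 0.5 m short of the goal, its progress is 0.9 and its SoftSPL is 0.9 × 5 / 8. Under these definitions an agent can have a near-perfect Success rate and still a lower SPL, because every detour is penalized.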