Computer Vision News - September 2024

Erik Sandström

handling input depth noise, identifying suitable scene representations, and improving RGB-only SLAM robustness.

To address depth sensor noise, we investigated approaches both for multi-sensor fusion and for single sensors. The goal of multi-sensor fusion is to generate an output geometry more accurate than what could be achieved with any single sensor in isolation. We phrased this as the problem of learning sensor properties in order to predict an accurate weighting between the sensors in 3D. When using a single depth sensor, we phrased the problem as a self-supervised online learning problem of depth uncertainty, showing improvements over existing methods that rely on offline pretrained strategies.

To tackle the question of what 3D scene representation to use, we first noted that existing methods used grid-based data structures for storing the 3D map. This representation does not easily lend itself to global updates of the map, such as those triggered by loop closure. Instead, we developed a point-based neural implicit representation for SLAM, where the density of points relates to the information density of the input frames. Furthermore, we coupled this to a robust pose graph optimization formulation, enabling globally consistent pose and map estimation.

SLAM without depth from a sensor leads to geometric ambiguities, making it a harder problem to solve. We identified two key components that improve the performance of such systems. First, we introduced an optimization layer that combines multi-view depth estimation via dense optical flow with monocular depth estimation. Second, we showed how to build globally consistent 3D Gaussian splats by deforming the Gaussians at loop closure and global bundle adjustment.

Our code is open source on GitHub at eriksandstroem. I hope my thesis will inspire the next generation of researchers to pursue dense SLAM!
