Computer Vision News

OpenMask3D

To overcome the limitations of the closed-vocabulary setting, there has been growing interest in open-vocabulary approaches, which can recognize categories that are absent from the training set and respond to free-form queries in a zero-shot manner. Recently proposed open-vocabulary 3D scene understanding approaches, however, typically output a heatmap over the points in the scene for a given query. This has limited applications, particularly when one needs to identify object instances. To address these limitations, we introduce the task of open-vocabulary 3D instance segmentation. We propose OpenMask3D, which performs zero-shot 3D instance segmentation in an open-vocabulary manner (Fig. 1).

Let's take a look at how OpenMask3D works: given a set of posed RGB-D frames together with the reconstructed 3D geometry of the scene, OpenMask3D outputs a set of class-agnostic 3D instance masks, and for each of these instances, it computes a feature vector in the CLIP space (Fig. 2).

Figure 1: Given a 3D scene and free-form user queries, our method OpenMask3D segments object instances described by the open-vocabulary queries.

Figure 2: Overview of OpenMask3D
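To make the idea of per-instance features concrete, here is a minimal sketch of how such features might be matched against a free-form query once both live in the same embedding space. It is an illustration, not the OpenMask3D implementation: the use of cosine similarity for ranking, the function names `cosine_similarity` and `rank_instances`, and the toy 3-dimensional vectors are all assumptions. In practice the vectors would be CLIP embeddings of much higher dimension.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_instances(
    instance_features: List[Sequence[float]],
    query_embedding: Sequence[float],
) -> List[Tuple[int, float]]:
    """Return (instance_index, score) pairs, best match first.

    Hypothetical helper: assumes one feature vector per class-agnostic
    instance mask, all in the same space as the query embedding.
    """
    scores = [
        (index, cosine_similarity(feature, query_embedding))
        for index, feature in enumerate(instance_features)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for per-instance feature vectors (assumed values, not real CLIP output).
instance_features = [
    [1.0, 0.0, 0.0],  # instance 0
    [0.0, 1.0, 0.0],  # instance 1
    [0.6, 0.8, 0.0],  # instance 2
]

# Toy stand-in for the embedding of a free-form text query.
query_embedding = [0.0, 1.0, 0.0]

ranking = rank_instances(instance_features, query_embedding)
best_instance = ranking[0][0]
```

Because each mask carries its own feature vector, a query returns whole object instances rather than a per-point heatmap, which is the distinction drawn above.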